Running databases in Kubernetes has traditionally been considered risky. But CloudNativePG (CNPG) changes that equation—it’s a mature, Kubernetes-native operator that makes PostgreSQL a first-class citizen in your cluster. Better yet, it seamlessly supports PostgreSQL extensions like PostGIS for geospatial data and pgvector for AI embeddings.
Why CloudNativePG?
CloudNativePG stands out among PostgreSQL operators:
- Kubernetes-native: Built from scratch for Kubernetes, not ported
- Declarative: Manage clusters through CRDs
- High availability: Automatic failover and self-healing
- Backup/restore: Built-in support for S3, Azure Blob, GCS
- Extensions support: Run PostGIS, pgvector, and any PostgreSQL extension
- CNCF project: developed under open CNCF governance and widely run in production
Installing CloudNativePG
Deploy the operator with Helm:
```shell
helm repo add cnpg https://cloudnative-pg.github.io/charts
helm repo update
helm install cnpg cnpg/cloudnative-pg \
  --namespace cnpg-system \
  --create-namespace
```
Or use kubectl directly:
```shell
kubectl apply --server-side -f \
  https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.25/releases/cnpg-1.25.0.yaml
```
Verify the installation:
```shell
kubectl get pods -n cnpg-system
kubectl get crds | grep cnpg
```
Your First PostgreSQL Cluster
Create a basic 3-node PostgreSQL cluster:
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres-cluster
  namespace: database
spec:
  instances: 3
  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "256MB"
      effective_cache_size: "768MB"
      work_mem: "8MB"
  storage:
    size: 10Gi
    storageClass: standard  # Use your storage class
  bootstrap:
    initdb:
      database: app
      owner: app
```
Apply and watch the cluster come up:
```shell
kubectl apply -f postgres-cluster.yaml
kubectl get cluster -n database -w
```
Adding PostgreSQL Extensions
Custom Container Images
CNPG supports any PostgreSQL image. For extensions, use pre-built images or create your own:
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres-extensions
  namespace: database
spec:
  instances: 3
  # Use an image with extensions pre-installed
  imageName: ghcr.io/cloudnative-pg/postgis:16-3.4
  postgresql:
    shared_preload_libraries:
      - pg_stat_statements
    parameters:
      max_connections: "200"
  storage:
    size: 20Gi
```
Building Custom Extension Images
Create a Dockerfile with your required extensions:
```dockerfile
FROM ghcr.io/cloudnative-pg/postgresql:16

# Install build dependencies
USER root
RUN apt-get update && apt-get install -y \
    build-essential \
    postgresql-server-dev-16 \
    git \
    && rm -rf /var/lib/apt/lists/*

# Install pgvector for AI embeddings
RUN cd /tmp && \
    git clone --branch v0.7.4 https://github.com/pgvector/pgvector.git && \
    cd pgvector && \
    make && \
    make install && \
    cd / && rm -rf /tmp/pgvector

# Switch back to the postgres user (UID 26 in CNPG images)
USER 26
```
Build and push:
```shell
docker build -t myregistry/postgres-ai:16 .
docker push myregistry/postgres-ai:16
```
PostGIS: Geospatial Data in Kubernetes
PostGIS transforms PostgreSQL into a powerful geospatial database. Deploy it with CNPG:
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgis-cluster
  namespace: geospatial
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgis:16-3.4
  postgresql:
    parameters:
      max_connections: "100"
      shared_buffers: "512MB"
      work_mem: "64MB"
      maintenance_work_mem: "256MB"
  storage:
    size: 50Gi
    storageClass: fast-ssd
  bootstrap:
    initdb:
      database: geodb
      owner: geouser
      # postInitApplicationSQL runs against the application database (geodb);
      # plain postInitSQL would run against the postgres database instead
      postInitApplicationSQL:
        - CREATE EXTENSION IF NOT EXISTS postgis;
        - CREATE EXTENSION IF NOT EXISTS postgis_topology;
        - CREATE EXTENSION IF NOT EXISTS postgis_raster;
```
Using PostGIS
Connect and work with geospatial data:
```sql
-- Create a table with geometry
CREATE TABLE locations (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    location GEOMETRY(Point, 4326)
);

-- Insert some points (longitude, latitude)
INSERT INTO locations (name, location) VALUES
    ('Dubai Mall', ST_SetSRID(ST_MakePoint(55.2796, 25.1972), 4326)),
    ('Burj Khalifa', ST_SetSRID(ST_MakePoint(55.2744, 25.1972), 4326)),
    ('Abu Dhabi Corniche', ST_SetSRID(ST_MakePoint(54.3568, 24.4764), 4326));

-- Find locations within 10km of a point
SELECT name, ST_Distance(
    location::geography,
    ST_SetSRID(ST_MakePoint(55.2700, 25.2000), 4326)::geography
) / 1000 AS distance_km
FROM locations
WHERE ST_DWithin(
    location::geography,
    ST_SetSRID(ST_MakePoint(55.2700, 25.2000), 4326)::geography,
    10000  -- 10km in meters
);
```
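As a sanity check on the geography-cast distances above, the same great-circle distance can be computed in plain Python with the haversine formula. This is a sketch using the sample coordinates inserted above; it should agree with ST_Distance on geography to within the formula's spherical approximation:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in km between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    dlon, dlat = lon2 - lon1, lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # mean Earth radius ~6371 km

# Dubai Mall vs. Burj Khalifa (same points as the INSERT above)
print(round(haversine_km(55.2796, 25.1972, 55.2744, 25.1972), 3))
```

The two landmarks sit on the same latitude about half a kilometer apart, so a wildly different number from the SQL query would indicate an SRID or axis-order mistake.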
pgvector: AI Embeddings in PostgreSQL
pgvector enables vector similarity search—essential for AI applications like semantic search, RAG, and recommendation systems.
Deploying pgvector with CNPG
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: vector-db
  namespace: ai
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/pgvector:16-0.7.4
  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "1GB"
      effective_cache_size: "3GB"
      work_mem: "256MB"
      maintenance_work_mem: "512MB"
  storage:
    size: 100Gi
    storageClass: fast-ssd
  bootstrap:
    initdb:
      database: vectordb
      owner: aiapp
      # Runs against the application database (vectordb)
      postInitApplicationSQL:
        - CREATE EXTENSION IF NOT EXISTS vector;
```
Working with Embeddings
```sql
-- Create a table for document embeddings
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT,
    embedding vector(1536)  -- OpenAI ada-002 dimensions
);

-- Create an index for fast similarity search
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- Insert documents with embeddings (embeddings come from your AI model)
INSERT INTO documents (title, content, embedding) VALUES
    ('Cloud Architecture', 'Guide to building scalable cloud systems...',
     '[0.1, 0.2, ...]'::vector);

-- Semantic search: find similar documents
-- (query_embedding is a parameter supplied by your application)
SELECT title, content,
       1 - (embedding <=> query_embedding) AS similarity
FROM documents
ORDER BY embedding <=> query_embedding
LIMIT 5;
```
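The <=> operator returns cosine distance, so 1 - (a <=> b) is cosine similarity. A pure-Python equivalent is handy for unit-testing ranking code without a database (a sketch; the function name is ours):

```python
from math import sqrt

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Mirrors 1 - (a <=> b) under pgvector's vector_cosine_ops."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical vectors -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors -> 0.0
```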
Python Integration for RAG
```python
import psycopg2
from openai import OpenAI

# Initialize clients
openai_client = OpenAI()
conn = psycopg2.connect(
    host="vector-db-rw.ai.svc.cluster.local",
    database="vectordb",
    user="aiapp",
    password="your-password"
)

def get_embedding(text: str) -> list:
    """Generate embedding using OpenAI."""
    response = openai_client.embeddings.create(
        model="text-embedding-ada-002",
        input=text
    )
    return response.data[0].embedding

def semantic_search(query: str, limit: int = 5):
    """Find similar documents using vector similarity."""
    query_embedding = get_embedding(query)
    with conn.cursor() as cur:
        cur.execute("""
            SELECT title, content,
                   1 - (embedding <=> %s::vector) AS similarity
            FROM documents
            ORDER BY embedding <=> %s::vector
            LIMIT %s
        """, (query_embedding, query_embedding, limit))
        return cur.fetchall()

# Find relevant documents for RAG
results = semantic_search("How to scale Kubernetes deployments?")
for title, content, similarity in results:
    print(f"{title} (similarity: {similarity:.3f})")
```
Combining Extensions: Geospatial AI
Create powerful applications combining PostGIS and pgvector:
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: geo-ai-db
  namespace: production
spec:
  instances: 3
  imageName: myregistry/postgres-geoai:16  # Custom image with both
  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "2GB"
  storage:
    size: 200Gi
  bootstrap:
    initdb:
      database: geoai
      owner: appuser
      # Runs against the application database (geoai)
      postInitApplicationSQL:
        - CREATE EXTENSION IF NOT EXISTS postgis;
        - CREATE EXTENSION IF NOT EXISTS vector;
```
Use case: Location-aware semantic search:
```sql
-- Places with descriptions and embeddings
CREATE TABLE places (
    id SERIAL PRIMARY KEY,
    name TEXT,
    description TEXT,
    location GEOMETRY(Point, 4326),
    description_embedding vector(1536)
);

-- Find semantically similar places within 5km
-- (query_embedding is a parameter supplied by your application)
SELECT name, description,
       ST_Distance(location::geography,
           ST_SetSRID(ST_MakePoint(55.27, 25.20), 4326)::geography) / 1000 AS distance_km,
       1 - (description_embedding <=> query_embedding) AS semantic_similarity
FROM places
WHERE ST_DWithin(
    location::geography,
    ST_SetSRID(ST_MakePoint(55.27, 25.20), 4326)::geography,
    5000
)
ORDER BY description_embedding <=> query_embedding
LIMIT 10;
```
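The query's shape — spatial filter first, vector ordering second — can be mirrored in plain Python, which is useful for unit-testing ranking logic without a live database. A sketch with made-up sample data (the helper names and 2-dimensional embeddings are ours, chosen for readability):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(a))

def cosine_distance(a, b):
    """pgvector's <=> operator: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return 1 - dot / norm

def nearby_similar(places, query_pt, query_emb, radius_m=5000, limit=10):
    """ST_DWithin filter, then ORDER BY embedding <=> query_embedding."""
    lon, lat = query_pt
    within = [p for p in places if haversine_m(lon, lat, p["lon"], p["lat"]) <= radius_m]
    within.sort(key=lambda p: cosine_distance(p["embedding"], query_emb))
    return within[:limit]

places = [
    {"name": "cafe", "lon": 55.27, "lat": 25.21, "embedding": [1.0, 0.0]},
    {"name": "museum", "lon": 55.28, "lat": 25.20, "embedding": [0.0, 1.0]},
    {"name": "far-away park", "lon": 56.50, "lat": 25.20, "embedding": [1.0, 0.0]},
]
result = nearby_similar(places, (55.27, 25.20), [1.0, 0.1])
print([p["name"] for p in result])  # -> ['cafe', 'museum']
```

The distant park is dropped by the radius filter even though its embedding matches best, which is exactly the behavior the SQL's WHERE-then-ORDER BY produces.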
Other Useful Extensions
pg_stat_statements: Query Analytics
```yaml
postgresql:
  shared_preload_libraries:
    - pg_stat_statements
  parameters:
    pg_stat_statements.track: all
    pg_stat_statements.max: "10000"
```
```sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Find slow queries
SELECT query, calls, total_exec_time / calls AS avg_time_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```
TimescaleDB: Time-Series Data
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: timeseries-db
spec:
  instances: 3
  imageName: timescale/timescaledb-ha:pg16
  postgresql:
    shared_preload_libraries:
      - timescaledb
```
pg_cron: Scheduled Jobs
```sql
-- pg_cron must also be listed in shared_preload_libraries
CREATE EXTENSION IF NOT EXISTS pg_cron;

-- Clean up old data every night
SELECT cron.schedule('nightly-cleanup', '0 3 * * *',
    $$DELETE FROM logs WHERE created_at < NOW() - INTERVAL '30 days'$$);

-- Refresh materialized view hourly
SELECT cron.schedule('refresh-stats', '0 * * * *',
    $$REFRESH MATERIALIZED VIEW CONCURRENTLY hourly_stats$$);
```
Production Configuration
Backup to S3
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: production-db
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/pgvector:16-0.7.4
  storage:
    size: 100Gi
  backup:
    barmanObjectStore:
      destinationPath: s3://my-backups/postgres/
      endpointURL: https://s3.amazonaws.com
      s3Credentials:
        accessKeyId:
          name: s3-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: s3-creds
          key: SECRET_ACCESS_KEY
      wal:
        compression: gzip
      data:
        compression: gzip
    retentionPolicy: "30d"
---
# Schedule daily backups at 2 AM
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: daily-backup
spec:
  schedule: "0 0 2 * * *"  # six-field cron: seconds come first
  cluster:
    name: production-db
  backupOwnerReference: self
```
Monitoring with Prometheus
CNPG exports metrics automatically:
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: monitored-db
spec:
  instances: 3
  monitoring:
    enablePodMonitor: true
    customQueriesConfigMap:
      - name: custom-queries
        key: queries.yaml
```
With enablePodMonitor: true, the operator creates a PodMonitor for you; to define one manually instead:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: cnpg-monitor
  labels:
    app: cnpg
spec:
  selector:
    matchLabels:
      cnpg.io/cluster: production-db
  podMetricsEndpoints:
    - port: metrics
```
Resource Management
```yaml
spec:
  instances: 3
  resources:
    requests:
      memory: "4Gi"
      cpu: "2"
    limits:
      memory: "8Gi"
      cpu: "4"
  # Dedicated node pools
  affinity:
    nodeSelector:
      workload: database
    tolerations:
      - key: "database"
        operator: "Equal"
        value: "postgres"
        effect: "NoSchedule"
```
Connecting Applications
CNPG creates services for read/write and read-only connections:
```yaml
# Application deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    spec:
      containers:
        - name: app
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: production-db-app  # Auto-generated secret
                  key: uri
            # Or use individual values
            - name: DB_HOST
              value: production-db-rw.database.svc.cluster.local  # Read-write
            - name: DB_HOST_RO
              value: production-db-ro.database.svc.cluster.local  # Read replicas
```
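The service names follow a fixed pattern: the cluster name plus an -rw (primary), -ro (replicas only), or -r (any instance) suffix, inside the cluster's namespace. A small helper keeps that convention in one place in application code (a sketch; the function name is ours):

```python
def cnpg_service_host(cluster: str, namespace: str, role: str = "rw") -> str:
    """Build the in-cluster DNS name of a CNPG service.

    role: "rw" (primary), "ro" (replicas only), or "r" (any instance).
    """
    if role not in ("rw", "ro", "r"):
        raise ValueError(f"unknown role: {role}")
    return f"{cluster}-{role}.{namespace}.svc.cluster.local"

print(cnpg_service_host("production-db", "database"))             # read-write
print(cnpg_service_host("production-db", "database", role="ro"))  # read-only
```

Routing reads to the -ro service and writes to -rw lets read-heavy applications spread load across replicas without any extra proxy.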
Conclusion
CloudNativePG makes running PostgreSQL in Kubernetes production-ready:
- Declarative management through Kubernetes CRDs
- High availability with automatic failover
- Extension support for PostGIS, pgvector, TimescaleDB, and more
- Integrated backups to object storage
- Built-in monitoring for observability
Whether you’re building geospatial applications with PostGIS, AI-powered search with pgvector, or combining both for location-aware intelligence, CNPG provides the foundation you need.
At Sajima Solutions, we help organizations deploy and operate cloud-native databases at scale. Contact us to bring production-ready PostgreSQL to your Kubernetes infrastructure.