Running databases in Kubernetes has traditionally been considered risky. But CloudNativePG (CNPG) changes that equation—it’s a mature, Kubernetes-native operator that makes PostgreSQL a first-class citizen in your cluster. Better yet, it seamlessly supports PostgreSQL extensions like PostGIS for geospatial data and pgvector for AI embeddings.
Why CloudNativePG?
CloudNativePG stands out among PostgreSQL operators:
- Kubernetes-native: Built from scratch for Kubernetes, not ported
- Declarative: Manage clusters through CRDs
- High availability: Automatic failover and self-healing
- Backup/restore: Built-in support for S3, Azure Blob, GCS
- Extensions support: Run PostGIS, pgvector, and any PostgreSQL extension
- CNCF project: developed under open CNCF governance and widely run in production
Installing CloudNativePG
Deploy the operator with Helm:
```shell
helm repo add cnpg https://cloudnative-pg.github.io/charts
helm repo update
helm install cnpg cnpg/cloudnative-pg \
  --namespace cnpg-system \
  --create-namespace
```
Or use kubectl directly:
```shell
kubectl apply --server-side -f \
  https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.25/releases/cnpg-1.25.0.yaml
```
Verify the installation:
```shell
kubectl get pods -n cnpg-system
kubectl get crds | grep cnpg
```
Your First PostgreSQL Cluster
Create a basic 3-node PostgreSQL cluster:
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres-cluster
  namespace: database
spec:
  instances: 3
  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "256MB"
      effective_cache_size: "768MB"
      work_mem: "8MB"
  storage:
    size: 10Gi
    storageClass: standard  # Use your storage class
  bootstrap:
    initdb:
      database: app
      owner: app
```
Apply and watch the cluster come up:
```shell
kubectl apply -f postgres-cluster.yaml
kubectl get cluster -n database -w
```
Adding PostgreSQL Extensions
Custom Container Images
CNPG supports any PostgreSQL image. For extensions, use pre-built images or create your own:
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres-extensions
  namespace: database
spec:
  instances: 3
  # Use an image with extensions pre-installed
  imageName: ghcr.io/cloudnative-pg/postgis:16-3.4
  postgresql:
    shared_preload_libraries:
      - pg_stat_statements
    parameters:
      max_connections: "200"
  storage:
    size: 20Gi
```
Building Custom Extension Images
Create a Dockerfile with your required extensions:
```dockerfile
FROM ghcr.io/cloudnative-pg/postgresql:16

# Install build dependencies
USER root
RUN apt-get update && apt-get install -y \
    build-essential \
    postgresql-server-dev-16 \
    git \
    && rm -rf /var/lib/apt/lists/*

# Install pgvector for AI embeddings
RUN cd /tmp && \
    git clone --branch v0.7.4 https://github.com/pgvector/pgvector.git && \
    cd pgvector && \
    make && \
    make install && \
    cd / && rm -rf /tmp/pgvector

# Switch back to the postgres user (UID 26 in CNPG images)
USER 26
```
Build and push:
```shell
docker build -t myregistry/postgres-ai:16 .
docker push myregistry/postgres-ai:16
```
PostGIS: Geospatial Data in Kubernetes
PostGIS transforms PostgreSQL into a powerful geospatial database. Deploy it with CNPG:
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgis-cluster
  namespace: geospatial
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgis:16-3.4
  postgresql:
    parameters:
      max_connections: "100"
      shared_buffers: "512MB"
      work_mem: "64MB"
      maintenance_work_mem: "256MB"
  storage:
    size: 50Gi
    storageClass: fast-ssd
  bootstrap:
    initdb:
      database: geodb
      owner: geouser
      # postInitApplicationSQL runs against the application database (geodb);
      # plain postInitSQL would run against the postgres database instead
      postInitApplicationSQL:
        - CREATE EXTENSION IF NOT EXISTS postgis;
        - CREATE EXTENSION IF NOT EXISTS postgis_topology;
        - CREATE EXTENSION IF NOT EXISTS postgis_raster;
```
Using PostGIS
Connect and work with geospatial data:
```sql
-- Create a table with geometry
CREATE TABLE locations (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    location GEOMETRY(Point, 4326)
);

-- Insert some points (longitude, latitude)
INSERT INTO locations (name, location) VALUES
    ('Dubai Mall', ST_SetSRID(ST_MakePoint(55.2796, 25.1972), 4326)),
    ('Burj Khalifa', ST_SetSRID(ST_MakePoint(55.2744, 25.1972), 4326)),
    ('Abu Dhabi Corniche', ST_SetSRID(ST_MakePoint(54.3568, 24.4764), 4326));

-- Find locations within 10km of a point
SELECT name, ST_Distance(
    location::geography,
    ST_SetSRID(ST_MakePoint(55.2700, 25.2000), 4326)::geography
) / 1000 AS distance_km
FROM locations
WHERE ST_DWithin(
    location::geography,
    ST_SetSRID(ST_MakePoint(55.2700, 25.2000), 4326)::geography,
    10000  -- 10km in meters
);
```
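As a sanity check on the geography-cast distances above, the same great-circle distance can be computed in plain Python with the haversine formula. This is a sketch using the sample coordinates inserted above; it should agree with ST_Distance on geography to within the formula's spherical approximation:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in km between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    dlon, dlat = lon2 - lon1, lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # mean Earth radius ~6371 km

# Dubai Mall vs. Burj Khalifa (same points as the INSERT above)
print(round(haversine_km(55.2796, 25.1972, 55.2744, 25.1972), 3))
```

The two landmarks sit on the same latitude about half a kilometer apart, so a wildly different number from the SQL query would indicate an SRID or axis-order mistake.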
pgvector: AI Embeddings in PostgreSQL
pgvector enables vector similarity search—essential for AI applications like semantic search, RAG, and recommendation systems.
Deploying pgvector with CNPG
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: vector-db
  namespace: ai
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/pgvector:16-0.7.4
  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "1GB"
      effective_cache_size: "3GB"
      work_mem: "256MB"
      maintenance_work_mem: "512MB"
  storage:
    size: 100Gi
    storageClass: fast-ssd
  bootstrap:
    initdb:
      database: vectordb
      owner: aiapp
      # Runs against the application database (vectordb)
      postInitApplicationSQL:
        - CREATE EXTENSION IF NOT EXISTS vector;
```
Working with Embeddings
```sql
-- Create a table for document embeddings
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT,
    embedding vector(1536)  -- OpenAI ada-002 dimensions
);

-- Create an index for fast similarity search
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- Insert documents with embeddings (embeddings come from your AI model)
INSERT INTO documents (title, content, embedding) VALUES
    ('Cloud Architecture', 'Guide to building scalable cloud systems...',
     '[0.1, 0.2, ...]'::vector);

-- Semantic search: find similar documents
-- (query_embedding is a parameter supplied by your application)
SELECT title, content,
       1 - (embedding <=> query_embedding) AS similarity
FROM documents
ORDER BY embedding <=> query_embedding
LIMIT 5;
```
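The <=> operator returns cosine distance, so 1 - (a <=> b) is cosine similarity. A pure-Python equivalent is handy for unit-testing ranking code without a database (a sketch; the function name is ours):

```python
from math import sqrt

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Mirrors 1 - (a <=> b) under pgvector's vector_cosine_ops."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical vectors -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors -> 0.0
```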
Python Integration for RAG
```python
import psycopg2
from openai import OpenAI

# Initialize clients
openai_client = OpenAI()
conn = psycopg2.connect(
    host="vector-db-rw.ai.svc.cluster.local",
    database="vectordb",
    user="aiapp",
    password="your-password"
)

def get_embedding(text: str) -> list:
    """Generate embedding using OpenAI."""
    response = openai_client.embeddings.create(
        model="text-embedding-ada-002",
        input=text
    )
    return response.data[0].embedding

def semantic_search(query: str, limit: int = 5):
    """Find similar documents using vector similarity."""
    query_embedding = get_embedding(query)
    with conn.cursor() as cur:
        cur.execute("""
            SELECT title, content,
                   1 - (embedding <=> %s::vector) AS similarity
            FROM documents
            ORDER BY embedding <=> %s::vector
            LIMIT %s
        """, (query_embedding, query_embedding, limit))
        return cur.fetchall()

# Find relevant documents for RAG
results = semantic_search("How to scale Kubernetes deployments?")
for title, content, similarity in results:
    print(f"{title} (similarity: {similarity:.3f})")
```
Combining Extensions: Geospatial AI
Create powerful applications combining PostGIS and pgvector:
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: geo-ai-db
  namespace: production
spec:
  instances: 3
  imageName: myregistry/postgres-geoai:16  # Custom image with both
  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "2GB"
  storage:
    size: 200Gi
  bootstrap:
    initdb:
      database: geoai
      owner: appuser
      # Runs against the application database (geoai)
      postInitApplicationSQL:
        - CREATE EXTENSION IF NOT EXISTS postgis;
        - CREATE EXTENSION IF NOT EXISTS vector;
```
Use case: Location-aware semantic search:
```sql
-- Places with descriptions and embeddings
CREATE TABLE places (
    id SERIAL PRIMARY KEY,
    name TEXT,
    description TEXT,
    location GEOMETRY(Point, 4326),
    description_embedding vector(1536)
);

-- Find semantically similar places within 5km
-- (query_embedding is a parameter supplied by your application)
SELECT name, description,
       ST_Distance(location::geography,
           ST_SetSRID(ST_MakePoint(55.27, 25.20), 4326)::geography) / 1000 AS distance_km,
       1 - (description_embedding <=> query_embedding) AS semantic_similarity
FROM places
WHERE ST_DWithin(
    location::geography,
    ST_SetSRID(ST_MakePoint(55.27, 25.20), 4326)::geography,
    5000
)
ORDER BY description_embedding <=> query_embedding
LIMIT 10;
```
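The query's shape — spatial filter first, vector ordering second — can be mirrored in plain Python, which is useful for unit-testing ranking logic without a live database. A sketch with made-up sample data (the helper names and 2-dimensional embeddings are ours, chosen for readability):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(a))

def cosine_distance(a, b):
    """pgvector's <=> operator: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return 1 - dot / norm

def nearby_similar(places, query_pt, query_emb, radius_m=5000, limit=10):
    """ST_DWithin filter, then ORDER BY embedding <=> query_embedding."""
    lon, lat = query_pt
    within = [p for p in places if haversine_m(lon, lat, p["lon"], p["lat"]) <= radius_m]
    within.sort(key=lambda p: cosine_distance(p["embedding"], query_emb))
    return within[:limit]

places = [
    {"name": "cafe", "lon": 55.27, "lat": 25.21, "embedding": [1.0, 0.0]},
    {"name": "museum", "lon": 55.28, "lat": 25.20, "embedding": [0.0, 1.0]},
    {"name": "far-away park", "lon": 56.50, "lat": 25.20, "embedding": [1.0, 0.0]},
]
result = nearby_similar(places, (55.27, 25.20), [1.0, 0.1])
print([p["name"] for p in result])  # -> ['cafe', 'museum']
```

The distant park is dropped by the radius filter even though its embedding matches best, which is exactly the behavior the SQL's WHERE-then-ORDER BY produces.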
Other Useful Extensions
pg_stat_statements: Query Analytics
```yaml
postgresql:
  shared_preload_libraries:
    - pg_stat_statements
  parameters:
    pg_stat_statements.track: all
    pg_stat_statements.max: "10000"
```
```sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Find slow queries
SELECT query, calls, total_exec_time / calls AS avg_time_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```
TimescaleDB: Time-Series Data
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: timeseries-db
spec:
  instances: 3
  imageName: timescale/timescaledb-ha:pg16
  postgresql:
    shared_preload_libraries:
      - timescaledb
```
pg_cron: Scheduled Jobs
```sql
-- pg_cron must also be listed in shared_preload_libraries
CREATE EXTENSION IF NOT EXISTS pg_cron;

-- Clean up old data every night
SELECT cron.schedule('nightly-cleanup', '0 3 * * *',
    $$DELETE FROM logs WHERE created_at < NOW() - INTERVAL '30 days'$$);

-- Refresh materialized view hourly
SELECT cron.schedule('refresh-stats', '0 * * * *',
    $$REFRESH MATERIALIZED VIEW CONCURRENTLY hourly_stats$$);
```
Production Configuration
Backup to S3
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: production-db
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/pgvector:16-0.7.4
  storage:
    size: 100Gi
  backup:
    barmanObjectStore:
      destinationPath: s3://my-backups/postgres/
      endpointURL: https://s3.amazonaws.com
      s3Credentials:
        accessKeyId:
          name: s3-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: s3-creds
          key: SECRET_ACCESS_KEY
      wal:
        compression: gzip
      data:
        compression: gzip
    retentionPolicy: "30d"
---
# Schedule daily backups at 2 AM
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: daily-backup
spec:
  schedule: "0 0 2 * * *"  # six-field cron: seconds come first
  cluster:
    name: production-db
  backupOwnerReference: self
```
Monitoring with Prometheus
CNPG exports metrics automatically:
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: monitored-db
spec:
  instances: 3
  monitoring:
    enablePodMonitor: true
    customQueriesConfigMap:
      - name: custom-queries
        key: queries.yaml
```
With enablePodMonitor: true, the operator creates a PodMonitor for you; to define one manually instead:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: cnpg-monitor
  labels:
    app: cnpg
spec:
  selector:
    matchLabels:
      cnpg.io/cluster: production-db
  podMetricsEndpoints:
    - port: metrics
```
Resource Management
```yaml
spec:
  instances: 3
  resources:
    requests:
      memory: "4Gi"
      cpu: "2"
    limits:
      memory: "8Gi"
      cpu: "4"
  # Dedicated node pools
  affinity:
    nodeSelector:
      workload: database
    tolerations:
      - key: "database"
        operator: "Equal"
        value: "postgres"
        effect: "NoSchedule"
```
Connecting Applications
CNPG creates services for read/write and read-only connections:
```yaml
# Application deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    spec:
      containers:
        - name: app
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: production-db-app  # Auto-generated secret
                  key: uri
            # Or use individual values
            - name: DB_HOST
              value: production-db-rw.database.svc.cluster.local  # Read-write
            - name: DB_HOST_RO
              value: production-db-ro.database.svc.cluster.local  # Read replicas
```
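The service names follow a fixed pattern: the cluster name plus an -rw (primary), -ro (replicas only), or -r (any instance) suffix, inside the cluster's namespace. A small helper keeps that convention in one place in application code (a sketch; the function name is ours):

```python
def cnpg_service_host(cluster: str, namespace: str, role: str = "rw") -> str:
    """Build the in-cluster DNS name of a CNPG service.

    role: "rw" (primary), "ro" (replicas only), or "r" (any instance).
    """
    if role not in ("rw", "ro", "r"):
        raise ValueError(f"unknown role: {role}")
    return f"{cluster}-{role}.{namespace}.svc.cluster.local"

print(cnpg_service_host("production-db", "database"))             # read-write
print(cnpg_service_host("production-db", "database", role="ro"))  # read-only
```

Routing reads to the -ro service and writes to -rw lets read-heavy applications spread load across replicas without any extra proxy.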
Conclusion
CloudNativePG makes running PostgreSQL in Kubernetes production-ready:
- Declarative management through Kubernetes CRDs
- High availability with automatic failover
- Extension support for PostGIS, pgvector, TimescaleDB, and more
- Integrated backups to object storage
- Built-in monitoring for observability
Whether you’re building geospatial applications with PostGIS, AI-powered search with pgvector, or combining both for location-aware intelligence, CNPG provides the foundation you need.
At Sajima Solutions, we help organizations deploy and operate cloud-native databases at scale. Contact us to bring production-ready PostgreSQL to your Kubernetes infrastructure.