Kubernetes is powerful, but managing stateful applications and complex deployments still requires human intervention. Operators solve this by encoding operational knowledge into software—they’re like having a site reliability engineer watching your cluster 24/7. And Go is the perfect language to build them.

What is a Kubernetes Operator?

An operator is a custom controller that extends Kubernetes’ capabilities by managing the complete lifecycle of an application. Instead of manually performing tasks like:

  • Deploying database replicas
  • Handling failovers
  • Running backups
  • Scaling based on custom metrics

an operator automates all of this by watching for changes and continuously reconciling the actual state of your cluster with the desired state.

Why Build Operators with Go?

1. Native Kubernetes Integration

Kubernetes itself is written in Go, and all its client libraries are Go-first. The client-go library provides full access to the Kubernetes API with type safety and code generation.

2. Controller-Runtime Framework

The controller-runtime library (sigs.k8s.io/controller-runtime), which both Kubebuilder and the Operator SDK build on, provides high-level abstractions that eliminate most of the boilerplate:

// The heart of every controller: a single Reconcile method
func (r *MyAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // Your reconciliation logic here
    return ctrl.Result{}, nil
}

3. Excellent Concurrency Model

Operators need to watch multiple resources and handle events concurrently. Go’s goroutines and channels make this natural and efficient.
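In controller-runtime, events for watched objects become reconcile requests on a work queue, and worker goroutines pull from it (one worker by default; more can be configured with `WithOptions(controller.Options{MaxConcurrentReconciles: n})`). The stdlib-only sketch below is a toy model of that pattern, not controller-runtime's code: `dedupQueue` and `reconcileCounts` are hypothetical names, and the real queue additionally guarantees that one key is never processed by two workers at the same time.

```go
package main

import (
	"fmt"
	"sync"
)

// dedupQueue is a toy stand-in for a controller's work queue: adding a key
// that is already waiting is a no-op, so a burst of events for one object
// collapses into a single reconcile request.
type dedupQueue struct {
	mu      sync.Mutex
	waiting map[string]bool
	keys    []string
}

func (q *dedupQueue) Add(key string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.waiting[key] {
		return // already queued: coalesce the event
	}
	q.waiting[key] = true
	q.keys = append(q.keys, key)
}

// reconcileCounts queues every event, then lets `workers` goroutines drain
// the queue concurrently. It reports how often each key was reconciled.
func reconcileCounts(events []string, workers int) map[string]int {
	q := &dedupQueue{waiting: map[string]bool{}}
	for _, e := range events {
		q.Add(e)
	}

	// Hand the deduplicated keys to the workers over a closed, buffered channel.
	work := make(chan string, len(q.keys))
	for _, k := range q.keys {
		work <- k
	}
	close(work)

	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		counts = map[string]int{}
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for key := range work {
				// A real worker would call Reconcile(ctx, req) here.
				mu.Lock()
				counts[key]++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return counts
}

func main() {
	events := []string{"default/a", "default/b", "default/a", "default/c", "default/a"}
	counts := reconcileCounts(events, 2)
	fmt.Println(len(events), "events,", len(counts), "reconciles") // 5 events, 3 reconciles
}
```

Five events for three objects result in three reconciles, which is why a reconciler should always read the latest state instead of relying on the details of any single event.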

4. Static Binaries

Go compiles to a single binary with no runtime dependencies (fully static when cgo is disabled), which keeps container images small and startup fast: a good fit for running in Kubernetes.

5. Strong Ecosystem

Prometheus metrics, structured logging, gRPC support—everything you need for production operators is available in Go.

The Operator Pattern

Every operator follows the same reconciliation loop:

  1. Watch for events (Create, Update, Delete)
  2. Read the desired state (Custom Resource spec)
  3. Read the current state (cluster resources)
  4. Compare and compute the difference
  5. Take action to reconcile (create/update/delete)
  6. Update status and requeue if needed

The cycle then repeats; this is the reconciliation loop.
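The compare-and-act core of that loop can be illustrated without any Kubernetes dependencies. This is a toy sketch under stated assumptions: `State` stands in for the cluster's objects (name to replica count), and `reconcile` and `Action` are hypothetical names. The second call shows the property every real reconciler needs: once the actual state matches the desired state, running the loop again changes nothing.

```go
package main

import (
	"fmt"
	"sort"
)

// State maps a child resource name to its replica count: a deliberately
// tiny model of "what exists in the cluster".
type State map[string]int

// Action describes one write the operator would send to the API server.
type Action struct {
	Verb string // "create", "update", or "delete"
	Name string
}

// reconcile compares desired and actual state, applies the differences to
// actual, and returns the actions it took.
func reconcile(desired, actual State) []Action {
	var actions []Action

	// Walk desired names in sorted order so the output is deterministic.
	names := make([]string, 0, len(desired))
	for n := range desired {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, name := range names {
		want := desired[name]
		have, exists := actual[name]
		switch {
		case !exists:
			actual[name] = want
			actions = append(actions, Action{"create", name})
		case have != want:
			actual[name] = want
			actions = append(actions, Action{"update", name})
		}
	}

	// Anything that exists but is no longer desired gets deleted.
	extra := make([]string, 0)
	for name := range actual {
		if _, ok := desired[name]; !ok {
			extra = append(extra, name)
		}
	}
	sort.Strings(extra)
	for _, name := range extra {
		delete(actual, name)
		actions = append(actions, Action{"delete", name})
	}
	return actions
}

func main() {
	desired := State{"web": 2, "api": 1}
	actual := State{"web": 1, "old-job": 1}

	fmt.Println(reconcile(desired, actual)) // [{create api} {update web} {delete old-job}]
	fmt.Println(reconcile(desired, actual)) // [] (already converged)
}
```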

Building Your First Operator: A Simple Website Manager

Let’s build a practical operator that manages static websites. When you create a Website custom resource, the operator will:

  1. Create a Deployment with nginx
  2. Create a ConfigMap with the website content
  3. Create a Service to expose it
  4. Update status with the website URL

Step 1: Set Up the Project

First, install Kubebuilder and create a new project:

# Install kubebuilder
curl -L -o kubebuilder https://go.kubebuilder.io/dl/latest/$(go env GOOS)/$(go env GOARCH)
chmod +x kubebuilder && sudo mv kubebuilder /usr/local/bin/

# Create the project
mkdir website-operator && cd website-operator
kubebuilder init --domain example.com --repo github.com/example/website-operator

# Create the API (Custom Resource + Controller)
kubebuilder create api --group webapp --version v1 --kind Website

Step 2: Define the Custom Resource

Edit api/v1/website_types.go:

package v1

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// WebsiteSpec defines the desired state of Website
type WebsiteSpec struct {
    // Title is the website title displayed in the browser
    Title string `json:"title"`
    
    // Content is the HTML content of the website
    Content string `json:"content"`
    
    // Replicas is the number of nginx pods to run
    // +kubebuilder:default=1
    // +kubebuilder:validation:Minimum=1
    // +kubebuilder:validation:Maximum=10
    Replicas int32 `json:"replicas,omitempty"`
}

// WebsiteStatus defines the observed state of Website
type WebsiteStatus struct {
    // URL is the endpoint where the website is accessible
    URL string `json:"url,omitempty"`
    
    // Ready indicates if the website is serving traffic
    Ready bool `json:"ready"`
    
    // AvailableReplicas is the number of ready pods
    AvailableReplicas int32 `json:"availableReplicas"`
}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:printcolumn:name="URL",type=string,JSONPath=`.status.url`
// +kubebuilder:printcolumn:name="Ready",type=boolean,JSONPath=`.status.ready`
// +kubebuilder:printcolumn:name="Replicas",type=integer,JSONPath=`.status.availableReplicas`

// Website is the Schema for the websites API
type Website struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   WebsiteSpec   `json:"spec,omitempty"`
    Status WebsiteStatus `json:"status,omitempty"`
}

// +kubebuilder:object:root=true

// WebsiteList contains a list of Website
type WebsiteList struct {
    metav1.TypeMeta `json:",inline"`
    metav1.ListMeta `json:"metadata,omitempty"`
    Items           []Website `json:"items"`
}

func init() {
    SchemeBuilder.Register(&Website{}, &WebsiteList{})
}
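The `+kubebuilder` markers above end up in the CRD's OpenAPI schema, and the API server (not the operator) applies them whenever a Website is created or updated. The plain-Go sketch below imitates that behavior for `replicas` so the rules are easy to see; `websiteSpec` and `applyDefaultsAndValidate` are hypothetical helpers, not generated code. Note that a default is applied only when the field is absent, so an explicit `replicas: 0` is rejected by `Minimum=1` rather than defaulted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// websiteSpec mirrors the JSON tags of WebsiteSpec, without the Kubernetes machinery.
type websiteSpec struct {
	Title    string `json:"title"`
	Content  string `json:"content"`
	Replicas int32  `json:"replicas,omitempty"`
}

// applyDefaultsAndValidate imitates what the API server does with the schema
// generated from +kubebuilder:default=1, Minimum=1 and Maximum=10.
func applyDefaultsAndValidate(raw []byte) (websiteSpec, error) {
	// Check whether "replicas" was sent at all: defaults only fill absent fields.
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(raw, &fields); err != nil {
		return websiteSpec{}, err
	}
	var spec websiteSpec
	if err := json.Unmarshal(raw, &spec); err != nil {
		return websiteSpec{}, err
	}
	if _, ok := fields["replicas"]; !ok {
		spec.Replicas = 1 // +kubebuilder:default=1
	}
	if spec.Replicas < 1 || spec.Replicas > 10 {
		return websiteSpec{}, fmt.Errorf("spec.replicas: %d is outside [1, 10]", spec.Replicas)
	}
	return spec, nil
}

func main() {
	for _, doc := range []string{
		`{"title":"Hi","content":"<p>x</p>"}`,
		`{"title":"Hi","content":"<p>x</p>","replicas":3}`,
		`{"title":"Hi","content":"<p>x</p>","replicas":42}`,
	} {
		spec, err := applyDefaultsAndValidate([]byte(doc))
		fmt.Println(spec.Replicas, err)
	}
}
```

Because the API server enforces these rules, the controller's own `replicas == 0` guard is only a safety net.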

Step 3: Implement the Controller

Edit internal/controller/website_controller.go:

package controller

import (
    "context"
    "fmt"

    appsv1 "k8s.io/api/apps/v1"
    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/apimachinery/pkg/api/resource"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/types"
    "k8s.io/apimachinery/pkg/util/intstr"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/log"

    webappv1 "github.com/example/website-operator/api/v1"
)

// WebsiteReconciler reconciles a Website object
type WebsiteReconciler struct {
    client.Client
    Scheme *runtime.Scheme
}

// +kubebuilder:rbac:groups=webapp.example.com,resources=websites,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=webapp.example.com,resources=websites/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=core,resources=services,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=core,resources=configmaps,verbs=get;list;watch;create;update;patch;delete

func (r *WebsiteReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    logger := log.FromContext(ctx)

    // Step 1: Fetch the Website custom resource
    var website webappv1.Website
    if err := r.Get(ctx, req.NamespacedName, &website); err != nil {
        if errors.IsNotFound(err) {
            // Website was deleted, nothing to do
            return ctrl.Result{}, nil
        }
        return ctrl.Result{}, err
    }

    logger.Info("Reconciling Website", "name", website.Name)

    // Step 2: Create or update the ConfigMap with website content
    if err := r.reconcileConfigMap(ctx, &website); err != nil {
        return ctrl.Result{}, err
    }

    // Step 3: Create or update the Deployment
    if err := r.reconcileDeployment(ctx, &website); err != nil {
        return ctrl.Result{}, err
    }

    // Step 4: Create or update the Service
    if err := r.reconcileService(ctx, &website); err != nil {
        return ctrl.Result{}, err
    }

    // Step 5: Update the Website status
    if err := r.updateStatus(ctx, &website); err != nil {
        return ctrl.Result{}, err
    }

    return ctrl.Result{}, nil
}

func (r *WebsiteReconciler) reconcileConfigMap(ctx context.Context, website *webappv1.Website) error {
    configMap := &corev1.ConfigMap{
        ObjectMeta: metav1.ObjectMeta{
            Name:      website.Name + "-content",
            Namespace: website.Namespace,
        },
        Data: map[string]string{
            "index.html": fmt.Sprintf(`<!DOCTYPE html>
<html>
<head>
    <title>%s</title>
    <style>
        body { font-family: system-ui, sans-serif; max-width: 800px; margin: 50px auto; padding: 20px; }
        h1 { color: #2563eb; }
    </style>
</head>
<body>
    <h1>%s</h1>
    <div>%s</div>
    <footer><p>Served by Website Operator</p></footer>
</body>
</html>`, website.Spec.Title, website.Spec.Title, website.Spec.Content),
        },
    }

    // Set owner reference for garbage collection
    if err := ctrl.SetControllerReference(website, configMap, r.Scheme); err != nil {
        return err
    }

    // Create or update
    existing := &corev1.ConfigMap{}
    err := r.Get(ctx, types.NamespacedName{Name: configMap.Name, Namespace: configMap.Namespace}, existing)
    if errors.IsNotFound(err) {
        return r.Create(ctx, configMap)
    } else if err != nil {
        return err
    }

    existing.Data = configMap.Data
    return r.Update(ctx, existing)
}

func (r *WebsiteReconciler) reconcileDeployment(ctx context.Context, website *webappv1.Website) error {
    replicas := website.Spec.Replicas
    if replicas == 0 {
        replicas = 1
    }

    deployment := &appsv1.Deployment{
        ObjectMeta: metav1.ObjectMeta{
            Name:      website.Name,
            Namespace: website.Namespace,
        },
        Spec: appsv1.DeploymentSpec{
            Replicas: &replicas,
            Selector: &metav1.LabelSelector{
                MatchLabels: map[string]string{
                    "app":     "website",
                    "website": website.Name,
                },
            },
            Template: corev1.PodTemplateSpec{
                ObjectMeta: metav1.ObjectMeta{
                    Labels: map[string]string{
                        "app":     "website",
                        "website": website.Name,
                    },
                },
                Spec: corev1.PodSpec{
                    Containers: []corev1.Container{
                        {
                            Name:  "nginx",
                            Image: "nginx:alpine",
                            Ports: []corev1.ContainerPort{
                                {ContainerPort: 80},
                            },
                            VolumeMounts: []corev1.VolumeMount{
                                {
                                    Name:      "content",
                                    MountPath: "/usr/share/nginx/html",
                                },
                            },
                            Resources: corev1.ResourceRequirements{
                                Requests: corev1.ResourceList{
                                    corev1.ResourceCPU:    resource.MustParse("10m"),
                                    corev1.ResourceMemory: resource.MustParse("16Mi"),
                                },
                                Limits: corev1.ResourceList{
                                    corev1.ResourceCPU:    resource.MustParse("100m"),
                                    corev1.ResourceMemory: resource.MustParse("64Mi"),
                                },
                            },
                        },
                    },
                    Volumes: []corev1.Volume{
                        {
                            Name: "content",
                            VolumeSource: corev1.VolumeSource{
                                ConfigMap: &corev1.ConfigMapVolumeSource{
                                    LocalObjectReference: corev1.LocalObjectReference{
                                        Name: website.Name + "-content",
                                    },
                                },
                            },
                        },
                    },
                },
            },
        },
    }

    if err := ctrl.SetControllerReference(website, deployment, r.Scheme); err != nil {
        return err
    }

    existing := &appsv1.Deployment{}
    err := r.Get(ctx, types.NamespacedName{Name: deployment.Name, Namespace: deployment.Namespace}, existing)
    if errors.IsNotFound(err) {
        return r.Create(ctx, deployment)
    } else if err != nil {
        return err
    }

    existing.Spec = deployment.Spec
    return r.Update(ctx, existing)
}

func (r *WebsiteReconciler) reconcileService(ctx context.Context, website *webappv1.Website) error {
    service := &corev1.Service{
        ObjectMeta: metav1.ObjectMeta{
            Name:      website.Name,
            Namespace: website.Namespace,
        },
        Spec: corev1.ServiceSpec{
            Selector: map[string]string{
                "app":     "website",
                "website": website.Name,
            },
            Ports: []corev1.ServicePort{
                {
                    Port:       80,
                    TargetPort: intstr.FromInt(80),
                },
            },
            Type: corev1.ServiceTypeClusterIP,
        },
    }

    if err := ctrl.SetControllerReference(website, service, r.Scheme); err != nil {
        return err
    }

    existing := &corev1.Service{}
    err := r.Get(ctx, types.NamespacedName{Name: service.Name, Namespace: service.Namespace}, existing)
    if errors.IsNotFound(err) {
        return r.Create(ctx, service)
    } else if err != nil {
        return err
    }

    // Some Service fields (e.g. spec.clusterIP) are immutable or assigned by the
    // API server, so only the selector and ports are copied over
    existing.Spec.Selector = service.Spec.Selector
    existing.Spec.Ports = service.Spec.Ports
    return r.Update(ctx, existing)
}

func (r *WebsiteReconciler) updateStatus(ctx context.Context, website *webappv1.Website) error {
    // Get the deployment to check replica status
    deployment := &appsv1.Deployment{}
    err := r.Get(ctx, types.NamespacedName{Name: website.Name, Namespace: website.Namespace}, deployment)
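    if err != nil {
        // Right after creation the cached client may not see the Deployment yet.
        // Skip the status update; the Owns() watch triggers another reconcile.
        return client.IgnoreNotFound(err)
    }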

    // Update status
    website.Status.AvailableReplicas = deployment.Status.AvailableReplicas
    website.Status.Ready = deployment.Status.AvailableReplicas > 0
    website.Status.URL = fmt.Sprintf("http://%s.%s.svc.cluster.local", website.Name, website.Namespace)

    return r.Status().Update(ctx, website)
}

// SetupWithManager sets up the controller with the Manager
func (r *WebsiteReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&webappv1.Website{}).
        Owns(&appsv1.Deployment{}).
        Owns(&corev1.ConfigMap{}).
        Owns(&corev1.Service{}).
        Complete(r)
}

The resource.MustParse calls in reconcileDeployment need the resource package, so make sure the import block includes:

import (
    // ... other imports
    "k8s.io/apimachinery/pkg/api/resource"
)

Step 4: Deploy and Test

Generate manifests and install CRDs:

# Generate CRD manifests
make manifests

# Install CRDs in your cluster
make install

# Run the operator locally for development
make run

Create a sample website:

# config/samples/webapp_v1_website.yaml
apiVersion: webapp.example.com/v1
kind: Website
metadata:
  name: my-first-website
  namespace: default
spec:
  title: "Hello from Kubernetes!"
  content: |
    <p>This website is managed by a Kubernetes Operator.</p>
    <p>Try updating this resource and watch the content change automatically!</p>
  replicas: 2

Apply and verify:

# Create the website
kubectl apply -f config/samples/webapp_v1_website.yaml

# Check the website resource
kubectl get websites
# NAME               URL                                              READY   REPLICAS
# my-first-website   http://my-first-website.default.svc.cluster.local   true    2

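# Check created resources (they are named after the Website; only the pods carry the website label)
kubectl get deployment/my-first-website service/my-first-website configmap/my-first-website-content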

# Port-forward to see the website
kubectl port-forward svc/my-first-website 8080:80
# Visit http://localhost:8080

Key Concepts Explained

Owner References and Garbage Collection

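Each child resource carries an owner reference to its Website, set with SetControllerReference. When the Website is deleted, the Kubernetes garbage collector deletes the owned Deployment, ConfigMap, and Service automatically: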

if err := ctrl.SetControllerReference(website, deployment, r.Scheme); err != nil {
    return err
}
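Conceptually, the garbage collector in kube-controller-manager deletes a dependent once none of the owners listed in its `ownerReferences` exist any more, and repeats that for the dependents' own dependents (the Deployment's ReplicaSets and Pods, for example). The toy model below shows just that rule; `object`, `deleteWithDependents`, and `names` are hypothetical, and the real collector works asynchronously and also honors foreground/background propagation policies.

```go
package main

import (
	"fmt"
	"sort"
)

// object keeps only what matters for garbage collection: a UID and the
// UIDs listed in metadata.ownerReferences.
type object struct {
	Kind, Name string
	UID        string
	Owners     []string
}

// deleteWithDependents removes uid, then keeps sweeping until no object is
// left whose owners have all disappeared.
func deleteWithDependents(objs map[string]object, uid string) {
	delete(objs, uid)
	for changed := true; changed; {
		changed = false
		for id, o := range objs {
			if len(o.Owners) == 0 {
				continue // no owners: never garbage collected
			}
			orphaned := true
			for _, owner := range o.Owners {
				if _, ok := objs[owner]; ok {
					orphaned = false
					break
				}
			}
			if orphaned {
				delete(objs, id) // deleting during range is allowed in Go
				changed = true
			}
		}
	}
}

// names lists the remaining objects as Kind/Name in a stable order.
func names(objs map[string]object) []string {
	out := make([]string, 0, len(objs))
	for _, o := range objs {
		out = append(out, o.Kind+"/"+o.Name)
	}
	sort.Strings(out)
	return out
}

func main() {
	objs := map[string]object{
		"w1":  {"Website", "my-first-website", "w1", nil},
		"d1":  {"Deployment", "my-first-website", "d1", []string{"w1"}},
		"rs1": {"ReplicaSet", "my-first-website-7d4f", "rs1", []string{"d1"}},
		"c1":  {"ConfigMap", "my-first-website-content", "c1", []string{"w1"}},
		"s9":  {"Service", "unrelated", "s9", nil},
	}
	deleteWithDependents(objs, "w1")
	fmt.Println(names(objs)) // [Service/unrelated]
}
```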

Watching Owned Resources

The controller watches not just Website resources, but also Deployments, ConfigMaps, and Services it owns:

return ctrl.NewControllerManagedBy(mgr).
    For(&webappv1.Website{}).      // Primary resource
    Owns(&appsv1.Deployment{}).    // Owned resources
    Owns(&corev1.ConfigMap{}).
    Owns(&corev1.Service{}).
    Complete(r)

If someone manually modifies the Deployment, the controller will reconcile it back to the desired state.
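Behind `Owns()`, controller-runtime maps each event on an owned object back to a request for its owner; by default only the owner reference marked as the controller counts. The stdlib sketch below models that mapping; `ownerRef`, `event`, `request`, and `requestFor` are hypothetical types that mirror the relevant fields, not the library's handler code.

```go
package main

import "fmt"

// ownerRef keeps the ownerReference fields this sketch needs.
type ownerRef struct {
	Kind, Name string
	Controller bool // set to true by ctrl.SetControllerReference
}

// event is a watch notification for some object in the cluster.
type event struct {
	Kind, Namespace, Name string
	OwnerRefs             []ownerRef
}

// request identifies the Website to reconcile, like ctrl.Request.
type request struct{ Namespace, Name string }

// requestFor maps an event to a reconcile request: Website events map to
// themselves (For), and events on other kinds map to their controlling
// Website owner, if there is one (Owns).
func requestFor(ev event) (request, bool) {
	if ev.Kind == "Website" {
		return request{ev.Namespace, ev.Name}, true
	}
	for _, ref := range ev.OwnerRefs {
		if ref.Kind == "Website" && ref.Controller {
			// A namespaced owner must live in the same namespace as its dependents.
			return request{ev.Namespace, ref.Name}, true
		}
	}
	return request{}, false // not ours: ignored
}

func main() {
	owner := []ownerRef{{Kind: "Website", Name: "my-first-website", Controller: true}}
	for _, ev := range []event{
		{"Website", "default", "my-first-website", nil},
		{"Deployment", "default", "my-first-website", owner},
		{"ConfigMap", "default", "my-first-website-content", owner},
		{"ConfigMap", "default", "kube-root-ca.crt", nil},
	} {
		req, ok := requestFor(ev)
		fmt.Println(ev.Kind, ev.Name, "->", req, ok)
	}
}
```

Every edit to the Deployment therefore lands in the same Reconcile call as edits to the Website itself, which is what lets the controller undo manual drift.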

Idempotent Reconciliation

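The reconcile function must be idempotent: running it once or many times against the same desired state leaves the cluster in the same end state. That is why each child resource is fetched first and then either created or updated: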

existing := &appsv1.Deployment{}
err := r.Get(ctx, types.NamespacedName{Name: deployment.Name, Namespace: deployment.Namespace}, existing)
if errors.IsNotFound(err) {
    return r.Create(ctx, deployment)  // Create if missing
} else if err != nil {
    return err                        // Any other read error: retry later
}
existing.Spec = deployment.Spec
return r.Update(ctx, existing)         // Update if exists
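controller-runtime packages this get-then-create-or-update pattern as `controllerutil.CreateOrUpdate`, which runs a mutate function on the fetched (or blank) object and reports `created`, `updated`, or `unchanged`. The in-memory sketch below imitates that shape to show why the pattern is idempotent; `store`, `configMap`, `createOrUpdate`, and `setIndex` are hypothetical stand-ins, not the library's types.

```go
package main

import (
	"fmt"
	"reflect"
)

// configMap is a toy object: a name plus its data.
type configMap struct {
	Name string
	Data map[string]string
}

// store is an in-memory stand-in for the API server.
type store map[string]*configMap

func copyData(in map[string]string) map[string]string {
	if in == nil {
		return nil
	}
	out := make(map[string]string, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

// createOrUpdate fetches the object (or starts from a blank one), lets mutate
// set the desired fields on a copy, and writes only when something changed.
func createOrUpdate(s store, name string, mutate func(*configMap)) string {
	current, found := s[name]
	desired := &configMap{Name: name}
	if found {
		desired.Data = copyData(current.Data) // start from what already exists
	}
	mutate(desired)

	switch {
	case !found:
		s[name] = desired
		return "created"
	case reflect.DeepEqual(current, desired):
		return "unchanged" // no write, so repeating the call is harmless
	default:
		s[name] = desired
		return "updated"
	}
}

// setIndex returns a mutate function that sets index.html to html.
func setIndex(html string) func(*configMap) {
	return func(cm *configMap) {
		if cm.Data == nil {
			cm.Data = map[string]string{}
		}
		cm.Data["index.html"] = html
	}
}

func main() {
	s := store{}
	fmt.Println(createOrUpdate(s, "my-first-website-content", setIndex("<h1>v1</h1>"))) // created
	fmt.Println(createOrUpdate(s, "my-first-website-content", setIndex("<h1>v1</h1>"))) // unchanged
	fmt.Println(createOrUpdate(s, "my-first-website-content", setIndex("<h1>v2</h1>"))) // updated
}
```

Skipping the write when nothing changed also avoids needless API calls and watch events on every reconcile.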

Best Practices for Production Operators

1. Add Finalizers for Cleanup

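Owner references only clean up objects inside the cluster. If your operator also creates external resources (cloud infrastructure, DNS records), add a finalizer so it can clean them up before Kubernetes removes the Website: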

// Needed in addition to the imports already in website_controller.go:
// "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

const websiteFinalizer = "webapp.example.com/finalizer"

func (r *WebsiteReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    var website webappv1.Website
    if err := r.Get(ctx, req.NamespacedName, &website); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Check if being deleted
    if !website.DeletionTimestamp.IsZero() {
        if controllerutil.ContainsFinalizer(&website, websiteFinalizer) {
            // Perform cleanup (cleanupExternalResources is your own helper
            // for deleting whatever lives outside the cluster)
            if err := r.cleanupExternalResources(ctx, &website); err != nil {
                return ctrl.Result{}, err
            }
            // Remove finalizer so the API server can finish the deletion
            controllerutil.RemoveFinalizer(&website, websiteFinalizer)
            return ctrl.Result{}, r.Update(ctx, &website)
        }
        return ctrl.Result{}, nil
    }

    // Add finalizer if not present
    if !controllerutil.ContainsFinalizer(&website, websiteFinalizer) {
        controllerutil.AddFinalizer(&website, websiteFinalizer)
        if err := r.Update(ctx, &website); err != nil {
            return ctrl.Result{}, err
        }
    }

    // Continue with normal reconciliation (ConfigMap, Deployment, Service, status)...
    return ctrl.Result{}, nil
}
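The deletion flow is easier to follow as a state machine. In the toy model below, which is not controller-runtime code, `resource` tracks only the finalizer list and a flag standing in for `deletionTimestamp`; `requestDelete` plays the API server, and `reconcile` follows the same branches as the Reconcile function above. The hypothetical `cleanups` counter marks where `cleanupExternalResources` would run.

```go
package main

import "fmt"

const websiteFinalizer = "webapp.example.com/finalizer"

// resource models the metadata the API server looks at during deletion.
type resource struct {
	Finalizers        []string
	DeletionRequested bool // stands in for a non-nil deletionTimestamp
	Gone              bool // the API server has actually removed the object
}

func hasFinalizer(r *resource) bool {
	for _, f := range r.Finalizers {
		if f == websiteFinalizer {
			return true
		}
	}
	return false
}

// requestDelete plays the API server: with finalizers present the object is
// only marked for deletion; otherwise it is removed immediately.
func requestDelete(r *resource) {
	r.DeletionRequested = true
	r.Gone = len(r.Finalizers) == 0
}

// reconcile mirrors the branches of the finalizer-aware Reconcile above.
func reconcile(r *resource, cleanups *int) {
	if r.Gone {
		return // object no longer exists: nothing to do
	}
	if r.DeletionRequested {
		if hasFinalizer(r) {
			*cleanups++ // cleanupExternalResources would run here
			kept := r.Finalizers[:0]
			for _, f := range r.Finalizers {
				if f != websiteFinalizer {
					kept = append(kept, f)
				}
			}
			r.Finalizers = kept
			// Once the last finalizer is gone, the API server finishes the deletion.
			r.Gone = len(r.Finalizers) == 0
		}
		return
	}
	if !hasFinalizer(r) {
		r.Finalizers = append(r.Finalizers, websiteFinalizer)
	}
}

func main() {
	var cleanups int
	r := &resource{}

	reconcile(r, &cleanups) // first pass adds the finalizer
	requestDelete(r)        // kubectl delete: only marks the object
	fmt.Println("after delete request: gone =", r.Gone)

	reconcile(r, &cleanups) // cleanup runs, finalizer is removed
	fmt.Println("after reconcile: gone =", r.Gone, "cleanups =", cleanups)
}
```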

2. Implement Status Conditions

Use conditions for detailed status reporting:

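// Add a Conditions field to WebsiteStatus, alongside the existing fields
type WebsiteStatus struct {
    // +listType=map
    // +listMapKey=type
    // +optional
    Conditions []metav1.Condition `json:"conditions,omitempty"`
}

// In the reconciler (import "k8s.io/apimachinery/pkg/api/meta").
// SetStatusCondition fills in LastTransitionTime when it is left empty and
// only moves it when the condition's Status actually changes.
meta.SetStatusCondition(&website.Status.Conditions, metav1.Condition{
    Type:               "Ready",
    Status:             metav1.ConditionTrue,
    Reason:             "DeploymentReady",
    Message:            "All replicas are available",
    ObservedGeneration: website.Generation,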
})
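The `LastTransitionTime` rule is what makes conditions useful for debugging: it records when the status last flipped, not when the reconciler last ran. The sketch below is a simplified plain-Go re-implementation of that rule for illustration only; in the operator you would keep calling `meta.SetStatusCondition`. The `condition` type and `setCondition` function are hypothetical, and the clock is passed in so the behavior is easy to check.

```go
package main

import (
	"fmt"
	"time"
)

// condition holds the metav1.Condition fields relevant here.
type condition struct {
	Type, Status, Reason, Message string
	LastTransitionTime           time.Time
}

// setCondition adds c if its Type is new; otherwise it updates Reason and
// Message and moves LastTransitionTime only when Status changes.
func setCondition(conds []condition, c condition, now time.Time) []condition {
	for i := range conds {
		if conds[i].Type != c.Type {
			continue
		}
		if conds[i].Status != c.Status {
			conds[i].Status = c.Status
			conds[i].LastTransitionTime = now
		}
		conds[i].Reason = c.Reason
		conds[i].Message = c.Message
		return conds
	}
	c.LastTransitionTime = now
	return append(conds, c)
}

func main() {
	t0 := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	var conds []condition

	conds = setCondition(conds, condition{Type: "Ready", Status: "False", Reason: "Progressing", Message: "0/2 replicas"}, t0)
	conds = setCondition(conds, condition{Type: "Ready", Status: "False", Reason: "Progressing", Message: "1/2 replicas"}, t0.Add(time.Minute))
	fmt.Println(conds[0].Message, conds[0].LastTransitionTime.Format(time.Kitchen)) // 1/2 replicas 12:00PM

	conds = setCondition(conds, condition{Type: "Ready", Status: "True", Reason: "DeploymentReady", Message: "All replicas are available"}, t0.Add(2*time.Minute))
	fmt.Println(conds[0].Status, conds[0].LastTransitionTime.Format(time.Kitchen)) // True 12:02PM
}
```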

3. Add Prometheus Metrics

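controller-runtime already serves built-in metrics such as controller_runtime_reconcile_total on the manager's metrics endpoint. Register custom metrics on the same registry to track what matters for your operator: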

import (
    "github.com/prometheus/client_golang/prometheus"
    "sigs.k8s.io/controller-runtime/pkg/metrics"
)

var (
    reconcileTotal = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "website_reconcile_total",
            Help: "Total number of reconciliations",
        },
        []string{"result"},
    )
)

func init() {
    metrics.Registry.MustRegister(reconcileTotal)
}
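To make the counter useful, increment it once per reconcile with a small, fixed set of label values. The sketch below shows one way to derive the `result` label; the label values, `errConflict`, and the `resultLabel` helper are our own hypothetical choices, and a plain map stands in for the Prometheus counter so the example runs without dependencies. In the operator, the counting line would be `reconcileTotal.WithLabelValues(label).Inc()`.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// result mirrors the ctrl.Result field used in this post.
type result struct {
	RequeueAfter time.Duration
}

// errConflict is a sample transient error for the demo below.
var errConflict = errors.New("conflict")

// resultLabel maps a reconcile outcome to a bounded set of label values,
// which keeps the metric's cardinality low.
func resultLabel(res result, err error) string {
	switch {
	case err != nil:
		return "error"
	case res.RequeueAfter > 0:
		return "requeue"
	default:
		return "success"
	}
}

func main() {
	counts := map[string]int{} // stands in for the website_reconcile_total CounterVec
	outcomes := []struct {
		res result
		err error
	}{
		{result{}, nil},
		{result{RequeueAfter: 5 * time.Minute}, nil},
		{result{}, errConflict},
		{result{}, nil},
	}
	for _, o := range outcomes {
		// In the operator: reconcileTotal.WithLabelValues(resultLabel(res, err)).Inc()
		counts[resultLabel(o.res, o.err)]++
	}
	fmt.Println(counts) // map[error:1 requeue:1 success:2]
}
```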

4. Rate Limiting and Backoff

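Return errors for transient failures and let the controller's rate limiter retry with exponential backoff; use RequeueAfter only for deliberate, scheduled requeues: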

func (r *WebsiteReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // ... reconciliation logic that may set err ...

    if err != nil {
        // Returning the error requeues the request through the rate limiter,
        // which backs off exponentially per object. A Result returned together
        // with a non-nil error is ignored, so don't set RequeueAfter here.
        return ctrl.Result{}, err
    }

    // Periodic reconciliation for drift detection
    return ctrl.Result{RequeueAfter: 5 * time.Minute}, nil
}
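The exponential part comes from the controller's rate limiter, not from the Reconcile function. The sketch below is a simplified, stdlib-only model of per-object exponential failure backoff in the spirit of client-go's `ItemExponentialFailureRateLimiter`; the `backoff` type is our own, although its `When` and `Forget` method names mirror client-go's rate limiter interface. As far as we know, client-go's default controller rate limiter starts at 5ms, caps at 1000s, and is combined with an overall token bucket; check the version you depend on.

```go
package main

import (
	"fmt"
	"time"
)

// backoff doubles the retry delay for each consecutive failure of the same
// key, up to max. Forget resets a key after a successful reconcile.
type backoff struct {
	base, max time.Duration
	failures  map[string]int
}

func newBackoff(base, max time.Duration) *backoff {
	return &backoff{base: base, max: max, failures: map[string]int{}}
}

// When records one more failure for key and returns the delay before retrying.
func (b *backoff) When(key string) time.Duration {
	n := b.failures[key]
	b.failures[key] = n + 1
	delay := b.base
	for i := 0; i < n && delay < b.max; i++ {
		delay *= 2
	}
	if delay > b.max {
		delay = b.max
	}
	return delay
}

// Forget clears the failure history for key.
func (b *backoff) Forget(key string) { delete(b.failures, key) }

func main() {
	b := newBackoff(5*time.Millisecond, 1000*time.Second)
	key := "default/my-first-website"
	for i := 0; i < 4; i++ {
		fmt.Println(b.When(key)) // 5ms, 10ms, 20ms, 40ms
	}
	b.Forget(key)            // a successful reconcile resets the backoff
	fmt.Println(b.When(key)) // 5ms
}
```

Because the delay is tracked per key, one failing Website backs off without slowing down reconciles for any other Website.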

Conclusion

Building Kubernetes operators with Go unlocks powerful automation capabilities:

  • Encode operational knowledge into code that runs 24/7
  • Self-healing applications that recover from failures automatically
  • Declarative management that integrates with GitOps workflows
  • Native Kubernetes integration with proper garbage collection and status reporting

The example in this post is simple, but the same patterns apply to complex operators managing databases, message queues, ML training jobs, or multi-cluster deployments.

At Sajima Solutions, we help organizations build custom operators that automate their infrastructure operations. Contact us to discuss how operators can reduce your operational burden.