Python Kubernetes Operators: Build Custom Controllers (2026)

The most surprising thing about Python Kubernetes operators is that they are not part of Kubernetes itself: an operator is an ordinary program that talks to the API server like any other client, observing the cluster’s state and reacting to changes.

Let’s see one in action. Imagine we have a simple application that needs a persistent volume. Normally, you’d manually create a PersistentVolumeClaim (PVC) and a Deployment. An operator can automate this.

Here’s a simplified Python operator using kopf, a popular framework:

import kopf
import kubernetes

@kopf.on.create('mycompany.com/v1', 'myapps')
def create_app_resources(spec, name, namespace, logger, **kwargs):
    logger.info(f"Creating resources for app '{name}' in namespace '{namespace}'")

    # Define the PersistentVolumeClaim
    pvc_name = f"{name}-data"
    pvc_manifest = {
        'apiVersion': 'v1',
        'kind': 'PersistentVolumeClaim',
        'metadata': {'name': pvc_name, 'namespace': namespace},
        'spec': {
            'accessModes': ['ReadWriteOnce'],
            'resources': {'requests': {'storage': '1Gi'}}
        }
    }

    # Define the Deployment
    deployment_name = name
    deployment_manifest = {
        'apiVersion': 'apps/v1',
        'kind': 'Deployment',
        'metadata': {'name': deployment_name, 'namespace': namespace},
        'spec': {
            'replicas': 1,
            'selector': {'matchLabels': {'app': name}},
            'template': {
                'metadata': {'labels': {'app': name}},
                'spec': {
                    'containers': [
                        {
                            'name': 'app-container',
                            'image': spec.get('image', 'nginx:latest'), # Get image from CR spec
                            'ports': [{'containerPort': 80}],
                            'volumeMounts': [
                                {
                                    'name': 'data-volume',
                                    'mountPath': '/usr/share/nginx/html'
                                }
                            ]
                        }
                    ],
                    'volumes': [
                        {
                            'name': 'data-volume',
                            'persistentVolumeClaim': {'claimName': pvc_name}
                        }
                    ]
                }
            }
        }
    }

    # Use Kubernetes API to create resources
    core_v1 = kubernetes.client.CoreV1Api()
    apps_v1 = kubernetes.client.AppsV1Api()

    try:
        core_v1.create_namespaced_persistent_volume_claim(namespace, pvc_manifest)
        logger.info(f"Created PVC: {pvc_name}")
    except kubernetes.client.ApiException as e:
        if e.status == 409: # Already exists
            logger.warning(f"PVC {pvc_name} already exists.")
        else:
            raise

    try:
        apps_v1.create_namespaced_deployment(namespace, deployment_manifest)
        logger.info(f"Created Deployment: {deployment_name}")
    except kubernetes.client.ApiException as e:
        if e.status == 409: # Already exists
            logger.warning(f"Deployment {deployment_name} already exists.")
        else:
            raise

    return {'message': f"App {name} resources created successfully."}

When you apply a Custom Resource (CR) like this:

apiVersion: mycompany.com/v1
kind: MyApp
metadata:
  name: my-web-app
  namespace: default
spec:
  image: nginx:1.21

The kopf operator, typically running in a pod within your cluster, receives the create event for the MyApp object. It then uses the Kubernetes Python client to create a PersistentVolumeClaim named my-web-app-data and a Deployment named my-web-app.

The core problem this solves is managing complex, multi-resource Kubernetes objects as a single logical unit. Instead of users having to understand and manually orchestrate PVC, Deployment, Service, Ingress, etc., they interact with a single MyApp custom resource. The operator handles the translation and reconciliation.
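One way to make the child objects behave as part of that single unit is to give each of them an owner reference pointing at the MyApp object, so that Kubernetes garbage-collects them when the CR is deleted. The sketch below shows the idea on plain dictionaries; the function name with_owner_reference is hypothetical, and inside a kopf handler the kopf.adopt() helper serves the same purpose. Keep in mind that an owned PVC is deleted along with its owner.

```python
def with_owner_reference(child: dict, owner: dict) -> dict:
    """Return a copy of `child` whose metadata names `owner` as its controller.

    Hypothetical helper for illustration; the input dictionaries are not modified.
    """
    reference = {
        "apiVersion": owner["apiVersion"],
        "kind": owner["kind"],
        "name": owner["metadata"]["name"],
        "uid": owner["metadata"]["uid"],
        "controller": True,
        "blockOwnerDeletion": True,
    }
    metadata = dict(child.get("metadata", {}))
    # Keep any references that are already present and append ours.
    metadata["ownerReferences"] = [*metadata.get("ownerReferences", []), reference]
    return {**child, "metadata": metadata}
```

The owner and a namespaced child must live in the same namespace, which holds for the example above because both manifests reuse the CR’s namespace.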

Internally, the operator works by watching resources through the Kubernetes API. kopf (like other operator SDKs) abstracts away the low-level watch mechanism. When a MyApp object is created, updated, or deleted, the API server delivers an event over the operator’s watch connection, and kopf runs whichever handler is registered for that kind of change. Here only a create handler exists (create_app_resources), so updates and deletions are observed but trigger no code.

Reconciliation is the key idea behind operators: when the desired state (declared in the MyApp CR) differs from the actual state of the cluster, the operator acts to bring the two back into line. Note that the handler above runs only when a MyApp object is created, so it would not notice if someone later deleted the Deployment by hand. To recreate it, the operator would also have to watch the child resources or re-check them periodically.
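The comparison at the heart of reconciliation can be sketched without any Kubernetes calls. The function below is a hypothetical illustration, not kopf’s internal logic: it takes the children the operator wants and the children it observed, both keyed by (kind, name), and returns what would need to be created, updated, or deleted.

```python
def plan_reconciliation(desired: dict, actual: dict) -> dict:
    """Compare desired and observed children, keyed by (kind, name).

    Hypothetical helper: returns the keys to create, to update, and to delete.
    """
    to_create = [key for key in desired if key not in actual]
    to_update = [
        key for key in desired
        if key in actual and desired[key] != actual[key]
    ]
    to_delete = [key for key in actual if key not in desired]
    return {"create": to_create, "update": to_update, "delete": to_delete}


desired = {
    ("PersistentVolumeClaim", "my-web-app-data"): {"storage": "1Gi"},
    ("Deployment", "my-web-app"): {"image": "nginx:1.21"},
}
# Someone deleted the Deployment by hand, so only the PVC is observed.
actual = {("PersistentVolumeClaim", "my-web-app-data"): {"storage": "1Gi"}}
plan = plan_reconciliation(desired, actual)
```

In a real operator the observed values should be reduced to the fields the operator owns before comparing, because the API server adds defaults and status to every object. A plan like this could be computed from a periodic handler such as kopf’s @kopf.timer, which is one possible way to catch drift.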

The spec argument of the handler maps directly to the spec field of your MyApp CR. This is how you parameterize your custom resources. The name and namespace arguments are supplied automatically by kopf, taken from the CR’s metadata.
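Reading and validating the spec in one place keeps the handler short and makes the defaults explicit. The sketch below assumes the image field from the example CR plus a hypothetical storage field that the original CR does not define; AppSettings and settings_from_spec are illustrative names.

```python
from typing import Any, Mapping, NamedTuple


class AppSettings(NamedTuple):
    image: str
    storage: str  # hypothetical field, not part of the example CR


def settings_from_spec(spec: Mapping[str, Any]) -> AppSettings:
    """Extract the settings the operator needs, applying defaults."""
    image = spec.get("image", "nginx:latest")
    storage = spec.get("storage", "1Gi")
    if not isinstance(image, str) or not image:
        raise ValueError("spec.image must be a non-empty string")
    if not isinstance(storage, str) or not storage:
        raise ValueError("spec.storage must be a non-empty string")
    return AppSettings(image=image, storage=storage)


settings = settings_from_spec({"image": "nginx:1.21"})
```

Inside a kopf handler, a validation failure like this is arguably better raised as kopf.PermanentError, since retrying cannot fix a malformed spec.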

The kubernetes package is the official Python client for the Kubernetes API, and its kubernetes.client module is where the API classes live. You instantiate API clients (such as CoreV1Api for PVCs and AppsV1Api for Deployments) and call their methods to create, read, update, or delete Kubernetes objects. Error handling matters for robust operators: treating 409 Conflict (the resource already exists) as success makes the handler safe to run again after a partial failure.
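The two try/except blocks in the handler follow the same pattern, so it can be factored into a small helper. This is a sketch under one assumption: the exception raised by the create call exposes the HTTP status code as a status attribute, as kubernetes.client.ApiException does. The name create_if_absent is hypothetical.

```python
from typing import Any, Callable


def create_if_absent(create: Callable[..., Any], *args: Any, **kwargs: Any) -> bool:
    """Call `create`; return True if it succeeded, False on a 409 Conflict.

    Any other error is re-raised unchanged so the caller (or kopf) can retry.
    """
    try:
        create(*args, **kwargs)
    except Exception as exc:
        # ApiException carries the HTTP status code in `.status`.
        if getattr(exc, "status", None) == 409:
            return False
        raise
    return True
```

In the handler above it might be used as create_if_absent(core_v1.create_namespaced_persistent_volume_claim, namespace, pvc_manifest), logging a warning when the result is False.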

The Custom Resource Definition (CRD) is what tells Kubernetes about your new API object (MyApp in this case). It defines the API group, the versions, and the schema, and it must be installed in the cluster before any MyApp object can be created. The operator code then implements the behavior for that new API object.
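For reference, a minimal CRD for the MyApp resource might look like the manifest below, written as a Python dictionary to match the style of the handler. The group, version, kind, and plural come from the example; the singular name and the schema are assumptions, since the original does not show its CRD.

```python
GROUP = "mycompany.com"
PLURAL = "myapps"

myapp_crd = {
    "apiVersion": "apiextensions.k8s.io/v1",
    "kind": "CustomResourceDefinition",
    # The CRD name must be "<plural>.<group>".
    "metadata": {"name": f"{PLURAL}.{GROUP}"},
    "spec": {
        "group": GROUP,
        "scope": "Namespaced",
        "names": {"kind": "MyApp", "plural": PLURAL, "singular": "myapp"},
        "versions": [
            {
                "name": "v1",
                "served": True,
                "storage": True,
                "schema": {
                    "openAPIV3Schema": {
                        "type": "object",
                        "properties": {
                            "spec": {
                                "type": "object",
                                "properties": {"image": {"type": "string"}},
                            },
                            # Left open so the operator can store arbitrary
                            # status data without the API server pruning it.
                            "status": {
                                "type": "object",
                                "x-kubernetes-preserve-unknown-fields": True,
                            },
                        },
                    }
                },
            }
        ],
    },
}
```

The dictionary can be dumped to YAML and applied with kubectl, or passed to ApiextensionsV1Api().create_custom_resource_definition() from the Python client.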

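A common pitfall is forgetting to handle partial failures and race conditions. For example, the PVC may be created but never bind (say, because no storage class can satisfy it), in which case the Deployment is still created and its pods sit in Pending. Likewise, if the Deployment call fails after the PVC call succeeded, kopf retries the handler, which is why the 409 checks matter. A more advanced operator would track the state of these sub-resources and report it in the MyApp CR’s status field.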
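What such a status report could contain is sketched below. The function and the phase names (WaitingForStorage, Progressing, Ready) are hypothetical; the inputs correspond to the PVC’s status.phase and the Deployment’s status.readyReplicas and spec.replicas.

```python
from typing import Optional


def summarize_children(
    pvc_phase: str,
    ready_replicas: Optional[int],
    desired_replicas: int,
) -> dict:
    """Condense the state of the PVC and Deployment into one status dict."""
    # The API omits readyReplicas while it is zero, so treat None as 0.
    ready = ready_replicas or 0
    if pvc_phase != "Bound":
        phase = "WaitingForStorage"
    elif ready < desired_replicas:
        phase = "Progressing"
    else:
        phase = "Ready"
    return {"phase": phase, "pvcPhase": pvc_phase, "readyReplicas": ready}
```

A dictionary like this could be returned from a handler, in the same way the example returns its message, so that users can inspect it with kubectl get on the MyApp object.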

The next concept you’ll likely run into is handling updates and deletions of your custom resources, which involves defining other kopf event handlers like @kopf.on.update and @kopf.on.delete.
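As a taste of what an update handler has to do, the sketch below computes the patch needed when the image in the CR’s spec changes. It is a hypothetical helper, assuming the container name app-container and the nginx:latest default from the create handler; the returned dictionary is meant to be sent as a strategic merge patch, for example through AppsV1Api().patch_namespaced_deployment(name, namespace, body).

```python
from typing import Any, Mapping, Optional

DEFAULT_IMAGE = "nginx:latest"  # same default as the create handler


def deployment_patch(
    old_spec: Mapping[str, Any],
    new_spec: Mapping[str, Any],
) -> Optional[dict]:
    """Return a patch for the Deployment, or None if nothing relevant changed."""
    old_image = old_spec.get("image", DEFAULT_IMAGE)
    new_image = new_spec.get("image", DEFAULT_IMAGE)
    if old_image == new_image:
        return None
    # Containers are matched by name, so only the image field is replaced.
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [{"name": "app-container", "image": new_image}]
                }
            }
        }
    }


patch = deployment_patch({"image": "nginx:1.21"}, {"image": "nginx:1.25"})
```

Deletion often needs less code than updates: if the child objects carry owner references to the MyApp object, Kubernetes garbage collection removes them, and an @kopf.on.delete handler is only required for cleanup outside the cluster.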