Rancher’s upgrade process is designed to be surprisingly resilient, allowing you to update your control plane without taking your Kubernetes clusters offline.

Let’s watch a simulated upgrade in action. Imagine we have a Rancher server running on a Kubernetes cluster, and we want to upgrade it from version v2.7.0 to v2.7.1.

# Before the upgrade, check that the Rancher pods are running
kubectl get pods -n cattle-system -l app=rancher
# NAME                     READY   STATUS    RESTARTS   AGE
# rancher-xxxxxxxxxx-xxxxx   1/1     Running   0          24h

# (Simulated) Output shows Rancher is running.
# Now, let's apply the new manifest.
# This would typically be done by updating the Helm chart values and running `helm upgrade`.
# For this example, we'll simulate by applying a modified YAML.

# Assume a new Rancher deployment YAML is prepared with updated image tags.
# The key is that the deployment object for Rancher will be updated.
kubectl apply -f rancher-v2.7.1-deployment.yaml
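
Since the image tag is normally changed through Helm rather than a hand-edited manifest, here is a minimal sketch of that path. It assumes the release is named `rancher`, the chart repository was added under the alias `rancher-stable`, and the existing values should be carried over; none of these were established in this walkthrough, so adjust them to your installation. The `run` helper is hypothetical and only prints each command unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch: upgrade Rancher with Helm instead of applying a hand-edited manifest.
# Assumptions (adjust to your install): release name "rancher",
# chart repo alias "rancher-stable", namespace "cattle-system".

# Hypothetical helper: print the command, run it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

upgrade_rancher() {
  local version="$1"   # target chart version, e.g. 2.7.1
  run helm repo update
  run helm upgrade rancher rancher-stable/rancher \
    --namespace cattle-system \
    --version "$version" \
    --reuse-values
  # Block until the Deployment rollout finishes or fails.
  run kubectl rollout status deployment/rancher --namespace cattle-system
}
```

Calling `upgrade_rancher 2.7.1` prints the three commands for review; `DRY_RUN=0 upgrade_rancher 2.7.1` executes them. Either way, the result is the same as in the simulated example: the Rancher Deployment's pod template changes and a rollout begins.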

The magic behind zero-downtime upgrades in Rancher lies in its deployment strategy and how it interacts with Kubernetes. Rancher is deployed as a Kubernetes Deployment. Kubernetes Deployments, by default, use a RollingUpdate strategy. This strategy ensures that when you update the pod template (e.g., by changing the container image), Kubernetes gradually replaces old pods with new ones.

Here’s how the RollingUpdate strategy works for Rancher:

  1. New Pods Start: Kubernetes starts creating new pods based on the updated deployment configuration.
  2. Readiness Probes: Rancher pods have readiness probes configured. A new pod is marked ready, and starts receiving traffic, only after it passes these probes, which indicates that the Rancher application inside has initialized.
  3. Old Pods Terminate: As new pods become ready, Kubernetes terminates old pods, without letting the number of unavailable pods exceed maxUnavailable.
  4. Controlled Rollout: This process continues until every pod runs the new template. Throughout, the number of unavailable pods never exceeds maxUnavailable, and the total number of pods never exceeds the desired replica count plus maxSurge.
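
The steps above can be sketched as a small simulation. This is a deliberately simplified model, not the real Deployment controller: it assumes every new pod passes its readiness probe before the next step, and the function name `simulate_rollout` is made up for illustration.

```shell
#!/usr/bin/env bash
# Simplified model of a RollingUpdate; not the real controller logic.
# Assumption: every new pod becomes ready before the next step begins.

simulate_rollout() {
  local replicas="$1" max_unavailable="$2" max_surge="$3"
  local old="$replicas" new=0 step=0
  local max_total=$((replicas + max_surge))
  local min_available=$((replicas - max_unavailable))

  echo "step=$step old=$old new=$new"
  while [ "$old" -gt 0 ] || [ "$new" -lt "$replicas" ]; do
    # 1. Start new pods, limited by the surge ceiling and the desired count.
    local room=$((max_total - old - new))
    local missing=$((replicas - new))
    local start=$(( room < missing ? room : missing ))
    new=$((new + start))

    # 2. New pods are ready; terminate old pods while staying
    #    at or above the availability floor.
    local spare=$((old + new - min_available))
    local stop=$(( spare < old ? spare : old ))
    old=$((old - stop))

    step=$((step + 1))
    echo "step=$step old=$old new=$new"

    # Guard against settings that can never make progress (0 and 0).
    if [ "$start" -le 0 ] && [ "$stop" -le 0 ]; then
      echo "rollout cannot progress" >&2
      return 1
    fi
  done
}
```

With three replicas, `simulate_rollout 3 1 1` finishes in two steps (`old=1 new=1`, then `old=0 new=3`), because a maxUnavailable of 1 lets the model drop to two available pods. `simulate_rollout 3 0 1` takes three steps and replaces exactly one pod at a time.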

The cattle-system namespace, where Rancher is installed, typically has multiple replicas of the Rancher pod. For instance, a default installation might have 3 replicas.

# Excerpt from a Kubernetes Deployment spec for Rancher
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rancher
  namespace: cattle-system
spec:
  replicas: 3 # We have multiple instances of Rancher
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1 # At least 2 pods will be available during update
      maxSurge: 1      # Can have up to 4 pods briefly
  template:
    # ... pod definition with image: rancher/rancher:v2.7.1

This configuration ensures that while Rancher pods are being replaced with the new version, at least two pods (old or new) remain ready to serve requests. The Kubernetes Service object, also in cattle-system, acts as a load balancer, distributing traffic only to ready pods. As new Rancher pods come online and become ready, they are added to the Service’s endpoints, and old, terminating pods are removed.

The critical levers you control during an upgrade are the maxUnavailable and maxSurge parameters in the Deployment’s strategy. Both accept either an absolute number or a percentage of the replica count. maxUnavailable defines how many pods can be unavailable during the update. Setting it to 1 in a 3-replica deployment means at least 2 pods are always available. maxSurge defines how many extra pods can be created above the desired replica count. A higher value speeds up the rollout but temporarily increases resource usage.
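
To make the arithmetic concrete, the sketch below turns the two settings into an availability floor and a pod-count ceiling. When a value is a percentage, Kubernetes resolves it against the replica count, rounding maxUnavailable down and maxSurge up. The function names `resolve` and `rollout_bounds` are illustrative only.

```shell
#!/usr/bin/env bash
# Resolve maxUnavailable / maxSurge to absolute pod counts.
# Percentages are taken of the replica count:
# maxUnavailable rounds down, maxSurge rounds up.

resolve() {
  local value="$1" replicas="$2" mode="$3"   # mode: "down" or "up"
  if [[ "$value" == *% ]]; then
    local pct="${value%\%}"
    if [ "$mode" = "up" ]; then
      echo $(( (pct * replicas + 99) / 100 ))
    else
      echo $(( pct * replicas / 100 ))
    fi
  else
    echo "$value"
  fi
}

rollout_bounds() {
  local replicas="$1" max_unavailable="$2" max_surge="$3"
  local unavailable surge
  unavailable="$(resolve "$max_unavailable" "$replicas" down)"
  surge="$(resolve "$max_surge" "$replicas" up)"
  echo "min_available=$((replicas - unavailable)) max_total=$((replicas + surge))"
}
```

For the spec shown earlier, `rollout_bounds 3 1 1` prints `min_available=2 max_total=4`. With the Kubernetes defaults of 25% for both settings, `rollout_bounds 3 25% 25%` prints `min_available=3 max_total=4`, so a 3-replica Deployment on defaults never drops below three available pods.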

The most surprising part of this zero-downtime mechanism is that it requires no special Rancher-specific logic for the upgrade itself; it’s entirely a function of Kubernetes’ built-in Deployment rollout capabilities. Rancher simply leverages standard Kubernetes features.

After the upgrade, you’ll typically see all Rancher pods in the cattle-system namespace running the new image.

# After the upgrade, check the Rancher pods again
kubectl get pods -n cattle-system -l app=rancher
# NAME                     READY   STATUS    RESTARTS   AGE
# rancher-abcdefghij-klmno   1/1     Running   0          5m
# rancher-pqrstuvw-xyzab     1/1     Running   0          5m
# rancher-zyxwvu-tsrqpo      1/1     Running   0          5m

# (Simulated) Output shows three freshly created pods, all Running and ready.
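
A plain pod listing shows status and age but not the image tag. One way to confirm the version is to print each pod's image and compare the tags, as sketched here. The helper `all_on_version` is hypothetical, and the jsonpath in the usage comment assumes the Rancher container is the first container in each pod.

```shell
#!/usr/bin/env bash
# Check that every listed pod runs the expected image tag.
# Input on stdin: one "pod-name<TAB>image" pair per line.

all_on_version() {
  local expected="$1" name image status=0 seen=0
  while IFS=$'\t' read -r name image; do
    [ -n "$name" ] || continue
    seen=1
    # Compare the part after the last colon, i.e. the tag.
    if [ "${image##*:}" != "$expected" ]; then
      echo "mismatch: $name runs $image" >&2
      status=1
    fi
  done
  # An empty pod list is treated as a failure, not a success.
  [ "$seen" -eq 1 ] || status=1
  return "$status"
}

# Usage against a live cluster (requires kubectl access):
# kubectl get pods -n cattle-system -l app=rancher \
#   -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}' \
#   | all_on_version v2.7.1
```

The function exits non-zero if any pod still runs a different tag or if no pods were listed, which makes it usable as a post-upgrade check in a script. Images pinned by digest rather than tag would be reported as mismatches.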

One issue you might encounter after an otherwise successful Rancher upgrade involves webhook configurations: those that rely on specific Rancher API endpoints may not be updated immediately if the endpoints have changed.
