The Rancher webhook’s TLS certificate has expired, so the Kubernetes API server can no longer call the webhook for critical operations like admission control.

Common Causes and Fixes:

  1. Expired rancher-webhook certificate in the cattle-system namespace.

    • Diagnosis:
      kubectl get secret -n cattle-system rancher-webhook -o jsonpath='{.data.tls\.crt}' | base64 --decode | openssl x509 -noout -dates
      
      This command will output notBefore=<date> and notAfter=<date>. If notAfter is in the past, the certificate is expired.
    • Fix: In this setup the rancher-webhook certificate is issued and renewed by cert-manager. An expired certificate therefore usually points to a problem with the cert-manager deployment or its ability to renew. The most straightforward fix is to restart the cert-manager pods. The command below targets the cattle-system namespace; if cert-manager runs elsewhere (a standard install uses the cert-manager namespace), change -n accordingly.
      kubectl delete pod -n cattle-system -l app=cert-manager
      
      This forces cert-manager to restart and attempt to re-issue or renew the rancher-webhook certificate. The new certificate will be automatically mounted into the rancher-webhook deployment.
    • Why it works: Restarting cert-manager prompts it to re-evaluate its existing certificate resources. If the rancher-webhook certificate is marked as expired or nearing expiration, cert-manager will trigger a renewal process, generating a new valid certificate and updating the corresponding secret.
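The expiry check in the diagnosis step can also be scripted instead of reading the dates by eye. The following is a minimal sketch, assuming openssl is installed locally; the function name cert_valid_for and the /tmp/webhook.crt path are hypothetical and not part of Rancher or kubectl.

```shell
# Hypothetical helper: returns 0 if the PEM certificate in $1 is still valid
# $2 seconds from now (default 0 = right now), non-zero otherwise.
# Relies on `openssl x509 -checkend`, which exits 1 when the certificate
# will have expired within the given number of seconds.
cert_valid_for() {
  local pem_file="$1"
  local seconds="${2:-0}"
  openssl x509 -in "$pem_file" -noout -checkend "$seconds" >/dev/null
}

# Usage against a cluster (not executed here; assumes kubectl access and the
# secret name used in the diagnosis step above):
#   kubectl get secret -n cattle-system rancher-webhook \
#     -o jsonpath='{.data.tls\.crt}' | base64 --decode > /tmp/webhook.crt
#   cert_valid_for /tmp/webhook.crt 0 || echo "certificate has expired"
#   cert_valid_for /tmp/webhook.crt 604800 || echo "expires within 7 days"
```

Passing a non-zero window makes the same check usable as an early warning before the certificate actually expires.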
  2. Expired rancher-webhook certificate in the kube-system namespace (older Rancher versions or specific configurations).

    • Diagnosis:
      kubectl get secret -n kube-system rancher-webhook-tls -o jsonpath='{.data.tls\.crt}' | base64 --decode | openssl x509 -noout -dates
      
      Similar to the above, check the notAfter date.
    • Fix: In some older or custom configurations, the webhook certificate might reside in kube-system. The renewal mechanism is the same: restart cert-manager.
      kubectl delete pod -n kube-system -l app=cert-manager
      
      Ensure you are targeting the correct namespace where cert-manager and the webhook secret are deployed.
    • Why it works: Same as above, restarting cert-manager in the relevant namespace ensures it attempts to renew the certificate it’s responsible for.
  3. Cert-manager controller not running or unhealthy.

    • Diagnosis:
      kubectl get pods -n cattle-system -l app=cert-manager
      
      Look for pods that are CrashLoopBackOff, Error, or Pending.
      kubectl logs -n cattle-system <cert-manager-pod-name>
      
      Examine logs for recurring errors, particularly related to certificate issuance or renewal.
    • Fix: If cert-manager pods are unhealthy, investigate the underlying cause. This might involve checking resource limits, RBAC permissions, or dependencies. Often, simply deleting and letting Kubernetes reschedule the pods is enough if it was a transient issue. If persistent, redeploying cert-manager might be necessary.
      kubectl delete pod -n cattle-system -l app=cert-manager
      
      If the issue is more complex, you might need to consult cert-manager documentation for troubleshooting its controller.
    • Why it works: Cert-manager is the component responsible for automating certificate lifecycle management. If it’s not running correctly, it cannot renew the webhook certificate, leading to expiration. Restoring its health allows it to perform its duties.
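Scanning the pod list for bad states can be automated with a small filter. This is a sketch under the assumption that the input is the default `kubectl get pods --no-headers` layout (NAME, READY, STATUS, RESTARTS, AGE); the function name unhealthy_pods is hypothetical.

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# and prints "<pod-name><TAB><status>" for every pod whose STATUS column is
# neither Running nor Completed (for example CrashLoopBackOff, Error, Pending).
unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 "\t" $3 }'
}

# Usage against a cluster (not executed here; assumes kubectl access):
#   kubectl get pods -n cattle-system -l app=cert-manager --no-headers | unhealthy_pods
```

An empty result means every matching pod reports Running or Completed. It does not prove the containers are Ready, so the READY column is still worth a look.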
  4. Incorrect CA bundle configured in the Kubernetes API server.

    • Diagnosis:
      kubectl get mutatingwebhookconfigurations,validatingwebhookconfigurations -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.webhooks[*].clientConfig.caBundle}{"\n"}{end}'
      
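      This command lists each webhook configuration with its CA bundle. The caBundle is base64 encoded, so you need to decode it before comparing it against the CA certificate used by Rancher’s webhook. A more direct check is to compare the CA in the rancher-webhook secret with the CA behind the issuer. A CA ClusterIssuer holds only a secret name in .spec.ca.secretName, and ClusterIssuers are cluster-scoped, so no -n flag applies. The cert-manager namespace below is an assumption (it is cert-manager’s default cluster resource namespace); adjust it to your installation.
      kubectl get secret -n cattle-system rancher-webhook -o jsonpath='{.data.ca\.crt}' | base64 --decode > rancher_webhook_ca.pem
      CA_SECRET=$(kubectl get clusterissuer rancher-ca-rancher -o jsonpath='{.spec.ca.secretName}')
      kubectl get secret -n cert-manager "$CA_SECRET" -o jsonpath='{.data.tls\.crt}' | base64 --decode > expected_ca.pem
      diff rancher_webhook_ca.pem expected_ca.pem
      
      If the two files differ, the webhook’s certificate was not issued by the expected CA, and the API server will not trust it.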
    • Fix: The CA bundle is typically managed by Rancher. If it’s mismatched, it often points to a corrupted or improperly updated Rancher installation. A full Rancher upgrade or reinstallation might be required to ensure the CA and webhook certificates are correctly configured and trusted by the Kubernetes API server.
    • Why it works: The Kubernetes API server uses the CA bundle specified in each webhook configuration to verify the certificate the webhook presents. If this CA bundle is incorrect or outdated, the TLS handshake fails when the API server calls the webhook, and the admission requests that depend on it fail.
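A byte-for-byte diff of PEM files can report a mismatch for purely cosmetic reasons such as a trailing newline. Comparing SHA-256 fingerprints avoids that. The sketch below assumes openssl is installed; the function names cert_fingerprint and same_cert are hypothetical. Note that `openssl x509` reads only the first certificate in a file, so a multi-certificate bundle would need to be split first.

```shell
# Hypothetical helper: prints the SHA-256 fingerprint of the first
# certificate found in the PEM file $1.
cert_fingerprint() {
  openssl x509 -in "$1" -noout -fingerprint -sha256
}

# Hypothetical helper: returns 0 if both PEM files contain the same
# certificate, 1 if they differ, 2 if either file cannot be parsed.
same_cert() {
  local a b
  a=$(cert_fingerprint "$1") || return 2
  b=$(cert_fingerprint "$2") || return 2
  [ "$a" = "$b" ]
}

# Usage with two PEM files holding the CAs you want to compare
# (not executed here; the file names are placeholders):
#   same_cert rancher_webhook_ca.pem expected_ca.pem || echo "CA mismatch"
```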
  5. Rancher server not restarted after certificate renewal.

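    • Diagnosis: Check the Rancher server logs for errors related to webhook communication or certificate validation. With a label selector, kubectl logs prints only the last 10 lines per pod by default, so set --tail explicitly.
      kubectl logs -n cattle-system -l app=rancher --tail=200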
      
    • Fix: In some scenarios, even after the certificate is renewed, the Rancher API server pods might need to be restarted to pick up the new certificate.
      kubectl delete pod -n cattle-system -l app=rancher
      
    • Why it works: The Rancher API server caches its TLS configuration, including certificates. A restart forces it to reload its configuration and load the newly issued webhook certificate.
  6. Network policies blocking communication between Rancher webhook and Kubernetes API server.

    • Diagnosis:
      kubectl get networkpolicies -A
      
      Examine network policies in the cattle-system and kube-system namespaces that might restrict ingress to the rancher-webhook pods (the API server initiates admission calls to the webhook) or egress from them to the Kubernetes API server (typically kubernetes.default.svc.cluster.local on port 443).
    • Fix: If a network policy is found to be blocking traffic, adjust its rules to allow communication. For example, to allow egress from the webhook to the Kubernetes API server:
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: allow-rancher-webhook-to-api
        namespace: cattle-system
      spec:
        podSelector:
          matchLabels:
            app: rancher-webhook
        policyTypes:
        - Egress
        egress:
        - to:
          - ipBlock:
              cidr: <kubernetes-api-server-cidr> # e.g., 10.43.0.0/16 for default service CIDR
          ports:
          - protocol: TCP
            port: 443
      
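      Replace <kubernetes-api-server-cidr> with the actual CIDR of your Kubernetes service IP range. Some CNI plugins evaluate policies after service address translation, in which case the rule must name the API server’s endpoint address and port (often 6443) instead of the service IP and port 443.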
    • Why it works: Network policies enforce traffic segmentation within the cluster. If a policy is too restrictive, it can prevent the webhook from reaching the API server, even if the certificates are valid. Correcting the policy restores necessary network connectivity.
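On a cluster with many policies, the output of `kubectl get networkpolicies -A` can be narrowed to the two namespaces named in the diagnosis step. This is a small sketch, assuming the default column layout (NAMESPACE, NAME, POD-SELECTOR, AGE); the function name policies_in_webhook_namespaces is hypothetical.

```shell
# Hypothetical helper: reads `kubectl get networkpolicies -A --no-headers`
# output on stdin and prints "<namespace>/<policy-name>" for policies in
# the cattle-system and kube-system namespaces only.
policies_in_webhook_namespaces() {
  awk '$1 == "cattle-system" || $1 == "kube-system" { print $1 "/" $2 }'
}

# Usage against a cluster (not executed here; assumes kubectl access):
#   kubectl get networkpolicies -A --no-headers | policies_in_webhook_namespaces
```

Each name it prints can then be inspected with kubectl describe networkpolicy to see which pods and ports the policy covers.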

After resolving the webhook certificate expiration, the Rancher UI may still fail to load if it relied on the webhook for certain operations.
