The Rancher API server’s webhook certificate has expired, preventing it from communicating with the Kubernetes API server for critical operations like admission control.
Common Causes and Fixes:
- Expired `rancher-webhook` certificate in the `cattle-system` namespace.
  - Diagnosis:

        kubectl get secret -n cattle-system rancher-webhook -o jsonpath='{.data.tls\.crt}' | base64 --decode | openssl x509 -noout -dates

    This command outputs `notBefore=<date>` and `notAfter=<date>`. If `notAfter` is in the past, the certificate is expired.
  - Fix: The Rancher webhook certificate is automatically managed by Rancher’s cert-manager. If it’s expired, it usually indicates an issue with the cert-manager deployment itself or its ability to renew. The most straightforward fix is to restart the `cert-manager` pods in the `cattle-system` namespace.

        kubectl delete pod -n cattle-system -l app=cert-manager

    This forces cert-manager to restart and attempt to re-issue or renew the `rancher-webhook` certificate. The new certificate will be automatically mounted into the `rancher-webhook` deployment.
  - Why it works: Restarting cert-manager prompts it to re-evaluate its existing certificate resources. If the `rancher-webhook` certificate is marked as expired or nearing expiration, cert-manager triggers a renewal, generating a new valid certificate and updating the corresponding secret.
- Expired `rancher-webhook` certificate in the `kube-system` namespace (older Rancher versions or specific configurations).
  - Diagnosis:

        kubectl get secret -n kube-system rancher-webhook-tls -o jsonpath='{.data.tls\.crt}' | base64 --decode | openssl x509 -noout -dates

    Similar to the above, check the `notAfter` date.
  - Fix: In some older or custom configurations, the webhook certificate might reside in `kube-system`. The renewal mechanism is the same: restart cert-manager.

        kubectl delete pod -n kube-system -l app=cert-manager

    Ensure you are targeting the correct namespace where `cert-manager` and the webhook secret are deployed.
  - Why it works: Same as above: restarting cert-manager in the relevant namespace ensures it attempts to renew the certificate it’s responsible for.
- Cert-manager controller not running or unhealthy.
  - Diagnosis:

        kubectl get pods -n cattle-system -l app=cert-manager

    Look for pods that are in `CrashLoopBackOff`, `Error`, or `Pending`.

        kubectl logs -n cattle-system <cert-manager-pod-name>

    Examine the logs for recurring errors, particularly those related to certificate issuance or renewal.
  - Fix: If cert-manager pods are unhealthy, investigate the underlying cause. This might involve checking resource limits, RBAC permissions, or dependencies. Often, simply deleting the pods and letting Kubernetes reschedule them is enough if the issue was transient. If it persists, redeploying cert-manager might be necessary.

        kubectl delete pod -n cattle-system -l app=cert-manager

    If the issue is more complex, you might need to consult the cert-manager documentation for troubleshooting its controller.
  - Why it works: Cert-manager is the component responsible for automating certificate lifecycle management. If it’s not running correctly, it cannot renew the webhook certificate, leading to expiration. Restoring its health allows it to perform its duties.
- Incorrect CA bundle configured in the Kubernetes API server.
  - Diagnosis:

        kubectl get mutatingwebhookconfigurations,validatingwebhookconfigurations -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.webhooks[*].clientConfig.caBundle}{"\n"}{end}'

    This command lists the webhook configurations and their CA bundles. The `caBundle` is base64 encoded, so you need to decode it and compare it against the CA certificate used by Rancher’s webhook. A more direct check is to inspect the `rancher-webhook` secret.

        kubectl get secret -n cattle-system rancher-webhook -o jsonpath='{.data.ca\.crt}' | base64 --decode > rancher_webhook_ca.pem
        kubectl get clusterissuers -n cattle-system rancher-ca-rancher -o jsonpath='{.spec.ca}' | base64 --decode > expected_ca.pem
        diff rancher_webhook_ca.pem expected_ca.pem

    If these CA bundles don’t match, the API server doesn’t trust the webhook’s certificate.
  - Fix: The CA bundle is typically managed by Rancher. A mismatch often points to a corrupted or improperly updated Rancher installation. A full Rancher upgrade or reinstallation might be required to ensure the CA and webhook certificates are correctly configured and trusted by the Kubernetes API server.
  - Why it works: The Kubernetes API server uses the CA bundle specified in webhook configurations to verify the authenticity of certificates presented by webhooks. If this CA bundle is incorrect or outdated, the API server rejects connections to the webhook, leading to operational failures.
- Rancher server not restarted after certificate renewal.
  - Diagnosis: Check the Rancher server logs for errors related to webhook communication or certificate validation.

        kubectl logs -n cattle-system -l app=rancher

  - Fix: In some scenarios, even after the certificate is renewed, the Rancher API server pods might need to be restarted to pick up the new certificate.

        kubectl delete pod -n cattle-system -l app=rancher

  - Why it works: The Rancher API server caches its TLS configuration, including certificates. A restart forces it to reload its configuration and load the newly issued webhook certificate.
- Network policies blocking communication between Rancher webhook and Kubernetes API server.
  - Diagnosis:

        kubectl get networkpolicies -A

    Examine network policies in the `cattle-system` and `kube-system` namespaces that might restrict egress from `rancher-webhook` pods or ingress to the Kubernetes API server (typically `kubernetes.default.svc.cluster.local` on port 443).
  - Fix: If a network policy is found to be blocking traffic, adjust its rules to allow communication. For example, to allow egress from the webhook to the Kubernetes API server:

        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        metadata:
          name: allow-rancher-webhook-to-api
          namespace: cattle-system
        spec:
          podSelector:
            matchLabels:
              app: rancher-webhook
          policyTypes:
            - Egress
          egress:
            - to:
                - ipBlock:
                    cidr: <kubernetes-api-server-cidr> # e.g., 10.43.0.0/16 for default service CIDR
              ports:
                - protocol: TCP
                  port: 443

    Replace `<kubernetes-api-server-cidr>` with the actual CIDR of your Kubernetes service IP range.
  - Why it works: Network policies enforce traffic segmentation within the cluster. If a policy is too restrictive, it can prevent the webhook from reaching the API server, even if the certificates are valid. Correcting the policy restores the necessary network connectivity.
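
The expiry checks from the first two causes can be wrapped in a small helper so they are easy to repeat across namespaces. The following is a minimal sketch, assuming `bash`, `openssl`, and `kubectl` are available. `cert_expires_within` and `check_webhook_secret` are hypothetical helper names introduced here for illustration, and the 30-day warning window is an arbitrary choice rather than a Rancher default.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names below are hypothetical, not part of kubectl or Rancher.

# cert_expires_within <pem-file> <seconds>
# Returns 0 if the certificate is expired or expires within <seconds>,
# 1 if it stays valid beyond that window,
# 2 if the file does not contain a readable certificate.
cert_expires_within() {
  local pem_file="$1" window="$2"
  # Bail out early if openssl cannot parse the file at all.
  openssl x509 -in "$pem_file" -noout >/dev/null 2>&1 || return 2
  # -checkend exits 0 when the certificate is still valid after <window> seconds.
  if openssl x509 -in "$pem_file" -noout -checkend "$window" >/dev/null 2>&1; then
    return 1
  fi
  return 0
}

# check_webhook_secret <namespace> <secret-name>
# Fetches tls.crt from the secret and prints a one-line status plus the expiry date.
check_webhook_secret() {
  local ns="$1" secret="$2" tmp
  tmp="$(mktemp)"
  kubectl get secret -n "$ns" "$secret" -o jsonpath='{.data.tls\.crt}' \
    | base64 --decode > "$tmp"
  if ! openssl x509 -in "$tmp" -noout >/dev/null 2>&1; then
    echo "$ns/$secret: could not read a certificate from tls.crt"
    rm -f "$tmp"
    return 2
  fi
  if cert_expires_within "$tmp" 0; then
    echo "$ns/$secret: EXPIRED"
  elif cert_expires_within "$tmp" $((30 * 86400)); then
    echo "$ns/$secret: expires within 30 days"  # 30 days is an assumed threshold
  else
    echo "$ns/$secret: OK"
  fi
  openssl x509 -in "$tmp" -noout -enddate
  rm -f "$tmp"
}
```

After sourcing the file, `check_webhook_secret cattle-system rancher-webhook` or `check_webhook_secret kube-system rancher-webhook-tls` would run the same check as the manual commands above, using the secret names given in this article.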
After resolving the webhook certificate expiration, you might encounter errors related to the Rancher UI itself failing to load if it was also relying on the webhook for certain operations.
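
As a follow-up check once the certificate has been renewed, the CA comparison described under the incorrect CA bundle cause can also be scripted. This is a rough sketch rather than a definitive procedure: `same_cert` and `webhook_ca_matches_secret` are hypothetical helper names, the webhook configuration name is passed in as an argument because it varies by installation, and the script assumes the first webhook entry and the first certificate in each bundle are the ones of interest.

```shell
#!/usr/bin/env bash
# Sketch only: same_cert and webhook_ca_matches_secret are hypothetical helper names.

# same_cert <pem-a> <pem-b>
# Returns 0 if the first certificate in each file has the same SHA-256 fingerprint,
# 1 if the fingerprints differ, 2 if either file cannot be parsed.
same_cert() {
  local fp_a fp_b
  fp_a="$(openssl x509 -in "$1" -noout -fingerprint -sha256 2>/dev/null)" || return 2
  fp_b="$(openssl x509 -in "$2" -noout -fingerprint -sha256 2>/dev/null)" || return 2
  [ "$fp_a" = "$fp_b" ]
}

# webhook_ca_matches_secret <validating-webhook-config> <namespace> <secret-name>
# Compares the caBundle of a webhook configuration with ca.crt in a secret.
webhook_ca_matches_secret() {
  local config="$1" ns="$2" secret="$3" dir rc=0
  dir="$(mktemp -d)"
  # Assumption: the first webhook entry carries the CA bundle of interest.
  kubectl get validatingwebhookconfiguration "$config" \
    -o jsonpath='{.webhooks[0].clientConfig.caBundle}' \
    | base64 --decode > "$dir/webhook_ca.pem"
  kubectl get secret -n "$ns" "$secret" \
    -o jsonpath='{.data.ca\.crt}' \
    | base64 --decode > "$dir/secret_ca.pem"
  same_cert "$dir/webhook_ca.pem" "$dir/secret_ca.pem" || rc=$?
  rm -rf "$dir"
  return "$rc"
}
```

Comparing fingerprints instead of running `diff` on the PEM files avoids false mismatches caused by whitespace or line-wrapping differences. A call such as `webhook_ca_matches_secret <webhook-config-name> cattle-system rancher-webhook` returns 0 on a match, 1 on a mismatch, and 2 if either certificate could not be read.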