Rancher certificates are standard TLS certificates, and they’re managed by cert-manager.
Here’s how to rotate them before they expire, which is crucial for maintaining secure access to your Rancher deployment.
Understanding Rancher’s Certificate Management
Rancher, when deployed with its own ingress, relies on cert-manager to issue and manage TLS certificates. This is typically done for the primary Rancher API endpoint. If you’re using an external load balancer or ingress controller that manages its own certificates, this guide might not directly apply to your Rancher certificate rotation, but the principles of cert-manager are the same.
The key component here is cert-manager, a Kubernetes add-on that automates certificate management. It watches Certificate resources and ensures that the corresponding Secret objects containing the actual TLS certificates are kept up-to-date.
Identifying Your Certificates
First, you need to know which certificates cert-manager is managing for Rancher. The most common scenario is the certificate for the Rancher ingress.
- List the Certificate resources:

      kubectl get certificates -n cattle-system

  You’ll likely see a certificate named something like rancher-ingress-tls.

- Check the Secret:

      kubectl get secret -n cattle-system <certificate-name> -o yaml

  Replace <certificate-name> with the name of the certificate you found. If the Secret has a different name, use the value of spec.secretName from the Certificate instead. Look for the data field; it should contain tls.crt and tls.key. The creation and update timestamps of this secret are useful indicators of when the certificate was last issued.
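To read the expiry date directly rather than inferring it from timestamps, the certificate can be decoded from the Secret. The following is a minimal sketch, assuming kubectl, a base64 that accepts the GNU-style -d flag, and openssl are available on your workstation. The function names are hypothetical, and rancher-ingress-tls is only the example name used above.

```shell
#!/usr/bin/env bash
# Sketch only: these function names are illustrative and not part of
# Rancher or cert-manager.

# Print the notAfter date of the certificate stored in a TLS Secret.
# If tls.crt holds a chain, openssl reads the first (leaf) certificate.
# Usage: secret_cert_enddate <namespace> <secret-name>
secret_cert_enddate() {
  local ns="$1" secret="$2"
  kubectl get secret "$secret" -n "$ns" -o jsonpath='{.data.tls\.crt}' \
    | base64 -d \
    | openssl x509 -noout -enddate
}

# Whole days between two Unix timestamps (integer division).
# Usage: days_between <start-epoch> <end-epoch>
days_between() {
  echo $(( ($2 - $1) / 86400 ))
}
```

Calling `secret_cert_enddate cattle-system rancher-ingress-tls` prints a line beginning with `notAfter=`. `days_between` is kept separate so the remaining lifetime can be computed without cluster access once you have the expiry as a timestamp.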
Common Causes for Certificate Expiration Issues
- Default Renewal Interval Too Short/Long: cert-manager has a default renewal window. If renewal starts too close to the certificate’s Not After date, a single failed attempt can leave too little time to retry before expiry. Conversely, a window that is long relative to the certificate’s lifetime triggers renewals more often than necessary.
  - Diagnosis: Examine the Certificate resource’s YAML. Look for spec.renewBefore. If it’s not set, cert-manager uses its default, which is to renew when one third of the certificate’s lifetime remains (30 days before expiry for a 90-day certificate).
  - Fix: Edit the Certificate resource to set a more appropriate renewBefore value. The field takes a Go duration string, so express it in hours rather than days. For a 90-day certificate, renewBefore: 720h (30 days) is common.

        spec:
          dnsNames:
            - rancher.yourdomain.com
          issuerRef:
            kind: ClusterIssuer
            name: letsencrypt-prod
          secretName: rancher-ingress-tls
          renewBefore: 720h # 30 days; add or adjust this line

    Apply with kubectl apply -f your-certificate.yaml. This tells cert-manager to start renewal 30 days before expiration.
- cert-manager Pods Not Running or Crashing: If the cert-manager controller pods are down, they can’t monitor or renew certificates.
  - Diagnosis:

        kubectl get pods -n cert-manager
        kubectl logs <cert-manager-pod-name> -n cert-manager

    Look for any restarts or error messages indicating issues with scheduling, image pulling, or internal errors.
  - Fix: If pods are crashing, check the logs for specific errors. Often, this is due to resource constraints (CPU/memory) or admission controller issues. Ensure the cert-manager namespace has sufficient resources allocated and that its ServiceAccount has the necessary RBAC permissions. Restarting the pods might be sufficient if it was a transient issue:

        kubectl delete pod <cert-manager-pod-name> -n cert-manager
- Incorrect ClusterIssuer or Issuer Configuration: The ClusterIssuer (or Issuer for namespaced certificates) defines how certificates are obtained. If it’s misconfigured (e.g., wrong ACME account, invalid email, incorrect provider), renewals will fail.
  - Diagnosis:

        kubectl describe clusterissuer <issuer-name>
        kubectl get challenges -n cattle-system # If using ACME/Let's Encrypt
        kubectl get orders -n cattle-system

    Look for Events in the describe output for the issuer, and check the status of challenges and orders for ACME-related issues.
  - Fix: Correct the spec of your ClusterIssuer. For Let’s Encrypt, ensure the email is valid and that the Secret named by privateKeySecretRef, which holds the ACME account key, is present and readable.

        # Example ClusterIssuer for Let's Encrypt staging (for testing)
        apiVersion: cert-manager.io/v1
        kind: ClusterIssuer
        metadata:
          name: letsencrypt-staging
        spec:
          acme:
            server: https://acme-staging-v02.api.letsencrypt.org/directory
            email: your-email@example.com # Ensure this is correct
            privateKeySecretRef:
              name: letsencrypt-staging-private-key # Secret holding the ACME account key
            solvers:
              - http01:
                  ingress:
                    class: nginx # Or your ingress controller's class

    Apply with kubectl apply -f your-issuer.yaml.
- Ingress Controller Issues: If cert-manager is using an HTTP01 challenge, the ingress controller must be running and correctly configured to route challenges to cert-manager. If the ingress controller itself is having issues, certificate validation will fail.
  - Diagnosis:

        kubectl get pods -n <ingress-controller-namespace> # e.g., ingress-nginx
        kubectl describe ingress <rancher-ingress-name> -n cattle-system

    Check ingress controller logs for errors related to routing or TLS termination. Ensure the Rancher ingress resource is correctly defined and points to the Rancher service.
  - Fix: Resolve issues with the ingress controller (e.g., resource limits, configuration errors, backend service availability). Ensure the ingressClassName in your Rancher ingress matches the one your controller watches.
- Rate Limiting by Certificate Authority (CA): If using Let’s Encrypt, there are limits on how many certificates you can issue per domain per week. If certificates expire and are re-issued too frequently, you can hit these limits.
  - Diagnosis: Check cert-manager logs and kubectl describe challenges for messages indicating rate limiting. The CA’s API responses will usually mention this.
  - Fix: For testing, switch to the Let’s Encrypt staging environment (https://acme-staging-v02.api.letsencrypt.org/directory) to avoid hitting production rate limits. Once resolved, ensure your renewBefore setting is appropriate so renewals happen well before expiration, minimizing the need for emergency re-issuance. For production, you’ll need to wait for the rate limit window to reset.
- Secret Data Corrupted or Deleted: In rare cases, the Kubernetes Secret holding the certificate and key might become corrupted or accidentally deleted. cert-manager relies on this Secret to know the current certificate’s validity period.
  - Diagnosis:

        kubectl get secret -n cattle-system rancher-ingress-tls -o yaml

    Check if the data field is present and contains valid base64 encoded tls.crt and tls.key. If the secret is missing, cert-manager will attempt to recreate it.
  - Fix: If the secret is missing, cert-manager should automatically recreate it based on the Certificate resource. If it’s corrupted, cert-manager might also attempt to fix it, but sometimes manually deleting the malformed secret can prompt cert-manager to regenerate it correctly:

        kubectl delete secret -n cattle-system rancher-ingress-tls
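The renewal timing behind the first cause above can be sketched as plain arithmetic. This assumes cert-manager’s documented behavior of scheduling renewal at the Not After time minus renewBefore, with a default of one third of the certificate’s lifetime when renewBefore is unset. renewal_epoch is a hypothetical helper name, and all values are Unix timestamps or seconds.

```shell
#!/usr/bin/env bash
# Sketch only: approximates when cert-manager would begin renewal.
# Assumption: renewal time = notAfter - renewBefore, and an unset
# renewBefore defaults to one third of the certificate's lifetime.
# Usage: renewal_epoch <not-before-epoch> <not-after-epoch> [renew-before-seconds]
renewal_epoch() {
  local not_before="$1" not_after="$2" renew_before="${3:-}"
  if [ -z "$renew_before" ]; then
    # Default window: one third of the total lifetime.
    renew_before=$(( (not_after - not_before) / 3 ))
  fi
  echo $(( not_after - renew_before ))
}

# Worked example for a 90-day certificate (7776000 seconds):
#   renewal_epoch 0 7776000           # prints 5184000 (day 60, default window)
#   renewal_epoch 0 7776000 1296000   # prints 6480000 (day 75, 15-day window)
#
# The time cert-manager itself computed is recorded on the Certificate:
#   kubectl get certificate rancher-ingress-tls -n cattle-system \
#     -o jsonpath='{.status.renewalTime}'
```

If the value in status.renewalTime is already in the past and the certificate has not been re-issued, the problem is more likely one of the other causes listed above than the renewal window itself.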
The Next Problem
After successfully rotating your certificates, the next error you might encounter is related to cert-manager’s own certificate, which it uses for its internal API and webhook. This certificate also expires and needs to be managed.