Rancher’s certificates are standard TLS certificates and, in a typical Helm installation, they are issued and renewed by cert-manager.

Here’s how to rotate them before they expire, which is crucial for maintaining secure access to your Rancher deployment.

Understanding Rancher’s Certificate Management

Rancher, when deployed with its own ingress, relies on cert-manager to issue and renew the TLS certificate that serves the Rancher UI and API endpoint. If TLS is terminated at an external load balancer or an ingress controller that manages its own certificates, the Rancher-specific steps in this guide may not apply to your setup, but the cert-manager principles are the same.

The key component here is cert-manager, a Kubernetes add-on that automates certificate management. It watches Certificate resources and ensures that the corresponding Secret objects containing the actual TLS certificates are kept up-to-date.

Identifying Your Certificates

First, you need to know which certificates cert-manager is managing for Rancher. The most common scenario is the certificate for the Rancher ingress.

  1. List Certificate resources:

    kubectl get certificates -n cattle-system
    

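    You’ll likely see a single certificate for the Rancher ingress. The Rancher Helm chart typically names it tls-rancher-ingress; the examples in this guide use rancher-ingress-tls, so substitute whatever name your cluster shows.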

  2. Check the Secret:

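    kubectl get secret -n cattle-system <secret-name> -o yaml
    

    Replace <secret-name> with the value of spec.secretName from the Certificate you found (it is often, but not necessarily, the same as the Certificate’s name). Look for the data field; it should contain tls.crt and tls.key. The Secret’s creationTimestamp only tells you when the Secret was first created, so to confirm the actual expiry, decode tls.crt and read its Not After date, or check status.notAfter on the Certificate resource.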
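To read the expiry date straight from the Secret, the check can be sketched in shell as follows. This is a minimal sketch, assuming openssl and base64 are installed locally; the helper names cert_not_after and cert_valid_for_days are hypothetical and are not part of kubectl or cert-manager.

```shell
#!/bin/sh
# Hypothetical helpers; both read a PEM certificate from stdin.

# Print the certificate's expiry (Not After) date.
cert_not_after() {
  openssl x509 -noout -enddate | sed 's/^notAfter=//'
}

# Succeed only if the certificate is still valid N days from now.
cert_valid_for_days() {
  openssl x509 -noout -checkend "$(( $1 * 86400 ))" >/dev/null
}

# Example (substitute the Secret name from your cluster):
#   kubectl get secret -n cattle-system <secret-name> \
#     -o jsonpath='{.data.tls\.crt}' | base64 -d | cert_not_after
```

If the Secret holds a full chain, openssl reads only the first certificate in the stream, which is normally the leaf certificate you care about.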

Common Causes for Certificate Expiration Issues

  1. Renewal Window Misconfigured: cert-manager starts renewing a certificate a fixed period before its Not After date. If that window is too short, a single failed attempt (for example, a blocked ACME challenge) leaves little time to retry before the certificate expires. Conversely, a window that is nearly as long as the certificate’s lifetime causes very frequent renewals, which can run into CA rate limits.

    • Diagnosis: Examine the Certificate resource’s YAML. Look for spec.renewBefore. If it’s not set, cert-manager defaults to one third of the certificate’s lifetime, which is 30 days for a 90-day certificate.
    • Fix: Edit the Certificate resource to set a more appropriate renewBefore value. The field takes a Go duration string, which has no day unit, so 30 days is written as 720h. For a 90-day certificate, renewBefore: 720h is common.
      spec:
        dnsNames:
        - rancher.yourdomain.com
        issuerRef:
          kind: ClusterIssuer
          name: letsencrypt-prod
        secretName: rancher-ingress-tls
        renewBefore: 720h # 30 days; add or adjust this line
      
      Apply with kubectl apply -f your-certificate.yaml. This tells cert-manager to start renewal 30 days before expiration.
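Because renewBefore is expressed in hours rather than days, a small conversion helper can prevent mistakes. The following is a rough sketch with hypothetical function names; the second helper assumes cert-manager’s documented default of one third of the certificate lifetime.

```shell
#!/bin/sh
# Hypothetical helpers for working out renewBefore values.

# Convert whole days to a Go-style duration in hours (30 -> 720h).
days_to_duration() {
  echo "$(( $1 * 24 ))h"
}

# One third of a lifetime given in days, expressed in hours (90 -> 720h).
default_renew_before() {
  echo "$(( $1 * 24 / 3 ))h"
}
```

To see when cert-manager actually plans to renew, you can read the Certificate’s status.renewalTime field, for example with kubectl get certificate <certificate-name> -n cattle-system -o jsonpath='{.status.renewalTime}'.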
  2. cert-manager Pods Not Running or Crashing: If the cert-manager controller pods are down, they can’t monitor or renew certificates.

    • Diagnosis:
      kubectl get pods -n cert-manager
      kubectl logs <cert-manager-pod-name> -n cert-manager
      
      Look for any restarts or error messages indicating issues with scheduling, image pulling, or internal errors.
    • Fix: If pods are crashing, check the logs for specific errors. Often, this is due to resource constraints (CPU/memory) or problems with cert-manager’s admission webhook. Ensure the cert-manager pods have adequate resource requests and limits and that their ServiceAccounts have the necessary RBAC permissions. Restarting the pods might be sufficient if it was a transient issue: kubectl delete pod <cert-manager-pod-name> -n cert-manager.
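The pod check above can be condensed into a filter that lists only the pods worth investigating. This is a sketch under the assumption that the input is the default column layout of kubectl get pods --no-headers (NAME, READY, STATUS, RESTARTS, AGE); unhealthy_pods is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# and prints the names of pods that are not Running, not fully ready, or
# have restarted at least once. Completed pods (finished jobs) are skipped.
unhealthy_pods() {
  awk '
    $3 == "Completed" { next }
    {
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2] || $4 + 0 > 0) print $1
    }
  '
}

# Example:
#   kubectl get pods -n cert-manager --no-headers | unhealthy_pods
```

An empty result suggests the controller, webhook, and cainjector pods are all running cleanly; any name it prints is a good candidate for kubectl logs.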
  3. Incorrect ClusterIssuer or Issuer Configuration: The ClusterIssuer (or Issuer, its namespace-scoped counterpart) defines how certificates are obtained. If it’s misconfigured (e.g., wrong ACME account, invalid email, incorrect solver or DNS provider), renewals will fail.

    • Diagnosis:
      kubectl describe clusterissuer <issuer-name>
      kubectl get challenges -n cattle-system # If using ACME/LetsEncrypt
      kubectl get orders -n cattle-system
      
      Look for Events in the describe output for the issuer, and check the status of challenges and orders for ACME-related issues.
    • Fix: Correct the spec of your ClusterIssuer. For Let’s Encrypt, ensure the email is valid and the server URL is correct. The Secret named in privateKeySecretRef holds the ACME account key; cert-manager creates it on first registration if it does not already exist.
      # Example ClusterIssuer for Let's Encrypt staging (for testing)
      apiVersion: cert-manager.io/v1
      kind: ClusterIssuer
      metadata:
        name: letsencrypt-staging
      spec:
        acme:
          server: https://acme-staging-v02.api.letsencrypt.org/directory
          email: your-email@example.com # Ensure this is correct
          privateKeySecretRef:
            name: letsencrypt-staging-private-key # Created by cert-manager if missing
          solvers:
          - http01:
              ingress:
                ingressClassName: nginx # Or your ingress class; cert-manager releases before v1.12 use "class" instead
      
      Apply with kubectl apply -f your-issuer.yaml.
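After applying an issuer, it helps to confirm that cert-manager has accepted it before waiting on a renewal. A minimal sketch, assuming the issuer exposes the usual Ready condition in its status; clusterissuer_ready is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named ClusterIssuer reports Ready=True.
clusterissuer_ready() {
  status=$(kubectl get clusterissuer "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')
  [ "$status" = "True" ]
}

# Example:
#   clusterissuer_ready letsencrypt-staging && echo "issuer is ready"
```

If the check fails, the Events and status message in kubectl describe clusterissuer usually explain why, for instance a failed ACME account registration.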
  4. Ingress Controller Issues: If cert-manager is using an HTTP01 challenge, the ingress controller must be running and correctly configured to route challenges to cert-manager. If the ingress controller itself is having issues, certificate validation will fail.

    • Diagnosis:
      kubectl get pods -n <ingress-controller-namespace> # e.g., ingress-nginx
      kubectl describe ingress <rancher-ingress-name> -n cattle-system
      
      Check ingress controller logs for errors related to routing or TLS termination. Ensure the Rancher ingress resource is correctly defined and points to the Rancher service.
    • Fix: Resolve issues with the ingress controller (e.g., resource limits, configuration errors, backend service availability). Ensure the ingressClassName in your Rancher ingress matches the one your controller watches.
  5. Rate Limiting by Certificate Authority (CA): If using Let’s Encrypt, there are limits on how many certificates you can issue per domain per week. If certificates expire and are re-issued too frequently, you can hit these limits.

    • Diagnosis: Check the cert-manager logs and the output of kubectl describe order and kubectl describe challenge in the cattle-system namespace for rate-limit messages. The CA’s error response (typically an HTTP 429 with a rateLimited error type) is recorded there.
    • Fix: For testing, switch to the Let’s Encrypt staging environment (https://acme-staging-v02.api.letsencrypt.org/directory), which has much higher limits but issues certificates that browsers do not trust. Once resolved, ensure your renewBefore setting is appropriate so renewals happen well before expiration, minimizing the need for emergency re-issuance. For production, you’ll need to wait for the rate limit window to reset.
  6. Secret Data Corrupted or Deleted: In rare cases, the Kubernetes Secret holding the certificate and key might become corrupted or accidentally deleted. cert-manager relies on this Secret to know the current certificate’s validity period.

    • Diagnosis:
      kubectl get secret -n cattle-system rancher-ingress-tls -o yaml
      
      Check if the data field is present and contains valid base64 encoded tls.crt and tls.key. If the secret is missing, cert-manager will attempt to recreate it.
    • Fix: If the secret is missing, cert-manager should automatically recreate it based on the Certificate resource. If it’s corrupted, cert-manager may repair it on its own, but manually deleting the malformed secret (kubectl delete secret -n cattle-system rancher-ingress-tls) forces a clean reissue. Until the new secret is issued, the ingress may serve a fallback certificate, so plan for a brief interruption.
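The delete-and-reissue step from the last item can be wrapped so that it waits until the new certificate is actually in place. This is a rough sketch rather than an official procedure: reissue_certificate is a hypothetical helper, and the 5-minute polling limit and 300-second timeout are arbitrary assumptions you should tune.

```shell
#!/bin/sh
# Hypothetical helper: delete a certificate Secret and wait for cert-manager
# to reissue it. Arguments: namespace, Certificate name, Secret name.
reissue_certificate() {
  ns="$1"; cert="$2"; secret="$3"
  kubectl delete secret -n "$ns" "$secret" --ignore-not-found || return 1

  # Poll for up to about 5 minutes until the Secret exists again.
  tries=0
  until kubectl get secret -n "$ns" "$secret" >/dev/null 2>&1; do
    tries=$(( tries + 1 ))
    [ "$tries" -ge 60 ] && return 1
    sleep 5
  done

  # Then block until the Certificate reports Ready.
  kubectl wait --for=condition=Ready "certificate/$cert" -n "$ns" --timeout=300s
}

# Example, using the names from this guide:
#   reissue_certificate cattle-system rancher-ingress-tls rancher-ingress-tls
```

Polling for the Secret before calling kubectl wait avoids a race in which the Certificate still shows its old Ready condition immediately after the Secret is deleted.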

The Next Problem

After successfully rotating your certificates, the next error you might encounter is related to cert-manager’s own certificate, which it uses for its internal API and webhook. This certificate also expires and needs to be managed.
