Rotating cloud credentials for Rancher, especially in a production environment, is often dreaded due to the potential for downtime.

Here’s how to do it seamlessly.

The Core Problem: How Rancher Uses Cloud Credentials

Rancher uses cloud credentials primarily for two things:

  1. Cloud Provider Integrations: When you provision RKE (Rancher Kubernetes Engine) clusters using Rancher, it uses your cloud provider credentials to interact with AWS, Azure, GCP, etc. This includes creating/managing VPCs, security groups, load balancers, and instances.
  2. Node Drivers/Cluster Drivers: Node drivers let Rancher provision the nodes themselves (built on Docker Machine), and cluster drivers let it deploy hosted or custom cluster types.

The critical point is that a provisioned cluster usually does not need these credentials for day-to-day operation once it is up and running. Rancher only reaches for them when you perform specific actions, such as scaling a Rancher-managed node pool. Any new provisioning or management operation, however, requires valid credentials.

The Strategy: Phased Rotation

The key to zero downtime is a phased approach that ensures valid credentials are always available for any operation Rancher might attempt. This involves adding the new credentials before removing the old ones.

Step 1: Add New Cloud Credentials

First, you need to create a new set of cloud credentials with the updated access keys, secrets, or service principal details.

  1. Open Cluster Management: In your Rancher UI, go to Global -> Cluster Management.
  2. Access Cloud Credentials: On the left-hand menu, find and click on Cloud Credentials.
  3. Create New Credentials: Click the Create button.
  4. Select Provider: Choose your cloud provider (e.g., AWS, Azure, GCP).
  5. Enter New Details: Fill in the provider-specific fields: access key ID, secret access key, and region for AWS; subscription ID, tenant ID, client ID, and client secret for Azure; project ID, client email, and private key for GCP.
  6. Name Appropriately: Give these new credentials a clear name, like aws-credentials-2024-07-new or azure-credentials-prod-v2.

Why this works: Rancher now has two sets of valid credentials available. Nothing switches over automatically: existing clusters keep referencing the old set until you repoint them, and the new set is available for selection in any new operation.
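
If you would rather script this step than click through the UI, the request can be sketched as below. This is a minimal sketch, not a definitive client: the /v3/cloudcredentials path and the amazonec2credentialConfig field names are assumptions to verify against the API of your Rancher version, and the function name is hypothetical. It only builds the request; nothing is sent until you pass it to urllib.request.urlopen.

```python
import json
import urllib.request


def build_create_request(rancher_url, api_token, name, access_key, secret_key):
    """Build (but do not send) a request that creates an AWS cloud credential.

    Assumptions to verify for your Rancher version: the /v3/cloudcredentials
    endpoint and the amazonec2credentialConfig field names.
    """
    payload = {
        "type": "cloudCredential",
        "name": name,
        "amazonec2credentialConfig": {
            "accessKey": access_key,
            "secretKey": secret_key,
        },
    }
    return urllib.request.Request(
        rancher_url.rstrip("/") + "/v3/cloudcredentials",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Building the request separately from sending it lets you inspect the URL and body before anything reaches the server.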

Step 2: Update Existing Cluster Configurations (If Applicable)

This is the most crucial step for ensuring no downtime for existing, Rancher-managed clusters. If your clusters were provisioned by Rancher using a specific set of cloud credentials and you want to ensure they continue to function for management tasks (like scaling node pools), you need to associate the new credentials with them.

  1. Go to Cluster Management: Navigate back to Global -> Cluster Management.
  2. Select Your Cluster: Click on the name of the cluster you want to update.
  3. Access Cluster Configuration: Look for an option like Edit or Configuration for the cluster. The exact path depends on the cluster type (RKE, imported, etc.) and on your Rancher version.
    • For RKE Clusters: Navigate to the cluster’s Edit page and look for the Cloud Credentials dropdown or selection. Depending on the version, it may sit in the Cloud Provider section or near the top of the form.
    • For Node Drivers: If you provision nodes through node drivers, go to Global -> Cluster Management -> Node Drivers. The credential reference typically lives on the node templates built on a driver rather than on the driver itself, so edit the relevant template and update its associated cloud credentials.
  4. Switch to New Credentials: From the dropdown, select the newly added cloud credentials.
  5. Save Changes: Apply the changes. Rancher will update the cluster’s configuration to use the new credentials for future cloud provider interactions.

Why this works: This tells Rancher, "When you need to talk to the cloud for this specific cluster’s management operations (like scaling nodes, updating load balancers), use these new credentials." The cluster itself remains unaffected by this change.
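
With more than a handful of clusters, it is easy to miss one that still points at the old credentials. The sketch below shows one way to check, assuming you have exported an inventory of clusters and node templates into simple records. The record shape (name, kind, credential_id) is hypothetical and is not Rancher’s own schema, and the IDs shown are made-up placeholders.

```python
def find_references(inventory, credential_id):
    """Return the sorted names of inventory items that use credential_id.

    `inventory` is a list of dicts in a hypothetical export format:
    {"name": ..., "kind": ..., "credential_id": ...}.
    Items without a credential (e.g. imported clusters) are skipped.
    """
    return sorted(
        item["name"]
        for item in inventory
        if item.get("credential_id") == credential_id
    )


# Example inventory with placeholder IDs.
inventory = [
    {"name": "prod-east", "kind": "cluster", "credential_id": "cc-old"},
    {"name": "prod-west", "kind": "cluster", "credential_id": "cc-new"},
    {"name": "worker-template", "kind": "nodeTemplate", "credential_id": "cc-old"},
    {"name": "imported-lab", "kind": "cluster"},
]

still_on_old = find_references(inventory, "cc-old")
print(still_on_old)
```

An empty result for the old credential ID is the signal that every configuration has been repointed.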

Step 3: Verify New Credentials

Before removing the old credentials, it’s vital to verify that the new ones are working as expected.

  1. Attempt a Test Operation:
    • For RKE Clusters: Try scaling a node pool up or down by one instance. This is a low-risk operation that requires cloud interaction.
    • For New Provisioning: If you are about to provision a new cluster, select the new credentials during the provisioning wizard.
    • For Node Drivers: If you’ve updated a node driver, try creating a new node pool or cluster using that driver.
  2. Check Cluster Status: Ensure the operation completes successfully and your cluster remains healthy. Monitor the cluster events and logs for any errors.

Why this works: This confirms that the new credentials have the necessary permissions and are correctly configured in Rancher for the tasks you perform.
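
The "check cluster status" part of this step can be automated with a small polling loop. The sketch below is deliberately generic: fetch_state is a function you supply (for example, one that reads the cluster’s state from the Rancher API), and the assumption that a healthy cluster reports the state "active" should be checked against your Rancher version.

```python
import time


def wait_until_active(fetch_state, timeout_s=600.0, interval_s=10.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_state() until it returns "active" or the timeout expires.

    Returns True if the cluster became active, False on timeout.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout_s
    while True:
        if fetch_state() == "active":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Run it right after the test scale operation. A False result means the cluster did not settle within the timeout, which is a reason to inspect the cluster events and logs before going any further.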

Step 4: Remove Old Cloud Credentials

Once you are confident that the new credentials are fully functional and have been applied to all relevant cluster configurations, you can safely remove the old ones.

  1. Navigate to Cloud Credentials: Go to Global -> Cluster Management -> Cloud Credentials.
  2. Select Old Credentials: Find the credentials you wish to remove.
  3. Delete: Click the Delete button (or the three dots menu and select Delete). Confirm the action.

Why this works: This cleans up your environment and removes the old, potentially compromised credentials. Rancher will now exclusively use the newly configured credentials for any future cloud provider interactions.

Step 5: Update Cloud Provider IAM/RBAC

This is a crucial backend step. After confirming everything works with the new credentials, go to your cloud provider’s IAM (Identity and Access Management) console and revoke the access keys or disable the user/service principal associated with the old credentials.

Why this works: This is the final security step. By revoking access at the cloud provider level, you ensure that even if the old credentials were somehow compromised or misused, they would no longer have any power.
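
For AWS specifically, one cautious way to do this is to deactivate the old access key first and delete it only after a quiet period, since deactivation can be reversed if something was missed. The sketch below assumes a boto3 IAM client is passed in; the function name is hypothetical, and other providers have their own equivalents.

```python
def deactivate_access_key(iam_client, user_name, access_key_id):
    """Mark an AWS access key as Inactive without deleting it.

    `iam_client` is expected to behave like boto3.client("iam").
    Deactivating instead of deleting keeps a rollback path open.
    """
    iam_client.update_access_key(
        UserName=user_name,
        AccessKeyId=access_key_id,
        Status="Inactive",
    )
    return access_key_id
```

With boto3 installed, a call would look like deactivate_access_key(boto3.client("iam"), "rancher-provisioner", "AKIAIOSFODNN7EXAMPLE"), where both the user name and the key ID are placeholders.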

What’s Next?

The next hiccup you are likely to hit involves older imported clusters or custom applications running inside your Kubernetes clusters that embed cloud provider credentials for their own operations (e.g., applications managing external load balancers via cloud APIs). Rancher does not manage those credentials, so they require separate, in-cluster rotation procedures.
