The Erlang distribution protocol, which RabbitMQ uses for inter-node communication, failed because the nodes could not authenticate each other: their Erlang cookies (the shared secret) did not match.

The root cause is that all RabbitMQ nodes in a cluster must share the exact same Erlang cookie. This cookie is a shared secret that the Erlang runtime uses to verify that nodes are allowed to communicate with each other. If the cookies don’t match, nodes will reject connection attempts from each other, leading to cluster instability or complete failure to form a cluster.

Here are the common reasons for mismatched cookies and how to fix them:

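  1. New Node Added with Its Own Generated Cookie: When RabbitMQ first starts on a new server and no cookie file exists, the Erlang runtime generates a random cookie and writes it to the .erlang.cookie file in the home directory of the user running the node (typically /var/lib/rabbitmq for the rabbitmq user on Linux packages). Because this generated cookie differs from the one on the existing cluster nodes, the new node won’t be able to join.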

    • Diagnosis: On each node, check the cookie file: sudo cat /var/lib/rabbitmq/.erlang.cookie. Compare the output between all nodes.
    • Fix: Ensure all nodes have the identical cookie. The easiest way is to copy the cookie file from an existing, healthy node to the new node. For example, on the new node: sudo cp /path/to/source/.erlang.cookie /var/lib/rabbitmq/.erlang.cookie. Then, set the correct permissions: sudo chmod 600 /var/lib/rabbitmq/.erlang.cookie and sudo chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie. Restart the RabbitMQ service on the new node: sudo systemctl restart rabbitmq-server.
    • Why it works: This ensures the Erlang VM on the new node uses the same secret key as the existing nodes, allowing them to authenticate.
  2. Manual Cookie Modification on One Node: An administrator might have manually edited the .erlang.cookie file on one node without synchronizing it to others.

    • Diagnosis: Same as above: sudo cat /var/lib/rabbitmq/.erlang.cookie on all nodes.
    • Fix: Identify the correct cookie value (usually from a node that is part of the cluster or the intended shared secret). On all other nodes, overwrite their .erlang.cookie file with the correct content. Then restart RabbitMQ on those nodes: sudo systemctl restart rabbitmq-server.
    • Why it works: Restoring consistency across all nodes allows the Erlang distribution to function as intended.
  3. Automated Deployment/Provisioning Errors: In automated environments (like Ansible, Chef, Terraform), the cookie might not be correctly distributed or might be generated independently on each node.

    • Diagnosis: Inspect the deployment scripts or configuration management for how the .erlang.cookie file is managed. Use the cat command on nodes to verify.
    • Fix: Correct the deployment playbook/script to ensure the same cookie content is written to /var/lib/rabbitmq/.erlang.cookie on all provisioned RabbitMQ nodes. Restart RabbitMQ services after the deployment.
    • Why it works: Automating the correct distribution of the secret key eliminates manual errors and ensures uniformity.
  4. File Permissions Incorrect: If the .erlang.cookie file exists but has incorrect permissions, the rabbitmq user might not be able to read it, so the node cannot load the secret. If the file is readable by group or others, the secret is exposed, and the Erlang runtime can refuse to start with a cookie file that is accessible by anyone other than its owner.

    • Diagnosis: Run sudo ls -l /var/lib/rabbitmq/.erlang.cookie. The owner should be rabbitmq and the permissions rw------- (600).
    • Fix: Correct permissions and ownership: sudo chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie and sudo chmod 600 /var/lib/rabbitmq/.erlang.cookie. Restart RabbitMQ: sudo systemctl restart rabbitmq-server.
    • Why it works: Ensures the RabbitMQ process, running as the rabbitmq user, has exclusive read access to the secret.
  5. Erlang Runtime Not Restarted After Cookie Change: Sometimes, the cookie file is updated, but the RabbitMQ process (which uses the Erlang VM) isn’t restarted, so it continues to use the old cookie value it loaded at startup.

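    • Diagnosis: Verify the cookie file content (cat), then compare the file’s modification time (sudo stat /var/lib/rabbitmq/.erlang.cookie) with the service start time shown on the Active line of systemctl status rabbitmq-server. If the file changed after the service started, the node is still running with the old cookie.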
    • Fix: Always restart the RabbitMQ service after modifying the .erlang.cookie file: sudo systemctl restart rabbitmq-server.
    • Why it works: The Erlang VM reads the cookie file when it starts, so a service restart is what makes the node pick up the new value.
  6. Multiple RabbitMQ Instances on the Same Host (Uncommon but Possible): If you’re running multiple, separate RabbitMQ instances on a single machine (e.g., for testing, or using different users), each instance needs its own cookie if they are not intended to be clustered. If they are intended to be clustered, they need the same cookie. Misconfiguration here can lead to confusion.

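    • Diagnosis: Identify the OS user and home directory each instance runs under, because the Erlang runtime reads .erlang.cookie from that user’s home directory. Environment variables such as RABBITMQ_CONFIG_FILE (or RABBITMQ_BASE on Windows) help tell the instances apart, but they do not by themselves set the cookie location.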
    • Fix: Ensure each instance’s cookie file is either unique (if not clustered) or identical (if clustered). Restart the specific instance’s RabbitMQ service.
    • Why it works: Isolates or connects instances based on their intended configuration by managing their respective distribution secrets.
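
The diagnosis steps above (checking the mode and comparing cookie content across nodes) can be sketched as a small shell helper. This is a minimal sketch, assuming a Linux host with GNU coreutils (stat -c, sha256sum). The function names check_cookie, cookie_mode and cookie_fingerprint are hypothetical, not part of RabbitMQ. It prints a SHA-256 fingerprint instead of the cookie itself, so the secret does not end up in terminal scrollback or logs.

```shell
# Hypothetical helper functions; assumes GNU coreutils (stat -c, sha256sum).

cookie_mode() {
  # Print the octal permission bits of the file, e.g. 600.
  stat -c '%a' "$1"
}

cookie_fingerprint() {
  # Hash the raw file bytes so nodes can be compared without showing the secret.
  sha256sum "$1" | awk '{print $1}'
}

check_cookie() {
  # $1 = cookie file; defaults to the path used throughout this article.
  local file="${1:-/var/lib/rabbitmq/.erlang.cookie}"
  local mode
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 1
  fi
  mode="$(cookie_mode "$file")"
  case "$mode" in
    400|600) echo "mode ok ($mode)" ;;
    *)       echo "unexpected mode ($mode); expected 600" ;;
  esac
  echo "sha256: $(cookie_fingerprint "$file")"
}
```

Run check_cookie as root or as the rabbitmq user on each node and compare the sha256 lines. The hash covers the raw file bytes, so a stray trailing newline on one node also shows up as a difference.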

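The copy, chmod and chown steps from the fixes above can be combined into one function. This is a sketch under assumptions: install_cookie and the COOKIE_OWNER variable are hypothetical names introduced here, the destination defaults to the path used in this article, and GNU install is assumed to be available. It does not restart the service.

```shell
# Hypothetical helper; assumes GNU coreutils install(1).

install_cookie() {
  # $1 = cookie file copied from a healthy node
  # $2 = destination (defaults to the path used in this article)
  local src="$1"
  local dest="${2:-/var/lib/rabbitmq/.erlang.cookie}"
  # COOKIE_OWNER is an assumption of this sketch: unset means rabbitmq:rabbitmq,
  # an empty value skips the ownership change (useful for dry runs as a normal user).
  local owner="${COOKIE_OWNER-rabbitmq:rabbitmq}"

  # install copies the file and sets mode 600 in a single step.
  install -m 600 "$src" "$dest" || return 1

  # Changing ownership needs root and an existing rabbitmq user.
  if [ -n "$owner" ]; then
    chown "$owner" "$dest" || return 1
  fi
}
```

On a real node this would be run as root, followed by sudo systemctl restart rabbitmq-server so the node loads the new cookie.
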
After ensuring all nodes have the identical, correct Erlang cookie and restarting the RabbitMQ service on all of them, the nodes should be able to authenticate and form or rejoin the cluster. If you were seeing errors like {not_authorized, 'rabbit@othernode'} or nodes failing to list each other in rabbitmqctl cluster_status, these should resolve.
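
To confirm the result after the restarts, one option is to look for each peer in the output of rabbitmqctl cluster_status. The function below is a minimal sketch; peer_in_cluster is a hypothetical name and rabbit@othernode is a placeholder node name. It does a plain substring match on the command output, so it is a quick check rather than a parser. Where available, rabbitmq-diagnostics erlang_cookie_hash can also be run on each node to compare cookie hashes without reading the file directly.

```shell
# Hypothetical helper: succeed if the given node name appears in cluster_status output.

peer_in_cluster() {
  # $1 = node name to look for, e.g. rabbit@othernode (placeholder)
  local peer="$1"
  local status
  status="$(rabbitmqctl cluster_status)" || return 1
  # Substring match only: rabbit@node1 would also match rabbit@node10.
  case "$status" in
    *"$peer"*) return 0 ;;
    *)         return 1 ;;
  esac
}
```

For example, peer_in_cluster rabbit@othernode && echo "peer visible" prints the message only when the peer is listed.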

If network access or firewall rules are not configured correctly, the next problem you are likely to hit is a connection refused or timeout error when nodes try to reach each other on the Erlang distribution port (25672 by default).
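
A quick way to test reachability from one node to another is to attempt a TCP connection to the relevant ports. The sketch below assumes bash (for its /dev/tcp redirection) and the coreutils timeout command; port_open and check_distribution_ports are hypothetical names. Port 25672 is the distribution port named above; 4369 is the default port of epmd, the Erlang port mapper that nodes contact first to find each other's distribution port.

```shell
# Hypothetical helpers; assume bash (/dev/tcp) and coreutils timeout are available.

port_open() {
  # $1 = host, $2 = port; succeeds if a TCP connection opens within 3 seconds.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

check_distribution_ports() {
  # $1 = peer host name or IP address
  local host="$1"
  local port
  # 4369 = epmd default, 25672 = default Erlang distribution port for RabbitMQ.
  for port in 4369 25672; do
    if port_open "$host" "$port"; then
      echo "$host:$port reachable"
    else
      echo "$host:$port NOT reachable"
    fi
  done
}
```

Running check_distribution_ports othernode from each node (othernode being a placeholder host name) shows whether a firewall or security group is blocking either port.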
