RabbitMQ’s credential validation failed because the RabbitMQ node could not reach another Erlang node it was trying to connect to, usually for clustering or management operations. The connection starts at the Erlang Port Mapper Daemon (epmd) on the target node, which tells the caller which distribution port that node is using. This matters because Erlang nodes must be able to talk to each other over the distribution protocol to form clusters, and management tools rely on the same mechanism to interact with them.
Here are the common culprits and how to fix them:
- Firewall Blocking EPMD Port (4369):
  - Diagnosis: On the node receiving the connection attempt, check whether port 4369 is open.

        sudo ufw status verbose
        # or
        sudo iptables -L -n | grep 4369

  - Fix: Open port 4369 on the firewall of the node that RabbitMQ is trying to connect to.

        sudo ufw allow 4369/tcp
        # or
        sudo iptables -A INPUT -p tcp --dport 4369 -j ACCEPT

  - Why it works: EPMD (Erlang Port Mapper Daemon) listens on TCP port 4369 to register and resolve Erlang nodes. If this port is blocked, nodes cannot discover each other to establish the Erlang distribution connection.
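Firewall rules only show what should happen, so it can help to test the port from the connecting node as well. The following is a minimal sketch, assuming bash (built with /dev/tcp support) and the coreutils `timeout` command are available; `port_open` and the peer name in the usage comment are hypothetical, not part of RabbitMQ.

```shell
# Sketch: succeed if a TCP connection to HOST:PORT can be opened from here.
port_open() {
  host="$1"
  port="${2:-4369}"   # default to the epmd port
  # Opening /dev/tcp/HOST/PORT in bash attempts a TCP connect;
  # `timeout` gives up after 3 seconds if packets are silently dropped.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Usage (hypothetical peer name):
# if port_open nodeB.domain.com 4369; then
#   echo "epmd reachable"
# else
#   echo "epmd blocked or not running"
# fi
```

A refused or timed-out connection does not say which of the two it was, so pair this with the firewall listing above.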
- Incorrect NODENAME in rabbitmq-env.conf:
  - Diagnosis: Check the NODENAME setting in /etc/rabbitmq/rabbitmq-env.conf on all nodes in the cluster. Ensure it’s unique and resolvable.

        cat /etc/rabbitmq/rabbitmq-env.conf
        # Example output: NODENAME=rabbit@my-server.domain.com

    Then, from the node that’s failing, ping the host part of the other node’s NODENAME (the part after the @, here my-server.domain.com). If ping fails or resolves to the wrong IP, that’s the issue.
  - Fix: Set NODENAME so that its host part is a fully qualified domain name (FQDN) or an IP address that is resolvable and unique across all nodes. An FQDN or IP address makes this a long node name, which typically also requires USE_LONGNAME=true in the same file.

        # In /etc/rabbitmq/rabbitmq-env.conf
        NODENAME=rabbit@<unique_hostname_or_ip>

    Restart RabbitMQ: sudo systemctl restart rabbitmq-server.
  - Why it works: The NODENAME is how Erlang nodes identify themselves. If its host part is not resolvable via DNS or /etc/hosts to the correct IP address, a node cannot find or connect to another node’s EPMD.
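Only the part of a node name after the @ is a network name, so that is the part to resolve. As a rough sketch, assuming a POSIX shell and the `getent` utility found on most Linux systems, the check could be scripted like this; `node_host` and `node_host_resolves` are hypothetical helper names.

```shell
# Print the host part of a node name such as rabbit@my-server.domain.com.
node_host() {
  case "$1" in
    *@*) printf '%s\n' "${1#*@}" ;;   # strip everything up to the first @
    *)   echo "not a node name: $1" >&2; return 1 ;;
  esac
}

# Succeed if the host part resolves through the system resolver.
# getent consults /etc/hosts and DNS in the order set by nsswitch.conf.
node_host_resolves() {
  host=$(node_host "$1") || return 1
  getent hosts "$host" >/dev/null
}

# Usage:
# node_host_resolves rabbit@my-server.domain.com || echo "host part does not resolve"
```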
- Network Unreachability / DNS Resolution Issues:
  - Diagnosis: From the RabbitMQ node that is failing to connect, ping the hostname or IP address used in the NODENAME of the target node. Use only the host part, without the rabbit@ prefix.

        ping other-node.domain.com
        # or
        ping 192.168.1.10

    Also, check DNS resolution directly:

        dig other-node.domain.com +short
        # or, for a reverse lookup
        nslookup 192.168.1.10

  - Fix: Ensure that all RabbitMQ nodes can resolve each other’s hostnames. This might involve configuring /etc/hosts files on all nodes or fixing DNS records.

        # Example entry in /etc/hosts on node A
        192.168.1.11 nodeB.domain.com nodeB

    Restart RabbitMQ after changes.
  - Why it works: Erlang distribution relies on the underlying network and DNS to locate nodes. If a node cannot be reached or its name doesn’t resolve to the correct IP, the distribution handshake will fail.
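When the fix goes through /etc/hosts, a stale or mistyped entry on a single node is easy to miss. Here is a small sketch, assuming a POSIX shell with awk, that prints the address a hosts-format file maps a name to; `hosts_lookup` is a hypothetical helper and the names and addresses in the usage comment are examples only.

```shell
# Print the IP address that a hosts-format FILE maps NAME to.
# Exits non-zero if NAME is not present.
hosts_lookup() {
  file="$1"
  name="$2"
  awk -v name="$name" '
    /^[[:space:]]*#/ { next }            # skip comment lines
    {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break             # ignore trailing comments
        if ($i == name) { print $1; found = 1; exit }
      }
    }
    END { exit found ? 0 : 1 }
  ' "$file"
}

# Usage: run on each node and compare against the address you expect.
# hosts_lookup /etc/hosts nodeB.domain.com
```

This reads only the file itself. `getent hosts <name>` shows what the system resolver actually returns once DNS is taken into account.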
- Erlang Cookie Mismatch:
  - Diagnosis: The Erlang cookie is a shared secret that Erlang nodes use for authentication. Check the cookie file (/var/lib/rabbitmq/.erlang.cookie by default) on all nodes.

        sudo cat /var/lib/rabbitmq/.erlang.cookie

    The contents must be identical on all nodes.
  - Fix: Ensure the .erlang.cookie file has the same content on all nodes. If they differ, copy the content from one node to all others. Make sure the file permissions are 0600 and the file is owned by the rabbitmq user.

        # On each node, after copying the correct cookie:
        sudo chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie
        sudo chmod 0600 /var/lib/rabbitmq/.erlang.cookie

    Restart RabbitMQ on all nodes.
  - Why it works: The Erlang cookie acts as a shared password for the Erlang distribution protocol. Nodes with different cookies are considered untrusted and cannot communicate.
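Printing the cookie with cat puts a secret on the terminal and in scrollback. One alternative is to compare digests instead. This is a sketch, assuming the coreutils `sha256sum` command and that a copy of each node's cookie (or the file itself) is readable where the script runs; `cookie_digest` and `cookies_match` are hypothetical helper names.

```shell
# Print a SHA-256 digest of a cookie file, so values can be compared
# without echoing the secret itself.
cookie_digest() {
  sha256sum "$1" 2>/dev/null | awk '{ print $1 }'
}

# Succeed only if every listed file has the same digest as the first one.
cookies_match() {
  first=$(cookie_digest "$1")
  [ -n "$first" ] || return 1          # first file missing or unreadable
  shift
  for f in "$@"; do
    [ "$(cookie_digest "$f")" = "$first" ] || return 1
  done
  return 0
}

# Usage on a single node: print the digest and compare it by eye across nodes.
# sudo sha256sum /var/lib/rabbitmq/.erlang.cookie
```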
- RabbitMQ Not Running or Failed to Start:
  - Diagnosis: Check the status of the RabbitMQ service on both nodes.

        sudo systemctl status rabbitmq-server

    Look for any error messages in the journal:

        sudo journalctl -u rabbitmq-server -n 100 --no-pager

  - Fix: If RabbitMQ is not running, start it. If it failed to start, investigate the journal logs for specific errors (e.g., disk space, permissions, configuration syntax).

        sudo systemctl start rabbitmq-server

  - Why it works: The Erlang distribution protocol requires the RabbitMQ server (an Erlang VM that registers itself with EPMD on startup) to be running on each node. If it’s not, no connections can be established.
- EPMD Not Listening on the Correct Interface:
  - Diagnosis: On the node that is supposed to be receiving connections, check which network interfaces EPMD is listening on.

        sudo netstat -tulnp | grep 4369
        # or
        sudo ss -tulnp | grep 4369

    Look for 0.0.0.0:4369 (listening on all interfaces) or a specific IP. If it’s 127.0.0.1:4369, it’s only listening locally.
  - Fix: By default, EPMD listens on all interfaces. If it’s been configured to listen only on 127.0.0.1 (e.g., via the ERL_EPMD_ADDRESS environment variable or specific Erlang configuration), adjust it to listen on the network interface that other nodes will connect to. For clustering, the host part of NODENAME should also resolve to an IP address that other nodes can reach.
  - Why it works: EPMD needs to be accessible on the network interface that the connecting node is trying to reach. If it’s bound only to localhost, remote nodes cannot establish a connection.
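Reading the listener table by eye works for one node, but becomes tedious across a cluster. The sketch below classifies how port 4369 is bound, assuming a POSIX shell with awk and that the local address is the fourth column of the listing, as it is in typical `ss -tln` and `netstat -tln` output. `epmd_bind_scope` is a hypothetical helper name.

```shell
# Read `ss -tln` or `netstat -tln` output on stdin and report how port 4369 is bound:
#   network        at least one listener on a non-loopback address
#   loopback-only  listeners exist, but only on 127.0.0.1 or ::1
#   not-listening  nothing is listening on 4369
epmd_bind_scope() {
  awk '
    $4 ~ /:4369$/ {
      seen = 1
      addr = $4
      sub(/:4369$/, "", addr)            # keep only the address part
      if (addr != "127.0.0.1" && addr != "[::1]" && addr != "::1") external = 1
    }
    END {
      if (!seen)         print "not-listening"
      else if (external) print "network"
      else               print "loopback-only"
    }
  '
}

# Usage:
# ss -tln | epmd_bind_scope
```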
After resolving these, you’ll likely hit the next common issue: "Node down" errors in the management UI or logs. Once EPMD is reachable, the nodes still have to connect over the inter-node distribution port (25672 by default), which must be open between nodes in addition to 4369. Ports 5672 (AMQP) and 15672 (Management UI) carry client and management traffic rather than cluster traffic, so they need to be reachable from wherever those connections originate.