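RabbitMQ’s memory high watermark is blocking publishers because the broker’s memory use has crossed its configured threshold (`vm_memory_high_watermark`). At that point RabbitMQ raises a memory alarm and blocks connections that publish until memory use drops back below the threshold, rather than risk exhausting memory and crashing the node.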
Here are the common causes and how to fix them:
1. Unacknowledged Messages Accumulating in Queues:
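- Diagnosis: Check the ready and unacknowledged message counts and memory use of your queues with `rabbitmqctl list_queues name messages_ready messages_unacknowledged memory`. Look for queues with a high `messages_unacknowledged` (or `messages_ready`) count.
- Cause: Consumers receive messages but acknowledge them slowly or not at all (for example, a missing acknowledgement or an unbounded prefetch), or publishers send messages faster than consumers can process them. RabbitMQ must keep these messages until they are consumed and acknowledged, and it can hold much of that backlog in memory.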
- Fix:
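- Increase consumer throughput: Scale out your consumer instances or optimize consumer processing logic. Make sure consumers acknowledge every message they process and use a bounded prefetch count (`basic.qos`), so unacknowledged deliveries cannot pile up without limit.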
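- Implement message TTL (Time To Live): If messages can safely expire, set a TTL on them. Messages that aren’t consumed within the TTL are discarded (or dead-lettered, if a dead-letter exchange is configured), which frees up memory. As a policy definition, a 10-minute TTL (in milliseconds) is `{"message-ttl": 600000}`.
- Implement queue length limits: Set a maximum queue length. By default, when the queue reaches this limit the oldest messages are dropped or dead-lettered, which prevents unbounded growth. As a policy definition: `{"max-length": 10000}`. Both limits can be applied with `rabbitmqctl set_policy`, or per queue through the `x-message-ttl` and `x-max-length` arguments.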
- Why it works: Reducing the number of messages held in memory directly alleviates the pressure on the broker.
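To see which queues hold the backlog, you can filter the `rabbitmqctl list_queues` output from the diagnosis step in the shell. The sketch below is a hypothetical POSIX `sh`/`awk` helper and is not part of RabbitMQ. It assumes the columns are requested as `name messages_ready messages_unacknowledged memory` and arrive tab-separated, which is how `rabbitmqctl` prints table output by default.

```shell
# flag_unacked THRESHOLD
# Hypothetical helper (not a RabbitMQ tool): reads the output of
#   rabbitmqctl list_queues name messages_ready messages_unacknowledged memory
# on stdin (tab-separated) and prints every queue whose unacknowledged
# message count is at or above THRESHOLD.
flag_unacked() {
  awk -F '\t' -v limit="$1" '
    # Treat a line as a data row only if all three numeric columns are
    # plain integers; this skips the "Timeout:"/"Listing queues" banner
    # lines and the column header row.
    $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ {
      if ($3 + 0 >= limit + 0)
        printf "%s\tready=%s\tunacked=%s\tmemory_bytes=%s\n", $1, $2, $3, $4
    }'
}
```

For example, `rabbitmqctl list_queues name messages_ready messages_unacknowledged memory | flag_unacked 1000` lists queues with at least 1,000 unacknowledged messages. The threshold of 1,000 is an arbitrary example.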
2. Large Message Payloads:
- Diagnosis: Examine message sizes in your queues. `rabbitmqctl list_queues name messages message_bytes` reports the total size of message bodies per queue, so dividing `message_bytes` by `messages` gives the average body size. You can also instrument your producers to log the size of each message they publish.
- Cause: Individual messages with very large payloads, or a high volume of moderately large ones, consume significant amounts of memory.
- Fix:
- Reduce message size: Compress message payloads before publishing, or redesign your messages to be smaller.
- Offload large data: Instead of embedding large data in messages, store it externally (e.g., S3, a database) and send a reference (URL or ID) in the message.
- Why it works: Smaller messages require less memory for buffering and processing.
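The `message_bytes` column makes it possible to spot queues with unusually large payloads. Here is a minimal sketch. It assumes tab-separated output from `rabbitmqctl list_queues name messages message_bytes`, and `avg_body_size` is a hypothetical helper name.

```shell
# avg_body_size
# Hypothetical helper: reads the output of
#   rabbitmqctl list_queues name messages message_bytes
# on stdin (tab-separated) and prints, for each non-empty queue, the
# message count, total body bytes, and average body size per message.
avg_body_size() {
  awk -F '\t' '
    # Only rows with integer message and byte counts are data rows;
    # banner and header lines are skipped, as are empty queues.
    $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $2 > 0 {
      printf "%s\tmessages=%d\tbytes=%.0f\tavg_bytes=%.0f\n", $1, $2, $3, int($3 / $2)
    }'
}
```

Run it as `rabbitmqctl list_queues name messages message_bytes | avg_body_size`. `message_bytes` counts message bodies only, not properties or headers, so actual memory per message is somewhat higher.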
3. High Number of Connections and Channels:
- Diagnosis: Check the number of active connections and channels with `rabbitmqctl list_connections` and `rabbitmqctl list_channels`. Look for an unusually high number of connections or channels, especially from a single client.
- Cause: Each connection and channel consumes a small but cumulative amount of memory. Applications that open and close connections/channels frequently, or maintain many idle connections, can contribute to memory pressure.
- Fix:
- Connection pooling: Implement connection pooling in your client applications. This reuses existing connections and channels, reducing overhead.
- Optimize connection lifecycle: Ensure connections and channels are properly closed when no longer needed.
- Why it works: Fewer connections and channels mean fewer per-connection processes and network buffers on the broker, which frees up memory.
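To find which clients own most of the connections, you can group the `rabbitmqctl list_connections` output by client address. This hypothetical helper assumes the columns are requested as `peer_host channels` and are tab-separated. It prints the connection count, total channels and host, with the busiest host first.

```shell
# conns_per_host
# Hypothetical helper: reads the output of
#   rabbitmqctl list_connections peer_host channels
# on stdin (tab-separated) and prints "connections<TAB>channels<TAB>host"
# for each client host, sorted by connection count, highest first.
conns_per_host() {
  awk -F '\t' '
    # Data rows have an integer channel count; banner/header lines do not.
    $2 ~ /^[0-9]+$/ { conns[$1]++; chans[$1] += $2 }
    END {
      for (h in conns)
        printf "%d\t%d\t%s\n", conns[h], chans[h], h
    }' | sort -k1,1nr -k3,3
}
```

Usage: `rabbitmqctl list_connections peer_host channels | conns_per_host`. A single host with hundreds of connections often points to an application that opens a connection per request instead of reusing one.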
4. Insufficient RabbitMQ Server Memory Allocation:
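- Diagnosis: Check the current memory usage of the RabbitMQ node and compare it to the total RAM on the server. `rabbitmq-diagnostics memory_breakdown` shows how the node’s memory is split across queues, connections and other categories. On Linux, you can also find the process ID (PID) with `ps aux | grep rabbitmq` and check its total mapped memory with `pmap -x <PID> | tail -n 1`.
- Cause: The server has too little RAM for the workload, or the high watermark is set too low for the RAM that is available. With a relative watermark, the threshold is a fraction of the total memory RabbitMQ detects.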
- Fix:
- Adjust the memory high watermark: The watermark is configured in `rabbitmq.conf` (e.g., `/etc/rabbitmq/rabbitmq.conf`), not in `rabbitmq-env.conf`. You can set it either as a fraction of total system RAM or as an absolute value:
  - 80% of system RAM: `vm_memory_high_watermark.relative = 0.8`
  - An absolute value: `vm_memory_high_watermark.absolute = 4GB`

  Avoid values close to 1.0. The node needs headroom above the threshold, and a value that is too high can lead to swapping, or to the operating system’s out-of-memory killer terminating the node.
- Restart RabbitMQ: Apply the changes by restarting the RabbitMQ service with `systemctl restart rabbitmq-server`. To try a value without restarting, `rabbitmqctl set_vm_memory_high_watermark 0.8` changes it at runtime, but that setting is lost when the node restarts.
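- Why it works: The watermark is the threshold at which RabbitMQ raises its memory alarm and blocks publishers. Raising it lets the node use more of the server’s RAM before blocking. It does not give the Erlang VM any extra memory, so it only helps if the server genuinely has that memory to spare.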
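Before changing the watermark, it helps to know what byte threshold a relative value translates to on a given machine. Here is a minimal sketch. It assumes Linux (`/proc/meminfo`) and that RabbitMQ detects the same total RAM as `MemTotal`, which may not hold inside containers. Both function names are hypothetical.

```shell
# mem_total_bytes [MEMINFO_FILE]
# Hypothetical helper: prints total RAM in bytes, read from the MemTotal
# line (reported in kB) of /proc/meminfo, or of the file given instead.
mem_total_bytes() {
  awk '/^MemTotal:/ { printf "%.0f\n", $2 * 1024 }' "${1:-/proc/meminfo}"
}

# watermark_bytes FRACTION TOTAL_BYTES
# Hypothetical helper: estimates the memory-alarm threshold implied by a
# relative watermark (e.g. 0.8) on a machine with TOTAL_BYTES of RAM.
watermark_bytes() {
  awk -v f="$1" -v total="$2" 'BEGIN { printf "%.0f\n", f * total }'
}
```

`watermark_bytes 0.8 "$(mem_total_bytes)"` prints the estimated byte threshold for an 80% watermark. This is an estimate based on `MemTotal`, not RabbitMQ’s own calculation.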
5. Memory Leaks in Plugins or Custom Erlang Code:
- Diagnosis: Monitor memory usage over time. If memory usage steadily increases without a corresponding increase in message volume or connections, a leak is likely. Check plugin documentation for known memory issues.
- Cause: Bugs in RabbitMQ plugins or custom Erlang code deployed on the broker can lead to memory not being released.
- Fix:
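- Disable suspect plugins: Temporarily disable any recently added or suspect plugins to see if memory usage normalizes. `rabbitmq-plugins list` shows which plugins are enabled, and `rabbitmq-plugins disable <plugin_name>` disables one.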
- Update plugins/code: Ensure you are using the latest stable versions of all plugins.
- Review custom code: If you have custom Erlang code, analyze it for potential memory leaks.
- Why it works: Removing the source of the leak allows the Erlang VM to reclaim and reuse memory.
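Confirming a suspected leak means showing that memory grows steadily while traffic stays flat. Here is a minimal sketch: periodically record the RabbitMQ process’s resident memory as `EPOCH_SECONDS BYTES` lines, then summarize the trend. The helper name `memory_trend` and the sample format are assumptions for illustration.

```shell
# memory_trend
# Hypothetical helper: reads "EPOCH_SECONDS BYTES" samples on stdin
# (oldest first) and prints the overall growth rate in bytes per hour,
# plus how many intervals between consecutive samples showed growth.
memory_trend() {
  awk '
    NF == 2 {
      t = $1 + 0; m = $2 + 0
      if (n == 0) { t0 = t; m0 = m }      # first sample
      else if (m > prev) grew++           # memory rose since previous sample
      prev = m; t1 = t; m1 = m; n++
    }
    END {
      if (n < 2 || t1 == t0) { print "need at least two samples"; exit 1 }
      printf "bytes_per_hour=%.0f\tgrowing_intervals=%d/%d\n",
        (m1 - m0) * 3600 / (t1 - t0), grew, n - 1
    }'
}
```

You can collect samples with a loop such as `while :; do echo "$(date +%s) $(( $(ps -o rss= -p "$PID") * 1024 ))"; sleep 300; done >> samples.txt`. Here `PID` is the RabbitMQ process ID found as in the diagnosis for cause 4. Summarize the samples with `memory_trend < samples.txt`. Steady growth over hours of flat traffic is consistent with a leak. Check it against message and connection counts from the same period before drawing conclusions.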
6. High Memory Usage by Other Processes on the Same Server:
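- Diagnosis: Use system monitoring tools (e.g., `top`, `htop`, `free -m`) to check overall system memory usage and identify other memory-hungry processes.
- Cause: Other applications or system services on the same server are consuming a significant portion of the RAM. The relative watermark is computed from total RAM, not free RAM, so these processes do not make the alarm fire sooner. Instead, they leave less memory actually available to RabbitMQ. That can push the server into swapping or trigger the out-of-memory killer before the node reaches its watermark.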
- Fix:
- Move RabbitMQ to a dedicated server: Ideally, RabbitMQ should run on a server with dedicated resources.
- Reduce memory footprint of other processes: Optimize or relocate other memory-intensive applications.
- Why it works: Ensuring RabbitMQ has sufficient dedicated memory prevents it from competing with other processes and being starved.
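To see which processes are competing with RabbitMQ for RAM, you can rank `ps` output by resident memory. This is a hypothetical helper. It assumes input from `ps -eo rss=,comm=`, which gives resident size in KiB followed by the command name.

```shell
# top_rss N
# Hypothetical helper: reads `ps -eo rss=,comm=` output on stdin and prints
# the N processes with the largest resident memory, in MiB, largest first.
top_rss() {
  # -n numeric sort skips the leading blanks that ps pads the RSS column with.
  sort -k1,1nr | head -n "$1" |
    awk '{ rss = $1; $1 = ""; sub(/^ +/, ""); printf "%.1f MiB\t%s\n", rss / 1024, $0 }'
}
```

`ps -eo rss=,comm= | top_rss 10` shows the ten largest processes. The Erlang VM that runs RabbitMQ normally appears as `beam.smp`.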
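After addressing these, you might also run into the disk free-space alarm if the disk is running low. RabbitMQ pages messages to disk to relieve memory pressure. When free space on the partition that holds its data drops below the `disk_free_limit` setting, it raises a disk alarm, which blocks publishers just as the memory alarm does.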
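A quick check of the disk headroom on the partition that holds RabbitMQ’s data can look like the sketch below. It assumes the default Linux package data directory `/var/lib/rabbitmq` and POSIX `df -Pk` output. `disk_headroom` and its MiB limit are hypothetical; compare the limit against the node’s actual `disk_free_limit` setting.

```shell
# disk_headroom LIMIT_MIB
# Hypothetical helper: reads `df -Pk PATH` output on stdin and prints the
# mount point, free space in MiB, and LOW if free space is under LIMIT_MIB.
disk_headroom() {
  awk -v limit="$1" '
    NR > 1 {
      free_mib = $4 / 1024              # column 4 is available space in KiB
      status = (free_mib < limit + 0) ? "LOW" : "ok"
      printf "%s\t%.0f MiB free\t%s\n", $6, free_mib, status
    }'
}
```

Usage: `df -Pk /var/lib/rabbitmq | disk_headroom 50`.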