RabbitMQ raises its disk alarm when free space on the broker's data partition drops below disk_free_limit (50 MB by default). While the alarm is in effect, the broker blocks publishing connections, so no new messages can be published.
Common Causes and Fixes

Unacknowledged/Uncommitted Messages:
- Diagnosis: Check the number of unacknowledged messages per queue. A large count means RabbitMQ is holding those messages in memory or on disk while it waits for consumer acknowledgments.

      rabbitmqctl list_queues name messages_unacknowledged messages_ready messages

- Fix: Make sure your consumers acknowledge messages promptly. If you use publisher confirms or transactions, make sure those complete. As a last resort you can reset the node. Use this with extreme caution: rabbitmqctl reset returns the node to a blank state, deleting all queues, messages, users, vhosts, and policies, and removing the node from its cluster.

      rabbitmqctl stop_app
      rabbitmqctl reset
      rabbitmqctl start_app

- Why it works: Unacknowledged messages stay stored on the broker until they are acknowledged. Acknowledging them (or wiping them with a reset) frees that space.
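
The diagnosis step for unacknowledged messages can be narrowed to the queues that matter. This is a minimal sketch, assuming the tab-separated table that rabbitmqctl list_queues prints by default. The helper name high_unacked and the 1000-message threshold are illustrative choices, not recommended values.

```shell
# Print "name<TAB>unacked" for queues whose unacknowledged count exceeds a threshold.
# Reads tab-separated rows on stdin in the column order:
#   name  messages_unacknowledged  messages_ready  messages
# Header and banner lines are skipped because their second field is not a number.
high_unacked() {
  threshold="${1:-1000}"
  awk -F '\t' -v t="$threshold" \
    '$2 ~ /^[0-9]+$/ && $2 + 0 > t + 0 { print $1 "\t" $2 }'
}

# Usage (requires a running node):
#   rabbitmqctl list_queues --quiet name messages_unacknowledged messages_ready messages \
#     | high_unacked 1000
```

Reading from stdin keeps the filter independent of the broker, so it can be tried on saved output first.
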

Large Message Payloads:
- Diagnosis: Examine the size of messages in your queues. Large payloads can quickly fill disk space, especially if many messages are queued.

      rabbitmqctl list_queues name messages_ready memory

  Note: memory here gives only a rough idea, because it reports the memory used by the queue process rather than disk usage. The message_bytes column (total size of message bodies in the queue) is a closer proxy. Otherwise, infer growth from queue depth and message rate.
- Fix: Reduce payload size or serialize messages more efficiently. Consider offloading large data to external storage and sending only references or URLs in RabbitMQ messages.
- Why it works: Smaller messages consume less disk space per queued message.
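
To estimate payload size per queue, the message_bytes column can be divided by the message count. The sketch below assumes tab-separated list_queues output; avg_message_size is a hypothetical helper name. Because message_bytes counts message bodies only (not properties or storage overhead), the result is a lower bound on actual disk usage.

```shell
# Print "name<TAB>average body size in bytes" for every non-empty queue.
# Reads tab-separated rows on stdin in the column order:
#   name  messages  message_bytes
avg_message_size() {
  awk -F '\t' \
    '$2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $2 + 0 > 0 { printf "%s\t%d\n", $1, $3 / $2 }'
}

# Usage (requires a running node):
#   rabbitmqctl list_queues --quiet name messages message_bytes | avg_message_size
```
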

Persistent Messages Not Being Purged:
- Diagnosis: Persistent messages are written to disk. If consumers die or fall behind, these messages accumulate indefinitely.

      rabbitmqctl list_queues name messages_ready messages_unacknowledged --vhost=<your_vhost>

  Then investigate queues with high messages_ready and low messages_unacknowledged.
- Fix: Implement TTLs (Time-To-Live) and Dead Letter Exchanges (DLX) to automatically expire or redirect messages that are not processed within a certain timeframe or that fail processing.

      # Example policy for TTL and DLX
      rabbitmqctl set_policy dlx_ttl "^my_queue_prefix" \
        '{"message-ttl": 60000, "dead-letter-exchange": "my_dlx"}' \
        --apply-to queues

- Why it works: A TTL expires messages that sit in a queue too long, and the DLX gives expired or rejected messages somewhere to go. Disk space is only reclaimed if the queues bound to the DLX are themselves consumed or bounded.
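
The "high messages_ready, low messages_unacknowledged" pattern can be picked out mechanically. A rough sketch, assuming tab-separated list_queues output; stalled_queues is a hypothetical helper, and both thresholds are placeholders to tune for your workload.

```shell
# Print queues with a large backlog that nobody appears to be consuming.
# Arguments: minimum ready count, maximum unacknowledged count.
# Reads tab-separated rows on stdin in the column order:
#   name  messages_ready  messages_unacknowledged
stalled_queues() {
  min_ready="${1:-10000}"
  max_unacked="${2:-10}"
  awk -F '\t' -v r="$min_ready" -v u="$max_unacked" \
    '$2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $2 + 0 > r + 0 && $3 + 0 <= u + 0 {
       print $1 "\t" $2 "\t" $3
     }'
}

# Usage (requires a running node):
#   rabbitmqctl list_queues --quiet name messages_ready messages_unacknowledged \
#     --vhost=<your_vhost> | stalled_queues 10000 10
```
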

Unused Queues and Exchanges:
- Diagnosis: Over time, applications might stop using certain queues or exchanges, leaving them to accumulate messages or just exist as metadata on disk.

      rabbitmqctl list_queues --vhost=<your_vhost>
      rabbitmqctl list_exchanges --vhost=<your_vhost>

  Look for queues and exchanges that have had no activity for a long time.
- Fix: Identify and delete unused queues and exchanges. rabbitmqctl has no delete_exchange command, so delete exchanges with rabbitmqadmin or the management UI.

      rabbitmqctl delete_queue <queue_name> --vhost=<your_vhost>
      rabbitmqadmin --vhost=<your_vhost> delete exchange name=<exchange_name>

- Why it works: Deleting a queue discards the messages stored in it, and removing unused objects reduces the metadata RabbitMQ has to manage.
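
Queues that are empty and have no consumers are a reasonable starting list of cleanup candidates. The sketch below assumes tab-separated list_queues output, and unused_queue_candidates is a hypothetical helper. It only sees a snapshot, so it cannot tell how long a queue has been idle; treat its output as a list to review, not a list to delete.

```shell
# Print the names of queues that currently hold no messages and have no consumers.
# Reads tab-separated rows on stdin in the column order:
#   name  messages  consumers
unused_queue_candidates() {
  awk -F '\t' \
    '$2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $2 + 0 == 0 && $3 + 0 == 0 { print $1 }'
}

# Usage (requires a running node):
#   rabbitmqctl list_queues --quiet --vhost=<your_vhost> name messages consumers \
#     | unused_queue_candidates
```

When deleting a reviewed candidate, rabbitmqctl delete_queue accepts --if-empty and --if-unused, which make the command refuse if the queue has gained messages or consumers in the meantime.
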

Internal RabbitMQ Logs/Database Files:
- Diagnosis: RabbitMQ itself generates logs and internal database files (Mnesia) that can grow large, especially under heavy load or during restarts. Check the RabbitMQ data directory.

      # Find the data directory (often /var/lib/rabbitmq/mnesia/rabbit@<hostname>/)
      rabbitmq-diagnostics status | grep -i "data directory"
      # Check disk usage in that directory
      du -sh /var/lib/rabbitmq/mnesia/rabbit@<hostname>/

- Fix: If these files are excessively large and you are confident the broker is healthy, clear old log files. In extreme cases you may need to reconfigure Mnesia, which is a more advanced operation and may require cluster coordination. Do not delete files inside the data directory by hand while the node is running. A simple restart can sometimes clear transient files.

      # Restart RabbitMQ to clear some transient logs/files
      systemctl restart rabbitmq-server

- Why it works: Certain internal files are transient or can be safely pruned after a clean shutdown/restart.
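
A single du -sh total does not show which part of the data directory is growing. As a small sketch, the hypothetical helper largest_entries below ranks the immediate children of a directory by size, using only du, sort, and head. The path in the usage line is the common default mentioned above and may differ on your system.

```shell
# List the N largest immediate children of a directory, biggest first.
# Output format is du's: "<size in KiB><TAB><path>".
largest_entries() {
  dir="$1"
  count="${2:-10}"
  du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Usage:
#   sudo sh -c '. ./largest_entries.sh; largest_entries /var/lib/rabbitmq/mnesia 10'
# or, with read access to the directory:
#   largest_entries /var/lib/rabbitmq/mnesia 10
```
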

Underlying Disk Full:
- Diagnosis: This is the most direct cause. The filesystem itself is nearly full, so free space falls below disk_free_limit and the alarm fires.

      df -h

  Check the filesystem where /var/lib/rabbitmq (or your configured data directory) resides.
- Fix: Free up space on the underlying filesystem. This could involve deleting old system logs, application logs, temporary files, or increasing the disk size.

      # Example: remove unneeded packages and the apt cache
      sudo apt autoremove
      sudo apt clean
      # Example: list large log files not modified in 30 days; review, then re-run with -delete
      sudo find /var/log -type f -mtime +30 -size +100M -print

- Why it works: Provides the necessary physical space for RabbitMQ and the OS to operate.
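
The df check can be turned into a scripted comparison against a free-space target. This sketch parses the POSIX output format of df -Pk; the helper names available_kb and above_limit are hypothetical, and the 2097152 KiB (2 GiB) target is an arbitrary example rather than RabbitMQ's configured limit.

```shell
# Extract the "Available" column (in KiB) from `df -Pk <path>` output on stdin.
available_kb() {
  awk 'NR == 2 { print $4 }'
}

# Succeed (exit 0) when the available space is strictly above the limit.
# Both arguments are in KiB.
above_limit() {
  [ "$1" -gt "$2" ]
}

# Usage:
#   free="$(df -Pk /var/lib/rabbitmq | available_kb)"
#   if above_limit "$free" 2097152; then echo "ok: ${free} KiB free"; else echo "low: ${free} KiB free"; fi
```
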

Once free space rises back above disk_free_limit, RabbitMQ clears the alarm automatically and unblocks publishers. If publishers were blocked for a long time, the next error you might encounter is publisher_timeout, or connection errors from clients that gave up.
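
To confirm the alarm has actually cleared, a check command can be polled until it succeeds. The sketch below is a generic retry helper (wait_for is a hypothetical name); the usage line pairs it with rabbitmq-diagnostics check_local_alarms, which exits with a non-zero status while an alarm is in effect on the local node. The attempt count and delay are placeholders.

```shell
# Run a command repeatedly until it succeeds or the attempts run out.
# Arguments: number of attempts, delay in seconds between attempts, then the command.
# Returns 0 on the first success, 1 if every attempt failed.
wait_for() {
  attempts="$1"
  delay="$2"
  shift 2
  while [ "$attempts" -gt 0 ]; do
    "$@" >/dev/null 2>&1 && return 0
    attempts=$((attempts - 1))
    # Sleep only if another attempt will follow.
    [ "$attempts" -gt 0 ] && sleep "$delay"
  done
  return 1
}

# Usage (requires a running node): poll every 10 seconds, up to 30 times.
#   wait_for 30 10 rabbitmq-diagnostics check_local_alarms && echo "alarm cleared"
```
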