RabbitMQ throughput isn’t just about how fast messages zip through; it’s about how many meaningful units of work your system can complete per second, and that often means fighting against its own inherent safety features.
Let’s see it in action. Imagine a basic producer and consumer.
# producer.py
import pika
import time
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='throughput_test')
start_time = time.time()
for i in range(100000):
channel.basic_publish(exchange='', routing_key='throughput_test', body=f'message_{i}')
end_time = time.time()
print(f"Sent 100,000 messages in {end_time - start_time:.2f} seconds.")
connection.close()
# consumer.py
import pika
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='throughput_test')
def callback(ch, method, properties, body):
# print(f" [x] Received {body}")
pass
channel.basic_consume(queue='throughput_test', on_message_callback=callback, auto_ack=True)
print(' [*] Waiting for messages. To exit press CTRL+C')
channel.start_consuming()
Running this locally, without any tuning, you might see something like 15,000-20,000 messages per second. That’s a baseline. Now, how do we push that?
The core problem RabbitMQ solves is reliable message delivery. This reliability comes with overhead. When you increase throughput, you’re essentially asking RabbitMQ to take shortcuts, or rather, to be more aggressive about how it processes and stores data, without sacrificing too much fundamental safety.
1. Publisher Confirms:
The most significant bottleneck for published messages is often waiting for confirmation that a message has reached the broker’s queue safely. By default, basic_publish is fire-and-forget. Enabling publisher confirms makes the producer wait.
Diagnosis: Observe producer side. If producer is slow to send messages, it’s likely waiting for ACKs.
Check: Run producer with pika and channel.confirm_delivery().
Fix:
In producer:
channel.confirm_delivery()
# ... inside the loop ...
channel.basic_publish(exchange='', routing_key='throughput_test', body=f'message_{i}')
Why it works: The producer explicitly waits for the broker to acknowledge receipt of each message, preventing data loss but introducing latency if not handled in batches.
2. Batching Publisher Confirms:
Waiting for every single message is inefficient. Batching allows the producer to send multiple messages and wait for a single confirmation for the whole batch.
Diagnosis: Producer is slow, but not extremely slow. Latency per message is noticeable. Check: Implement batching logic in the producer. Fix:
import pika
import time
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='throughput_test')
channel.confirm_delivery() # Enable confirms
batch_size = 1000
messages_sent = 0
start_time = time.time()
for i in range(100000):
channel.basic_publish(exchange='', routing_key='throughput_test', body=f'message_{i}')
messages_sent += 1
if messages_sent % batch_size == 0:
channel.wait_for_confirms_or_raise()
# print(f"Sent batch of {batch_size}")
# Send any remaining messages
if messages_sent % batch_size != 0:
channel.wait_for_confirms_or_raise()
end_time = time.time()
print(f"Sent 100,000 messages in {end_time - start_time:.2f} seconds with batched confirms.")
connection.close()
Why it works: Reduces the round trip for confirmations from N to N/batch_size, amortizing the network and processing overhead.
3. Consumer Acknowledgements (auto_ack=False):
If consumers are slow, they can cause message accumulation on the broker. auto_ack=True means messages are removed from the queue as soon as they are delivered to the consumer. Setting auto_ack=False requires explicit acknowledgements.
Diagnosis: Broker memory usage increases significantly, or queue depth grows unbounded. Producers might slow down if the broker starts dropping messages due to memory pressure.
Check: Ensure auto_ack=False in consumer.
Fix:
In consumer:
def callback(ch, method, properties, body):
# Process the message...
ch.basic_ack(delivery_tag=method.delivery_tag) # Explicit ACK
Why it works: The broker only removes messages from the queue after the consumer has successfully processed them and signaled completion, preventing message loss but requiring careful consumer implementation to avoid backlogs.
4. Consumer Prefetch Count:
Even with auto_ack=False, a consumer might be overwhelmed if it tries to process too many messages at once. The prefetch_count (or qos.prefetch_count) limits the number of unacknowledged messages a consumer can hold.
Diagnosis: Consumers are acknowledged, but processing is slow, and the broker still holds many messages for those consumers.
Check: Set channel.basic_qos(prefetch_count=X) in the consumer.
Fix:
In consumer:
channel.basic_qos(prefetch_count=50) # Adjust based on consumer processing speed
channel.basic_consume(queue='throughput_test', on_message_callback=callback, auto_ack=False)
Why it works: Prevents a single slow consumer from hogging all available messages, distributing the load more evenly and allowing the broker to manage flow control more effectively. A value of 1 is safest but lowest throughput; a higher value can boost throughput if consumers are fast.
5. Broker Configuration - Memory and Disk Alarms:
RabbitMQ has built-in mechanisms to prevent it from consuming all available system resources. These alarms can throttle producers and consumers.
Diagnosis: Producers and consumers suddenly slow down, and you see warnings in the RabbitMQ logs about memory or disk alarms.
Check: rabbitmqctl environment or the management UI for memory/disk usage.
Fix:
Increase vm_memory_high_watermark and disk_free_limit in rabbitmq.conf or rabbitmq-env.conf. For example, to set memory limit to 70% of total RAM:
# rabbitmq.conf
vm_memory_high_watermark.relative = 0.7
Or for a specific disk limit:
# rabbitmq.conf
disk_free_limit.absolute = 10GB
Why it works: These settings define thresholds that, when crossed, trigger flow control and potentially stop the broker from accepting new messages. Increasing them allows the broker to use more resources before throttling, but be careful not to starve other processes or run out of disk space.
6. Message Durability and Persistence:
If messages are marked as delivery_mode=2 (persistent) and queues are also declared durable=True, RabbitMQ will write messages to disk before acknowledging them to the publisher. This is crucial for durability but significantly impacts throughput.
Diagnosis: Throughput is consistently low, especially for disk-heavy operations, and messages are marked persistent.
Check: Examine properties.delivery_mode on the producer and durable flag on queue declaration.
Fix:
Producer: channel.basic_publish(..., properties=pika.BasicProperties(delivery_mode=1)) (non-persistent)
Queue declaration: channel.queue_declare(queue='my_queue', durable=False)
Why it works: By not writing messages to disk for every publish and not guaranteeing they survive broker restarts, you trade durability for speed. If you need both, consider RabbitMQ’s quorum queues or Raft-based replication, which offer better performance than classic mirrored queues with persistence.
7. Network Latency and Throughput:
Even with optimal broker and client settings, network conditions can be a bottleneck. High latency or limited bandwidth between producers, consumers, and the broker will cap throughput.
Diagnosis: Throughput is lower than expected, and ping times between nodes are high.
Check: Use ping and iperf3 between your producer/consumer machines and the RabbitMQ server.
Fix: Optimize network infrastructure, co-locate services, or use compression if applicable (though compression adds CPU overhead).
Why it works: Reduces the time it takes for messages and acknowledgements to travel across the network, directly impacting how quickly operations can complete.
After addressing these, you’ll likely encounter the next common problem: the connection reset by peer error if your consumers are still not keeping up, or if you’ve pushed the broker too hard and it’s starting to drop connections due to resource exhaustion.