Pulsar retention and backlog limits are a bit like a leaky faucet: you can turn it off, but if you don’t understand why it’s dripping, you’ll be back to mopping the floor soon enough.
Let’s see what this actually looks like when Pulsar is chugging along. Imagine we have a topic persistent://public/default/my-topic with a few producers and consumers.
# Producer sending messages
pulsar-client produce my-topic -m "hello" -n 100000 -p key=value
# Consumer reading messages
pulsar-client consume my-topic -s my-subscription -n 0
Here, my-subscription is a subscription, Pulsar’s closest equivalent to a consumer group. Pulsar keeps track of what each subscription has acknowledged. If a subscription falls behind, it builds up a backlog of unacknowledged messages, which consumes disk space and can hurt broker performance.
The core problem Pulsar solves here is reliable message delivery in a distributed system. It guarantees that messages are stored durably and delivered to consumers, even if consumers are temporarily unavailable. To do this, it needs to hold onto messages until they are acknowledged. The challenge is managing this storage and ensuring that old, unacknowledged messages don’t consume infinite resources.
Internally, Pulsar stores messages in segments (BookKeeper ledgers), and each topic partition is a logical stream of these segments. When a consumer acknowledges a message, Pulsar records the acknowledgment for that specific subscription only. A message becomes eligible for deletion once every subscription on the topic has acknowledged it, whether or not those subscriptions currently have connected consumers. The broker checks this periodically and, if retention policies allow, garbage collects the data. Because deletion works on whole ledgers, a fully acknowledged message can stay on disk until the rest of its ledger is acknowledged too.
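The deletion rule above can be sketched in a few lines of shell. This is a deliberately simplified model, not Pulsar's implementation: it treats each subscription's acknowledged position as a single integer (real positions are ledger and entry IDs), and `deletable_up_to` is a hypothetical helper name.

```shell
# Simplified model: each argument is the highest message index that one
# subscription has acknowledged. Nothing past the slowest subscription's
# position can be deleted, so the answer is the minimum of the arguments.
deletable_up_to() {
  min=$1
  shift
  for pos in "$@"; do
    if [ "$pos" -lt "$min" ]; then
      min=$pos
    fi
  done
  echo "$min"
}

# Three subscriptions at positions 950, 400 and 1000
deletable_up_to 950 400 1000   # prints 400
```

The slowest subscription sets the bound, which is why a single stalled subscription is enough to pin data on disk for the whole topic.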
The key levers you control are messageTTL, retention (retentionSize and its time-based counterpart), and backlogQuota. Each can be set at the namespace level or, where topic-level policies are enabled, on individual topics.
messageTTL: The maximum time, in seconds, that a message may stay unacknowledged. Once the TTL passes, Pulsar marks the message as acknowledged, which removes it from the backlog and makes it eligible for deletion.
retentionSize: The maximum amount of already-acknowledged data that Pulsar keeps for a topic. Without a retention policy, messages are deleted once every subscription has acknowledged them.
backlogQuota: The most direct control for preventing unbounded backlog. It sets a limit on the total size of unacknowledged messages for a topic. What happens when the limit is reached depends on the quota’s policy: producer_request_hold makes the broker hold producers’ send requests, producer_exception fails them with an exception, and consumer_backlog_eviction discards the oldest backlog instead of blocking producers.
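For reference, the TTL and retention settings are applied with separate pulsar-admin subcommands. The values below (one hour, 10 GB, three days) are illustrative assumptions rather than recommendations, and the commands assume a reachable cluster and a pulsar-admin client already configured for it.

```shell
# Expire messages that stay unacknowledged for more than one hour (value in seconds)
pulsar-admin namespaces set-message-ttl public/default --messageTTL 3600

# Keep acknowledged data until it exceeds 10 GB or is older than 3 days
pulsar-admin namespaces set-retention public/default --size 10G --time 3d

# Read both settings back to confirm what the namespace actually holds
pulsar-admin namespaces get-message-ttl public/default
pulsar-admin namespaces get-retention public/default
```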
Let’s configure a backlog quota for our public/default namespace to limit unacknowledged messages to 1GB:
# Set backlog quota for the namespace
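pulsar-admin namespaces set-backlog-quota public/default \
--limit 1G \
--policy producer_request_hold \
--type destination_storage
Here:
--limit 1G caps the backlog at 1 GiB (1,073,741,824 bytes).
--policy producer_request_hold tells the broker to hold producers’ send requests once the quota is reached, rather than failing them (producer_exception) or discarding the oldest backlog (consumer_backlog_eviction).
--type destination_storage means the quota is measured by the size of the backlog. The alternative, message_age, limits the backlog by time and is configured with --limitTime instead.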
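After setting a quota, it is worth reading it back and checking the current backlog. The sketch below assumes a reachable cluster and that jq is installed for filtering the JSON. The msgBacklog field name comes from the topic stats output and may differ between Pulsar versions.

```shell
# Show the backlog quotas currently stored on the namespace
pulsar-admin namespaces get-backlog-quotas public/default

# Show how many messages each subscription on the topic still has to acknowledge
pulsar-admin topics stats persistent://public/default/my-topic \
  | jq '.subscriptions | map_values(.msgBacklog)'
```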
This configuration ensures that even if consumers are down for a long time, the backlog won’t grow indefinitely and consume all disk space. Once the quota is reached, the broker stops accepting new messages on that topic, so producers stall (and may eventually time out) until consumers catch up.
The retentionSize and messageTTL settings are about message lifecycle management and data archival, whereas backlogQuota is primarily about resource management and producer throughput control when consumers lag. Many operators conflate these, leading to confusion when producers suddenly stop.
What most people miss is that a backlogQuota set on a namespace applies to each topic partition individually. So, if you have a topic with 10 partitions in a namespace with a 1GB backlogQuota, each partition can hold up to 1GB of unacknowledged messages. That single topic could therefore accumulate up to 10GB of backlog. Because each partition hits its quota independently, producers can be blocked on one hot partition while the others still accept messages. This is often overlooked when troubleshooting producer throughput issues under high load.
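The arithmetic is simple but easy to forget when sizing disks. Here is a small sketch, where `max_topic_backlog_bytes` is a hypothetical helper rather than a Pulsar command:

```shell
# Worst-case backlog for one partitioned topic when the namespace quota
# applies to each partition separately: partitions * per-partition quota.
max_topic_backlog_bytes() {
  partitions=$1
  quota_bytes=$2
  echo $(( partitions * quota_bytes ))
}

# 10 partitions, 1 GiB quota per partition
max_topic_backlog_bytes 10 1073741824   # prints 10737418240
```

Multiply again by the number of partitioned topics in the namespace to get a rough upper bound on the backlog the namespace can hold.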
The next thing you’ll likely encounter is configuring tiered storage to offload older, less frequently accessed data to cheaper storage.