The Pulsar BookKeeper nodes (bookies) are failing because their local storage partitions are full, preventing them from writing new journal or ledger data.
The most common culprit is simply that the bookies aren’t cleaning up old data fast enough, especially if they’re under heavy write load or have a high max_in_flight_journal_ops setting. This leads to a backlog of journal segments that haven’t been compacted or deleted.
Diagnosis: Check disk usage on the affected bookie:
df -h /var/lib/bookkeeper
Look for partitions at or near 100% capacity.
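The same check can be scripted for monitoring. The following is a minimal sketch, assuming the bookie data lives under /var/lib/bookkeeper as in the df command above; the function names and the 90% threshold are illustrative choices, not BookKeeper settings. The percentage may differ slightly from what df prints, because df excludes filesystem-reserved blocks.

```python
import shutil


def usage_percent(used_bytes: int, total_bytes: int) -> float:
    """Return used space as a percentage of total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def is_near_full(path: str, threshold_percent: float = 90.0) -> bool:
    """Report whether the filesystem holding `path` is at or above the threshold.

    The 90% default is an assumed alerting threshold, not a BookKeeper default.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage_percent(usage.used, usage.total) >= threshold_percent


# Example call on a bookie host (path taken from the df command above):
#   is_near_full("/var/lib/bookkeeper")
```

Running this from cron or a metrics agent on each bookie gives an early warning before the partition reaches 100%.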
Cause 1: Insufficient max_per_segment_size or segment_size
If segments are too small, the bookies create many small files for the same amount of data, which adds per-file overhead and slows down cleanup.
- Diagnosis: Examine bookkeeper.conf for max_per_segment_size and segment_size. The defaults are 32MB and 32MB respectively. Check Pulsar admin tools for actual segment sizes.
- Fix: Increase max_per_segment_size and segment_size in bookkeeper.conf on all bookies. A common starting point is 64MB or 128MB. Keep comments on their own lines, because a trailing # comment on a properties line is read as part of the value.

      # bookkeeper.conf
      # 128MB
      max_per_segment_size=134217728
      # 128MB
      segment_size=134217728

- Why it works: Larger segments mean fewer individual files on disk for the same amount of data, and fewer file handles being managed. This can improve performance and reduce overhead, but also means each segment is larger.
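The byte values in the configuration come from multiplying the size in megabytes by 1024 × 1024. A small sketch of that arithmetic follows; the helper names are illustrative.

```python
MIB = 1024 * 1024  # bytes per mebibyte, the unit behind the "MB" figures above


def mb_to_bytes(megabytes: int) -> int:
    """Convert a size in MB (binary megabytes) to bytes."""
    return megabytes * MIB


def bytes_to_mb(num_bytes: int) -> float:
    """Convert a byte count back to MB, e.g. to sanity-check an existing setting."""
    return num_bytes / MIB


for size_mb in (32, 64, 128):
    print(f"{size_mb}MB = {mb_to_bytes(size_mb)} bytes")
# 32MB = 33554432 bytes
# 64MB = 67108864 bytes
# 128MB = 134217728 bytes
```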
Cause 2: High max_in_flight_journal_ops or max_in_flight_entries_per_op
These settings control how many writes can be pending. If they’re too high, a surge of writes can overwhelm the disk’s ability to flush data, leading to a journal backlog.
- Diagnosis: Check bookkeeper.conf for max_in_flight_journal_ops and max_in_flight_entries_per_op. The default max_in_flight_journal_ops is 5000, and the default max_in_flight_entries_per_op is 1000.
- Fix: Reduce these values in bookkeeper.conf on all bookies. Try max_in_flight_journal_ops=2000 and max_in_flight_entries_per_op=500.

      # bookkeeper.conf
      max_in_flight_journal_ops=2000
      max_in_flight_entries_per_op=500

- Why it works: This throttles incoming writes, giving the bookie more time to flush its journal and ledger data to disk before the journal buffer fills up.
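When many bookies are involved, comparing each bookkeeper.conf by hand gets tedious. The sketch below parses simple key=value lines and flags settings above a chosen limit. The parser handles only the plain subset of the properties format (full-line # comments, one key=value per line), and the limits are the values suggested in this section, not values the bookie enforces.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and full-line comments."""
    settings: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def settings_above(settings: Dict[str, str], limits: Dict[str, int]) -> Dict[str, int]:
    """Return {name: value} for integer settings whose value exceeds its limit."""
    over: Dict[str, int] = {}
    for name, limit in limits.items():
        if name in settings and int(settings[name]) > limit:
            over[name] = int(settings[name])
    return over


# Illustrative file content using the defaults quoted in this section.
SAMPLE_CONF = """\
# bookkeeper.conf
max_in_flight_journal_ops=5000
max_in_flight_entries_per_op=1000
"""

# Suggested ceilings from the fix above (assumed targets, tune for your hardware).
SUGGESTED_LIMITS = {
    "max_in_flight_journal_ops": 2000,
    "max_in_flight_entries_per_op": 500,
}

print(settings_above(parse_properties(SAMPLE_CONF), SUGGESTED_LIMITS))
```

To check a real file, read it with open(path).read() and pass the text to parse_properties.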
Cause 3: Inadequate journal_flush_interval_ms
The journal flush interval dictates how often the bookie flushes its in-memory journal to stable storage. If this is too long, the journal can grow very large.
- Diagnosis: Check bookkeeper.conf for journal_flush_interval_ms. Default is 1000 (1 second).
- Fix: Decrease journal_flush_interval_ms in bookkeeper.conf on all bookies. Try 500 (0.5 seconds).

      # bookkeeper.conf
      journal_flush_interval_ms=500

- Why it works: More frequent flushes ensure that data is written to stable storage more often, preventing the journal from accumulating excessive amounts of unflushed data.
Cause 4: Slow Disk I/O or High Latency
The underlying storage might simply not be fast enough to keep up with the write load, especially during peak times.
- Diagnosis: Use iostat -xm 5 on the bookie to monitor disk I/O utilization (%util), read/write speeds, and average wait times (await). High %util and await indicate a bottleneck.
- Fix: Upgrade to faster storage (SSDs are highly recommended for bookies), or offload some topics/partitions to less busy bookies if using topic-based balancing. For immediate relief, consider reducing the write load on the cluster if possible.
- Why it works: Faster disks can process write requests more quickly, reducing the chance of I/O becoming a bottleneck and causing data to pile up.
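The iostat check can be automated by reading column names from the header row, since column order differs between sysstat versions (newer releases report r_await and w_await instead of a single await). The sketch below is a rough illustration: the sample report is trimmed and hand-written rather than captured from a real host, and the 90% and 20 ms thresholds are assumptions to tune for your disks.

```python
from typing import Dict, List


def parse_iostat_devices(report: str) -> Dict[str, Dict[str, float]]:
    """Map device name -> {column: value}, using the header row for column names.

    When the report contains several samples, later samples overwrite earlier ones.
    """
    devices: Dict[str, Dict[str, float]] = {}
    columns = None
    for line in report.splitlines():
        fields = line.split()
        if not fields:
            columns = None  # a blank line ends the current device table
            continue
        if fields[0].rstrip(":") == "Device":
            columns = fields[1:]
            continue
        if columns is not None and len(fields) == len(columns) + 1:
            devices[fields[0]] = dict(zip(columns, map(float, fields[1:])))
    return devices


def saturated(devices: Dict[str, Dict[str, float]],
              util_limit: float = 90.0,
              await_limit_ms: float = 20.0) -> List[str]:
    """Return devices whose %util and wait time both exceed the assumed limits."""
    flagged = []
    for name, stats in devices.items():
        wait = max(stats.get(key, 0.0) for key in ("await", "r_await", "w_await"))
        if stats.get("%util", 0.0) >= util_limit and wait >= await_limit_ms:
            flagged.append(name)
    return flagged


# Trimmed, illustrative sample; real iostat -xm output has more columns.
SAMPLE_REPORT = """\
Device:  r/s     w/s   rMB/s   wMB/s   await  %util
sda      1.00  850.00   0.01   95.00   53.10  99.80
sdb      0.50   20.00   0.00    1.20    0.80   3.10
"""

print(saturated(parse_iostat_devices(SAMPLE_REPORT)))
```

In practice the report text would come from running iostat -xm 5 with a sample count and capturing its standard output.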
Cause 5: Insufficient max_write_throughput_limit
This is a rate-limiting mechanism for writes per bookie. If it’s set too low, it can artificially slow down writes, leading to backlog issues if the disk could handle more. Conversely, if it’s too high and the disk can’t keep up, it can contribute to the problem.
- Diagnosis: Check bookkeeper.conf for max_write_throughput_limit. Default is -1 (unlimited).
- Fix: If this is set to a low value, increase it. If it’s unlimited and the disk is slow, consider setting a sensible limit to prevent overwhelming the disk. A value like 100MB/s or 200MB/s might be appropriate depending on your hardware. The value is given in bytes per second, and the comment goes on its own line so that it is not read as part of the value.

      # bookkeeper.conf
      # 200MB/s
      max_write_throughput_limit=209715200

- Why it works: Properly tuning this limit ensures that write operations don’t exceed the disk’s sustained write capabilities, preventing an overload condition.
Cause 6: Ledger/Segment Compaction Lag
BookKeeper compacts ledgers to reduce the number of small files and reclaim space. If compaction is not keeping up, or if compaction_max_pending_requests is too low, old data segments can linger.
- Diagnosis: Monitor bookie logs for compaction-related warnings or errors. Check bookkeeper.conf for compaction_max_pending_requests. Default is 10.
- Fix: Increase compaction_max_pending_requests in bookkeeper.conf. Try 20 or 30. Ensure your segment_size and max_per_segment_size are reasonable.

      # bookkeeper.conf
      compaction_max_pending_requests=30

- Why it works: Allowing more pending compaction requests means the system can process the backlog of segment merging and cleanup more aggressively.
Cause 7: Unbalanced Topic Distribution
If a few topics are extremely hot and are exclusively written to a subset of bookies, those bookies can become overloaded even if the cluster as a whole has free space.
- Diagnosis: Use pulsar-admin topics list <your_tenant>/<your_namespace> and then pulsar-admin topics stats <topic_name> to identify high-volume topics. Check which bookies are serving these topics via the topic stats or by looking at the bookkeeper logs for ledger assignments.
- Fix: Rebalance topics to distribute the load more evenly across bookies. This might involve using Pulsar’s topic-level placement policies or simply restarting bookies to trigger rebalancing (though this is disruptive).
- Why it works: Spreading the workload across more bookies prevents any single bookie from becoming a disk I/O bottleneck.
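Once the stats for each topic have been collected, ranking them by inbound throughput makes hot topics easy to spot. The sketch below assumes the JSON printed by pulsar-admin topics stats has already been loaded into a dictionary per topic and reads its msgThroughputIn field (bytes per second). The topic names and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_by_inbound_throughput(stats_by_topic: Dict[str, dict]) -> List[Tuple[str, float]]:
    """Return (topic, bytes_per_second) pairs, busiest first."""
    ranked = [
        (topic, float(stats.get("msgThroughputIn", 0.0)))
        for topic, stats in stats_by_topic.items()
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


def share_of_top(ranked: List[Tuple[str, float]], n: int = 1) -> float:
    """Fraction of total inbound throughput carried by the n busiest topics."""
    total = sum(rate for _, rate in ranked)
    if total == 0:
        return 0.0
    return sum(rate for _, rate in ranked[:n]) / total


# Hypothetical stats, keyed by topic name; only the field used here is shown.
SAMPLE_STATS = {
    "persistent://my-tenant/my-ns/orders": {"msgThroughputIn": 900.0},
    "persistent://my-tenant/my-ns/audit": {"msgThroughputIn": 100.0},
}

ranked = rank_by_inbound_throughput(SAMPLE_STATS)
print(ranked[0][0], share_of_top(ranked))
```

A single topic carrying most of the inbound bytes is a strong hint that the bookies holding its ledgers are the ones running out of disk.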
Even after tuning these settings, the bookies will likely keep returning BK_NO_RESERVABLE_STORAGE errors until you also free up space or increase max_disk_usage_threshold in bookkeeper.conf.