The EntryLog format in Apache Pulsar’s BookKeeper storage is surprisingly inefficient at utilizing disk space for small messages, despite its sophisticated design.
Let’s see this in action. Imagine we have a Pulsar topic with a single partition, which Pulsar names my_topic-partition-0. We’ll write a few small messages to it.
# Assuming you have jq installed for pretty-printing JSON
# and that pulsar-admin and pulsar-client are configured to talk to your cluster
TOPIC="persistent://public/default/my_topic"
# Create the topic with a single partition (skip this if it already exists)
pulsar-admin topics create-partitioned-topic "$TOPIC" -p 1
# Write a small message (producing is done with pulsar-client, not pulsar-admin)
pulsar-client produce "$TOPIC" -m "hello"
# Write another small message
pulsar-client produce "$TOPIC" -m "world"
# Check the per-partition stats to see message counts and sizes
# (.partitions is a map keyed by partition name, so iterate over its values)
pulsar-admin topics partitioned-stats "$TOPIC" --per-partition | jq '.partitions[]'
The output will show a message count, but the storageSize reported there is often misleadingly small: it counts the bytes in the topic’s ledgers as Pulsar sees them, not the replicated, framed bytes on each bookie’s disk. The real story is on the BookKeeper side. BookKeeper organizes data into ledgers, and each bookie appends the entries of the ledgers it serves to entry log files. The EntryLog format dictates how these entries are laid out.
Here’s the mental model: Pulsar uses BookKeeper for durable message storage. BookKeeper writes messages to ledgers, and a ledger is a sequence of entries. On each bookie, entries from many ledgers are interleaved and appended to the current entry log file. These files are not fixed-size; a file rolls over to a new one once it reaches a configured limit (logSizeLimit, typically 1 GB). The EntryLog format is responsible for how entries are serialized and appended to these files. Each record includes a header for the entry, a checksum, and the entry’s data.
The problem arises because even for a tiny message (a few bytes), BookKeeper still writes the per-entry header, a checksum, and the message payload. This overhead, while small per message, becomes significant when you have millions of small messages: when the framing is several times larger than the payload, most of what lands on disk is overhead. Furthermore, entry log files are append-only. Deleting a ledger does not free its bytes right away. The space comes back only when garbage collection finds that no live ledger still has entries in a file, or when compaction copies the surviving entries into a new file and deletes the old one.
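To make the garbage-collection point concrete, here is a minimal sketch of the decision a bookie has to make for each entry log file. It is a simplified model, not BookKeeper's code: the function name, the byte counts, and the 20 percent default threshold are all illustrative assumptions.

```shell
#!/bin/sh
# Simplified model of entry log reclamation.
# All inputs are illustrative; nothing here is read from a real bookie.

# reclaim_action LIVE_BYTES TOTAL_BYTES [THRESHOLD_PCT]
# The body runs in a subshell so the variables stay local to the call.
reclaim_action() (
  live=$1
  total=$2
  threshold=${3:-20}   # assumed compaction threshold, in percent
  usage=$(( live * 100 / total ))
  if [ "$live" -eq 0 ]; then
    echo "delete"      # no live entries left: the whole file can go
  elif [ "$usage" -lt "$threshold" ]; then
    echo "compact"     # rewrite the live entries elsewhere, then delete the file
  else
    echo "keep"        # dead entries keep occupying disk
  fi
)

reclaim_action 0 1073741824          # every ledger in the file was deleted
reclaim_action 104857600 1073741824  # about 10% of the file is still live
reclaim_action 900000000 1073741824  # most of the file is still live
```

The last case is the one that matters for footprint: a file that is mostly live keeps its dead entries on disk for as long as it stays above the threshold.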
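The core of the EntryLog format is a simple sequential append. Each record starts with a 4-byte length, followed by the entry as the client sent it: a fixed-size header of about 30 bytes (ledger ID, entry ID, last-add-confirmed, and a length field, 8 bytes each), a digest (4 bytes with the CRC32C digest type), and then the variable-sized payload. The EntryLog file itself has a header as well. BookKeeper aims to batch writes to disk, but the fundamental unit of storage within an EntryLog is this entry structure.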
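The layout above can be turned into a rough per-entry calculator. This is a sketch under stated assumptions: the field sizes (a 4-byte length prefix, a 32-byte header, a 4-byte CRC32C digest) are hard-coded for illustration rather than read from BookKeeper, and the payload size is whatever BookKeeper receives, which already includes Pulsar's own message metadata.

```shell
#!/bin/sh
# Byte breakdown of one entry as it lands in an entry log file.
# Field sizes are assumptions for illustration, not constants read from BookKeeper.
LENGTH_PREFIX=4   # size field written before each entry
ENTRY_HEADER=32   # ledger ID, entry ID, last-add-confirmed, length (8 bytes each)
DIGEST=4          # CRC32C; other digest types are larger

# entry_record_bytes PAYLOAD_BYTES -> total bytes appended to the entry log
entry_record_bytes() {
  echo $(( LENGTH_PREFIX + ENTRY_HEADER + DIGEST + $1 ))
}

# overhead_pct PAYLOAD_BYTES -> integer percent of the record that is not payload
overhead_pct() {
  echo $(( (LENGTH_PREFIX + ENTRY_HEADER + DIGEST) * 100 / $(entry_record_bytes "$1") ))
}

for size in 5 100 1024; do
  echo "payload=$size record=$(entry_record_bytes "$size") overhead_pct=$(overhead_pct "$size")"
done
```

Under these assumptions a 5-byte payload produces a 45-byte record, so nearly nine tenths of the bytes are framing, while at 1 KiB the same framing is only a few percent.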
The one thing most people don’t know is that an entry log file does not belong to a single ledger, so closing a ledger does not seal or pad the file. With the default entry logger, a bookie interleaves entries from all the ledgers it is writing into the current entry log, and that file only rolls over when it reaches its size limit. Alignment padding does exist in BookKeeper, but it lives in the journal, which can pad its writes out to journalAlignmentSize; it is not written into entry logs when a ledger closes. The consequence for disk usage is that a short-lived ledger’s entries sit next to those of long-lived ledgers, and their bytes cannot be reclaimed until the whole file is deleted or compacted. That can keep the storage footprint well above the size of the live data.
Understanding this leads to the next challenge: how to optimize storage for small messages, potentially through compression or batching strategies at a higher level in Pulsar itself.
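As a rough illustration of why batching helps, the sketch below spreads one entry's framing across all the messages packed into it. The 40-byte per-entry overhead is an assumed round number, and the extra per-message metadata that Pulsar stores inside a batch is deliberately ignored, so treat the results as an upper bound on the savings.

```shell
#!/bin/sh
# Per-message share of the entry framing when several messages share one entry.
# The default overhead is an assumption; metadata inside the batch is ignored.

# per_message_overhead MESSAGES_PER_ENTRY [ENTRY_OVERHEAD_BYTES]
# Prints the overhead per message in bytes, rounded up.
per_message_overhead() {
  echo $(( (${2:-40} + $1 - 1) / $1 ))
}

for batch in 1 10 100; do
  echo "batch=$batch overhead_per_message=$(per_message_overhead "$batch")"
done
```

With these numbers, going from one message per entry to ten cuts the framing from 40 bytes to 4 bytes per message, which is why batching at the producer is usually the first thing to try for small-message workloads.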