The Pulsar Managed Ledger Cache (MLC) evicts entries based on a Least Recently Used (LRU) policy by default, but this can lead to suboptimal performance when recency of access is a poor predictor of which entries will be read next.
Let’s see it in action. Imagine a Pulsar cluster with a managed ledger cache configured. We have a topic, persistent://public/default/my-topic, and we’re writing and reading data.
Here’s a simplified view of how entries move through the cache:
- Write: A new entry is written to a ledger. It’s initially placed in the MLC.
- Read (Hit): If the entry is already in the MLC, it’s considered a cache hit. The entry’s access timestamp is updated, making it less likely to be evicted soon under LRU.
- Read (Miss): If the entry is not in the MLC, it’s fetched from disk (or another storage tier) and then added to the MLC. This might cause another entry to be evicted.
- Eviction: When the MLC reaches its capacity, an entry is removed. With LRU, this is the entry that hasn’t been accessed for the longest time.
The core problem Pulsar’s MLC solves is read latency. Instead of fetching every entry from BookKeeper’s ledger storage on the bookies, the broker can serve frequently accessed data directly from its own memory. Each cache hit avoids a network round trip to a bookie and, potentially, a disk read.
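To make the latency argument concrete, here is a small back-of-the-envelope calculation of average read latency as a function of cache hit ratio. The function name and both latency figures are hypothetical values picked for illustration; they are not measurements from a real cluster.

```python
def expected_read_latency_ms(hit_ratio: float, cache_ms: float, storage_ms: float) -> float:
    """Weighted average of cache-hit and cache-miss read latency."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * storage_ms


# Hypothetical latencies chosen only for illustration, not measurements.
CACHE_MS = 0.1
STORAGE_MS = 5.0

for ratio in (0.5, 0.9, 0.99):
    latency = expected_read_latency_ms(ratio, CACHE_MS, STORAGE_MS)
    print(f"hit ratio {ratio:.2f}: about {latency:.2f} ms per read")
```

Because the miss term dominates, small gains in hit ratio near the top of the range remove a large share of the remaining latency, which is why eviction choices matter.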
Conceptually, an LRU cache like this combines a hash map for quick lookups (mapping ledger ID and entry ID to cache entries) with a doubly linked list that maintains the LRU order. When an entry is accessed, its node in the linked list is moved to the head. When eviction is needed, the entry at the tail of the list is removed. The broker’s concrete data structures may differ between Pulsar versions, but the behavior follows this model.
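The hash map plus doubly linked list arrangement can be sketched in a few lines of Python. This is a toy model written for this article, not Pulsar’s code: the class name LruEntryCache, its methods, and a capacity counted in entries rather than megabytes are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[int, int]  # (ledger_id, entry_id)


class _Node:
    """One cached entry, linked to its neighbours in recency order."""

    def __init__(self, key: Optional[Key], value: bytes) -> None:
        self.key = key
        self.value = value
        self.prev: Optional["_Node"] = None
        self.next: Optional["_Node"] = None


class LruEntryCache:
    """Toy LRU cache: the head side is most recent, the tail side least recent."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.index: Dict[Key, _Node] = {}
        # Sentinel nodes remove the special cases for an empty list.
        self.head = _Node(None, b"")
        self.tail = _Node(None, b"")
        self.head.next = self.tail
        self.tail.prev = self.head

    def _unlink(self, node: _Node) -> None:
        node.prev.next = node.next
        node.next.prev = node.prev

    def _push_front(self, node: _Node) -> None:
        node.prev = self.head
        node.next = self.head.next
        self.head.next.prev = node
        self.head.next = node

    def get(self, key: Key) -> Optional[bytes]:
        """Return the cached value and mark it most recent, or None on a miss."""
        node = self.index.get(key)
        if node is None:
            return None
        self._unlink(node)
        self._push_front(node)
        return node.value

    def put(self, key: Key, value: bytes) -> Optional[Key]:
        """Insert or update an entry; return the evicted key, if any."""
        node = self.index.get(key)
        if node is not None:
            node.value = value
            self._unlink(node)
            self._push_front(node)
            return None
        evicted = None
        if len(self.index) >= self.capacity:
            victim = self.tail.prev  # least recently used entry
            self._unlink(victim)
            del self.index[victim.key]
            evicted = victim.key
        node = _Node(key, value)
        self.index[key] = node
        self._push_front(node)
        return evicted


cache = LruEntryCache(capacity=2)
cache.put((1, 0), b"a")
cache.put((1, 1), b"b")
cache.get((1, 0))                  # hit: entry (1, 0) moves to the head
evicted = cache.put((1, 2), b"c")  # cache is full, so the tail entry goes
print(evicted)                     # (1, 1)
```

Both lookup and reordering are constant time, which is the reason for pairing the two structures: the map finds the node, and the list makes moving or removing it cheap.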
The key levers you control are:
- managedLedgerCacheSizeMB: The total memory allocated to the MLC, in megabytes. This is the most direct way to control the cache’s capacity.
- managedLedgerCacheEvictionPolicy: The policy used to choose which entries to evict. LRU is the default, but other policies are available.
Let’s dive into tuning the eviction policy. The default LRU policy is simple and effective for many workloads, but it assumes that recency of access is the best predictor of future access. This isn’t always true. For instance, if you scan or periodically reprocess a large dataset, each entry is read once and immediately becomes the most recently used, even though it may not be read again for a long time. A long enough scan can push entries that are read constantly by other consumers out of the cache.
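The effect of a scan on an LRU cache is easy to reproduce in miniature. The sketch below replays a sequence of entry IDs through a generic LRU cache built on Python’s OrderedDict; the function name, the cache size, and the access sequence are all made up for the demonstration and do not model a real broker.

```python
from collections import OrderedDict
from typing import List


def run_lru(capacity: int, accesses: List[int]) -> List[bool]:
    """Replay entry IDs through an LRU cache; True marks a hit, False a miss."""
    cache: "OrderedDict[int, bool]" = OrderedDict()
    hits = []
    for entry_id in accesses:
        if entry_id in cache:
            cache.move_to_end(entry_id)    # mark as most recently used
            hits.append(True)
        else:
            cache[entry_id] = True         # load on miss
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
            hits.append(False)
    return hits


hot = [0, 1, 2, 3]              # small set that is read repeatedly
scan = list(range(100, 108))    # eight entries that are each read once
hits = run_lru(capacity=4, accesses=hot + hot + scan + hot)

hot_hits_before_scan = sum(hits[4:8])  # second pass over the hot set
hot_hits_after_scan = sum(hits[-4:])   # hot set re-read after the scan
print(hot_hits_before_scan, hot_hits_after_scan)  # 4 0
```

Before the scan every hot read is a hit; after it, every hot read misses, because the scan filled the cache with entries that will not be read again.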
Pulsar offers an alternative: TIME_BASED. With TIME_BASED eviction, entries are evicted based on their age, not just their last access time. This can be beneficial for workloads that involve:
- Time-series data: Where older data is less likely to be re-read than recent data.
- Batch processing or scans: A scan marks every entry it touches as recently accessed, so LRU keeps those entries around. Age-based eviction can still remove them once they fall outside the current scan window.
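To configure TIME_BASED eviction, you’d modify the Pulsar broker configuration file (broker.conf) and restart the broker. Cache setting names have changed across Pulsar releases, so confirm that each key below appears in the broker.conf reference for your version before relying on it.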
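# Example broker.conf settings
managedLedgerCacheSizeMB=2048
managedLedgerCacheEvictionPolicy=TIME_BASED
# Evict entries older than 1 hour. Keep comments on their own line:
# properties files treat a trailing "# ..." as part of the value.
managedLedgerCacheMaxEntryAgeSeconds=3600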
In this example:
- managedLedgerCacheSizeMB=2048 allocates 2 GB of memory for the cache.
- managedLedgerCacheEvictionPolicy=TIME_BASED switches the eviction strategy.
- managedLedgerCacheMaxEntryAgeSeconds=3600 sets a threshold: if the cache is full, any entry that has not been modified (written to) in the last hour is considered for eviction, regardless of read activity.
This is a crucial distinction: TIME_BASED eviction typically considers the age of the data itself, not just its access.
The TIME_BASED policy doesn’t just blindly discard old entries. It still respects the cache size. When the cache is full and a new entry needs to be added, Pulsar scans the cache for entries older than managedLedgerCacheMaxEntryAgeSeconds and evicts any it finds. If no entry exceeds the threshold (meaning everything in the cache is relatively recent), it falls back to an LRU-like eviction among the remaining entries to make space. The fallback guarantees that new entries can always be admitted, while older data is still removed first whenever possible.
The actual implementation of TIME_BASED eviction involves maintaining not just the recency of access, but also the timestamp of when an entry was last written or modified. When eviction is triggered, the system looks for entries whose "last modified" timestamp is older than the configured managedLedgerCacheMaxEntryAgeSeconds. If such entries exist, they are prime candidates for removal, even if they were recently read. This can be particularly effective for workloads where data access patterns shift over time, and older data becomes progressively less relevant.
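The age-first, LRU-fallback behavior described above can be sketched as follows. This is an illustrative model only, not the broker’s implementation: the class AgeFirstCache, its read and write methods, the explicit now argument (used instead of a real clock to keep the example deterministic), and a capacity counted in entries are all assumptions.

```python
from collections import OrderedDict
from typing import List


class AgeFirstCache:
    """Toy cache: evict entries past a write-age threshold, else fall back to LRU."""

    def __init__(self, capacity: int, max_age_seconds: float) -> None:
        self.capacity = capacity
        self.max_age = max_age_seconds
        # key -> last write time; dict order tracks recency (last = most recent).
        self.entries: "OrderedDict[str, float]" = OrderedDict()

    def read(self, key: str) -> bool:
        """Return True on a hit. A read refreshes recency but not the write time."""
        if key in self.entries:
            self.entries.move_to_end(key)
            return True
        return False

    def write(self, key: str, now: float) -> List[str]:
        """Add or update an entry at time `now`; return the keys evicted."""
        evicted: List[str] = []
        if key in self.entries:
            self.entries[key] = now
            self.entries.move_to_end(key)
            return evicted
        if len(self.entries) >= self.capacity:
            # First choice: entries whose last write is older than the threshold.
            expired = [k for k, written in self.entries.items()
                       if now - written > self.max_age]
            for k in expired:
                del self.entries[k]
            evicted.extend(expired)
            if not expired:
                # Nothing is old enough, so drop the least recently used entry.
                lru_key, _ = self.entries.popitem(last=False)
                evicted.append(lru_key)
        self.entries[key] = now
        return evicted


cache = AgeFirstCache(capacity=2, max_age_seconds=10)
cache.write("a", now=0)
cache.write("b", now=5)
cache.read("a")                        # "a" is now the most recently read
first = cache.write("c", now=12)       # "a" was written 12 s ago: too old
second = cache.write("d", now=13)      # nothing is too old: LRU fallback
print(first, second)                   # ['a'] ['b']
```

The first eviction removes "a" even though it was the most recently read entry, which is exactly where this policy departs from LRU. The second eviction shows the fallback: with no entry past the threshold, the least recently used entry is removed instead.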
The next challenge you’ll likely encounter is managing the trade-off between cache size and eviction policy for specific read patterns.