Pulsar brokers don’t actively balance topic ownership; they react to load by redistributing partitions based on configurable thresholds.

Let’s watch this happen. Imagine you have a Pulsar cluster with two brokers, broker-1 and broker-2. You have a topic my-topic with 10 partitions. Initially, the partitions are distributed somewhat evenly.

Broker      | Owned Partitions
------------|-----------------
broker-1    | 0, 1, 2, 3, 4
broker-2    | 5, 6, 7, 8, 9

Now, let’s say a producer starts sending a massive amount of data to partitions 0-4. broker-1 starts to get overloaded. Pulsar’s internal monitoring mechanisms detect this.

Here’s what’s actually happening behind the scenes:

  1. Load Monitoring: Each broker continuously reports its load metrics (CPU, memory, network, disk I/O, number of topics/partitions, producer/consumer count) to the Zookeeper ensemble.

  2. Load Report Aggregation: Other brokers (and the Zookeeper-based leader broker) observe these load reports.

  3. Load Thresholds: Pulsar has configurable load thresholds. The most critical ones for rebalancing are:

    • loadBalancer.loadAwarenessProviders (e.g., LoadAwarenessProvider)
    • loadBalancer.maxTopicsPerBroker
    • loadBalancer.maxPartitionsPerBroker
    • loadBalancer.cpuThreshold
    • loadBalancer.memoryThreshold
    • loadBalancer.offloadThreshold
    • loadBalancer.bandwidthInThreshold
    • loadBalancer.bandwidthOutThreshold
    • loadBalancer.maxUnackedMessagesPerBroker
    • loadBalancer.maxActiveRpcQueueTime

    When a broker’s reported metrics exceed these thresholds, it signals that it’s under too much pressure.

  4. Rebalance Trigger: If a broker is significantly above its allocated capacity (e.g., maxPartitionsPerBroker or bandwidthOutThreshold), it will start considering offloading some of its owned partitions. This decision isn’t instantaneous; it’s based on a rolling average of load.

  5. Partition Selection: The overloaded broker identifies partitions it owns that are "least valuable" to keep. "Least valuable" is determined by a weighted score that considers:

    • The number of producers/consumers on the partition.
    • The traffic (in/out bandwidth) for the partition.
    • The number of unacked messages for the partition.
    • The partition’s current load relative to the broker’s total load.
  6. Ownership Transfer Request: The overloaded broker (e.g., broker-1) proposes to transfer ownership of a selected partition (say, partition 0) to another broker (e.g., broker-2) that has available capacity and is considered a good candidate (i.e., not overloaded itself). This proposal is communicated via Zookeeper.

  7. Ownership Negotiation: broker-2 receives the proposal. If it agrees (it has capacity and is a suitable target), it acknowledges the transfer.

  8. Partition Unload and Load:

    • broker-1 gracefully stops serving traffic for partition 0, signals to producers/consumers to reconnect (they will then find broker-2 as the new owner).
    • broker-2 takes over ownership of partition 0, starts serving traffic, and updates its Zookeeper load report.
  9. Zookeeper State Update: The Zookeeper cluster is updated to reflect the new ownership mapping for partition 0.

After this rebalance, the state might look like this:

Broker      | Owned Partitions
------------|-----------------
broker-1    | 1, 2, 3, 4
broker-2    | 0, 5, 6, 7, 8, 9

This process repeats as needed, driven by the load reports and the defined thresholds. It’s a reactive rather than proactive balancing act.

The key levers you control are the load thresholds in your broker.conf (or via configuration management). For instance, to make brokers more sensitive to outgoing bandwidth, you might lower bandwidthOutThreshold.

# Example broker.conf snippet
loadBalancer.maxPartitionsPerBroker=100
loadBalancer.bandwidthOutThreshold=100MB/s
loadBalancer.loadAwarenessProviders=org.apache.pulsar.broker.load.LoadAwarenessProvider

The most surprising thing is how Pulsar prioritizes stability over perfect balance. It aims to prevent any single broker from becoming a bottleneck, even if it means a slightly uneven distribution. A broker will only offload partitions if it’s demonstrably over its configured limits, not just because another broker has less load.

The actual mechanism for determining which broker is a "good candidate" for receiving a partition is complex. It involves a scoring system based on the target broker’s current load relative to its capacity, its network latency to the topic’s metadata, and its proximity to other brokers already serving that topic’s partitions. It’s not just about finding the broker with the most free CPU.

The next thing you’ll likely encounter is understanding how topic-level load balancing (which partitions get assigned during initial topic creation) differs from the partition-level rebalancing we just discussed.

Want structured learning?

Take the full Pulsar course →