The Redpanda leader balancer isn’t just about making partitions look even; it actively manages partition leadership so that no single broker ends up overloaded.
Let’s see it in action. Imagine a small Redpanda cluster of 3 brokers (broker-1, broker-2, and broker-3, with broker IDs 0, 1, and 2) and a topic my-topic with 6 partitions. Initially, things might look like this (output simplified for illustration; the BROKER column is added here for readability):
rpk topic describe my-topic -p
BROKER | PARTITION | LEADER | REPLICAS
-------|-----------|--------|---------
broker-1 | 0 | 0 | 0,1,2
broker-1 | 1 | 0 | 0,1,2
broker-1 | 2 | 0 | 0,1,2
broker-2 | 3 | 1 | 1,2,0
broker-2 | 4 | 1 | 1,2,0
broker-2 | 5 | 1 | 1,2,0
broker-3 | - | - | -
Here, broker-1 is leading 3 partitions, and broker-2 is leading 3. broker-3 has no leaders. This is uneven. Producers sending data to my-topic will primarily hit broker-1 and broker-2, potentially saturating their network or CPU. The leader balancer’s job is to redistribute these leaders.
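To make the imbalance concrete, the LEADER column can be tallied per broker. The following is a minimal Python sketch: the `leaders` mapping is transcribed by hand from the illustrative table above, and `leaders_per_broker` is a hypothetical helper, not part of rpk or any Redpanda API.

```python
from collections import Counter

# Partition -> leader broker ID, transcribed from the illustrative table above.
leaders = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
brokers = [0, 1, 2]  # broker-1, broker-2, broker-3


def leaders_per_broker(leaders, brokers):
    """Count how many partitions each broker leads, including brokers with none."""
    counts = Counter(leaders.values())
    return {b: counts.get(b, 0) for b in brokers}


counts = leaders_per_broker(leaders, brokers)
spread = max(counts.values()) - min(counts.values())
print(counts)  # {0: 3, 1: 3, 2: 0}
print(spread)  # 3
```

A spread of 3 between the busiest and the idlest broker is the kind of gap the balancer tries to close.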
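You can also move leadership for an individual partition by hand. In recent rpk versions this is done with the transfer-leadership command, where the --partition value takes the form <partition-id>:<target-broker-id> (check rpk cluster partitions transfer-leadership --help for the exact syntax in your version):
rpk cluster partitions transfer-leadership my-topic --partition <partition-id>:<broker-id>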
However, by default, Redpanda’s leader balancer runs automatically in the background. You can observe its activity by checking broker metrics or logs. The goal is to achieve a state where each broker leads roughly the same number of partitions, proportional to its capacity.
The leader balancer works by identifying partitions where the leader is not optimally placed. "Optimally placed" means the leader is on a broker that has available capacity and is not already a leader for a disproportionate number of partitions. It considers a few key factors:
- Partition Count: The most basic metric. How many partitions does a broker currently lead? The balancer aims to equalize this.
- Replicas: While the leader balancer primarily focuses on leader distribution, the placement of all replicas for a partition is crucial for fault tolerance. The balancer ensures that when it moves a leader, the new leader is still on a broker that can host a replica.
- Broker Capacity: This is where it gets interesting. Redpanda doesn’t just count partitions; it considers the load a broker is under. If broker-1 is leading 5 partitions but is already at 80% CPU, and broker-2 is leading 2 partitions at 20% CPU, the balancer might move a leader from broker-1 to broker-2, even though broker-2 would then lead 3 partitions. This dynamic adjustment is key.
- Under-replicated Partitions: If a partition is under-replicated (meaning not all of its replicas are available), the leader balancer will prioritize fixing that situation by moving the leader to a broker where a replica is available.
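The partition-count, replica, and availability factors can be combined into a simple target-selection rule. The sketch below is only an illustration of the reasoning in this list, not Redpanda's actual algorithm: `pick_target` and all of its parameters are hypothetical names, and broker load is deliberately left out to keep the example small.

```python
def pick_target(replicas, leader, leader_counts, available):
    """Return the broker that should take over leadership, or None if no move helps.

    replicas      -- broker IDs hosting a replica of the partition
    leader        -- broker ID of the current leader
    leader_counts -- broker ID -> number of partitions it currently leads
    available     -- set of broker IDs that are currently reachable
    """
    # A new leader must already host a replica and must be reachable.
    candidates = [b for b in replicas if b != leader and b in available]
    if not candidates:
        return None
    # Prefer the candidate leading the fewest partitions (ties: lowest ID).
    target = min(candidates, key=lambda b: (leader_counts[b], b))
    # If the current leader is healthy, only move when it narrows the gap.
    if leader in available and leader_counts[target] + 1 >= leader_counts[leader]:
        return None
    return target


counts = {0: 5, 1: 1, 2: 0}
print(pick_target([0, 1, 2], 0, counts, {0, 1, 2}))  # 2
print(pick_target([0, 1, 2], 0, counts, {0, 1}))     # 1
print(pick_target([1, 2, 0], 1, counts, {0, 1, 2}))  # None
```

The last call returns None because moving a leader off a broker that leads only one partition would not reduce the imbalance.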
Let’s say after some activity, your distribution looks like this:
rpk topic describe my-topic -p
BROKER | PARTITION | LEADER | REPLICAS
-------|-----------|--------|---------
broker-1 | 0 | 0 | 0,1,2
broker-1 | 1 | 0 | 0,1,2
broker-1 | 2 | 0 | 0,1,2
broker-1 | 3 | 0 | 0,1,2
broker-1 | 4 | 0 | 0,1,2
broker-2 | 5 | 1 | 1,2,0
broker-3 | - | - | -
Now, broker-1 is leading 5 partitions, broker-2 is leading 1, and broker-3 is idle. The leader balancer will kick in. It will identify partitions 0-4 as candidates for moving. It will select a partition (say, partition 2) and move its leadership to broker-3.
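Redpanda performs this move internally as a Raft leadership transfer rather than by running an rpk command, so you don’t normally do it yourself. The manual equivalent, assuming a recent rpk version, would look something like this:
# Manual equivalent of the balancer's move: partition 2 to broker ID 2 (broker-3)
rpk cluster partitions transfer-leadership my-topic --partition 2:2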
The result after the balancer runs:
rpk topic describe my-topic -p
BROKER | PARTITION | LEADER | REPLICAS
-------|-----------|--------|---------
broker-1 | 0 | 0 | 0,1,2
broker-1 | 1 | 0 | 0,1,2
broker-3 | 2 | 2 | 0,1,2 <-- Moved leadership
broker-1 | 3 | 0 | 0,1,2
broker-1 | 4 | 0 | 0,1,2
broker-2 | 5 | 1 | 1,2,0
Now broker-1 leads 4 partitions, broker-2 leads 1, and broker-3 leads 1. Still not perfectly even, but better. The balancer will continue this process until it reaches a stable state, typically one where the number of partitions each broker leads falls within a certain tolerance of the others.
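The repeated "move one leader, then re-check" process can be sketched as a greedy loop. This is a simplified model under stated assumptions, not Redpanda's implementation: `rebalance` and `tolerance` are hypothetical names, only leader counts are considered, and the loop always moves the lowest-numbered eligible partition, so the order of moves differs from the walkthrough above.

```python
def rebalance(leaders, replicas, brokers, tolerance=1):
    """Greedily move leaders from the busiest to the idlest broker.

    Stops once the gap in leader counts is within `tolerance` (at least 1,
    so the loop cannot bounce a leader back and forth forever).
    Returns the new partition -> leader mapping and the list of moves.
    """
    leaders = dict(leaders)  # work on a copy
    moves = []
    while True:
        counts = {b: 0 for b in brokers}
        for leader in leaders.values():
            counts[leader] += 1
        busiest = max(brokers, key=lambda b: (counts[b], -b))
        idlest = min(brokers, key=lambda b: (counts[b], b))
        if counts[busiest] - counts[idlest] <= max(tolerance, 1):
            return leaders, moves
        # Only partitions that already have a replica on the target can move.
        movable = sorted(p for p, l in leaders.items()
                         if l == busiest and idlest in replicas[p])
        if not movable:
            return leaders, moves
        partition = movable[0]
        leaders[partition] = idlest
        moves.append((partition, busiest, idlest))


# Skewed starting point: broker ID 0 leads five partitions, broker ID 1 leads one.
start = {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 1}
replica_sets = {p: {0, 1, 2} for p in start}
final, moves = rebalance(start, replica_sets, [0, 1, 2])

final_counts = {b: list(final.values()).count(b) for b in [0, 1, 2]}
print(final_counts)  # {0: 2, 1: 2, 2: 2}
print(len(moves))    # 3
```

In this toy model three transfers are enough to go from a 5/1/0 split to 2/2/2.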
The most surprising aspect is how Redpanda’s leader balancer uses under-replicated partition alerts as a high-priority trigger. If a partition is missing a replica, the balancer will immediately try to assign leadership to a broker that does have a replica for that partition, even if that broker is heavily loaded, to ensure data availability. This is a critical safety mechanism that often overrides simple partition count balancing.
Once your partition leadership is perfectly balanced, the next thing you’ll notice is how network traffic to specific brokers becomes much more uniform.