Redpanda topic partitions are not just about how many pieces a topic is split into, but how those pieces are distributed and kept safe across your cluster.
Let’s see this in action. Imagine you have a Redpanda cluster with three brokers: broker-1, broker-2, and broker-3. You want to create a topic named sensor-data that can handle a good amount of traffic and has built-in redundancy.
First, you’d create the topic using rpk, Redpanda’s command-line tool:
```shell
rpk topic create sensor-data \
  --partitions 10 \
  --replication-factor 3
```
Here’s what’s happening under the hood:
- Topic: `sensor-data` is the name of the stream of messages.
- Partitions: `--partitions 10` means Redpanda splits this topic into 10 distinct, ordered sequences of messages. Ordering is guaranteed only within a partition, not across the topic as a whole. Each partition is an independent unit of parallelism: producers can write to different partitions simultaneously, and consumers can read from different partitions in parallel.
- Replication Factor: `--replication-factor 3` means that for each of those 10 partitions, Redpanda maintains 3 copies (replicas) of the data, each on a different broker. The replication factor therefore cannot exceed the number of brokers; with three brokers and a factor of 3, every broker holds one replica of every partition.
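To make per-partition ordering concrete, here is a minimal Python sketch of key-based partitioning, assuming the producer picks a partition by hashing the message key. The `choose_partition` and `produce` helpers are hypothetical, and `zlib.crc32` is only a stand-in: real clients ship their own partitioners (the Java client, for example, hashes keys with murmur2).

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 10  # matches --partitions 10 in the rpk command


def choose_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition; zlib.crc32 stands in for a real client's hash."""
    return zlib.crc32(key) % num_partitions


def produce(messages):
    """Append (key, value) pairs to in-memory partition logs in send order."""
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[choose_partition(key)].append((key, value))
    return partitions


sent = [(b"sensor-7", 20.1), (b"sensor-3", 18.4), (b"sensor-7", 20.5), (b"sensor-7", 21.0)]
logs = produce(sent)

# All sensor-7 readings share one partition, so they keep their send order there.
readings = [v for k, v in logs[choose_partition(b"sensor-7")] if k == b"sensor-7"]
print(readings)  # [20.1, 20.5, 21.0]
```

Because every `sensor-7` message lands in the same partition, a consumer of that partition sees those readings in send order; readings from sensors that hash to other partitions carry no ordering guarantee relative to them.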
When you create sensor-data with 10 partitions and a replication factor of 3, Redpanda places three replicas of each partition on brokers and elects one of them as that partition's leader (each partition is its own Raft group). For example, partition 0 might have its leader on broker-1, with followers on broker-2 and broker-3, while partition 1 might have its leader on broker-2, with followers on broker-1 and broker-3, and so on. Spreading leaders across brokers keeps any single broker from handling all of the topic's traffic.
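A simplified picture of that layout, as a Python sketch: it assumes a plain round-robin placement in which the first replica listed acts as the leader. Redpanda's real partition allocator and Raft leader elections are more involved, so `assign_replicas` is a hypothetical helper for illustration only.

```python
BROKERS = ["broker-1", "broker-2", "broker-3"]


def assign_replicas(num_partitions, replication_factor, brokers):
    """Hypothetical round-robin placement: partition p's replicas start at broker p % n."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    n = len(brokers)
    return {
        p: [brokers[(p + i) % n] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


assignment = assign_replicas(10, 3, BROKERS)
# Treat the first replica as the leader; the rest are followers.
leaders = {p: replicas[0] for p, replicas in assignment.items()}

for p in range(2):
    print(f"partition {p}: leader {leaders[p]}, followers {assignment[p][1:]}")

leader_counts = {b: sum(1 for leader in leaders.values() if leader == b) for b in BROKERS}
print(leader_counts)  # {'broker-1': 4, 'broker-2': 3, 'broker-3': 3}
```

Ten partitions do not divide evenly across three brokers, so one broker leads one extra partition; every broker still holds a replica of all ten.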
By default, the leader replica handles all reads and writes for its partition, while the follower replicas receive the leader's writes and keep their copies in sync. If a leader fails, the surviving replicas automatically elect an up-to-date follower as the new leader, so the partition remains available.
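The failover rule can be sketched the same way. This hypothetical `elect_leaders` function assumes a round-robin layout of 10 partitions across three brokers and Raft's majority rule; picking the first surviving replica is only a stand-in for a real Raft election, which chooses among replicas whose logs are up to date.

```python
BROKERS = ["broker-1", "broker-2", "broker-3"]

# Hypothetical round-robin layout: 10 partitions, 3 replicas each, first replica leads.
assignment = {p: [BROKERS[(p + i) % 3] for i in range(3)] for p in range(10)}
leaders = {p: replicas[0] for p, replicas in assignment.items()}


def elect_leaders(assignment, leaders, live):
    """Return each partition's leader after failures; None means no majority survives."""
    new_leaders = {}
    for p, replicas in assignment.items():
        alive = [b for b in replicas if b in live]
        if len(alive) * 2 <= len(replicas):
            new_leaders[p] = None  # no majority: the partition cannot elect a leader
        elif leaders[p] in alive:
            new_leaders[p] = leaders[p]  # the leader survived, nothing changes
        else:
            new_leaders[p] = alive[0]  # stand-in for a Raft election among followers
    return new_leaders


after_one = elect_leaders(assignment, leaders, live={"broker-2", "broker-3"})
print(after_one[0])  # broker-2: partition 0's old leader was broker-1

after_two = elect_leaders(assignment, leaders, live={"broker-3"})
print(set(after_two.values()))  # {None}: one replica out of three is not a majority
```

Losing one broker only moves leadership; losing two leaves every partition without a majority, which is the availability limit of a replication factor of 3.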
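The core problem Redpanda's partitioning and replication solve is the trade-off between throughput and fault tolerance. A single partition can only be written to and read from so fast, because it is one ordered log served by one leader. Increasing the number of partitions increases the potential for parallel processing, but more partitions also mean more metadata to manage. Similarly, a higher replication factor increases fault tolerance, but it also increases storage requirements and replication network traffic: every byte is stored once per replica, and the leader sends each write to every follower. Because Redpanda replicates each partition with the Raft consensus protocol, a partition keeps accepting writes only while a majority of its replicas is available, so a replication factor of 3 tolerates one broker failure and a factor of 5 tolerates two.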
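The majority rule and the cost side of the trade-off reduce to simple arithmetic. The hypothetical helper below assumes Raft-style majority quorums and ignores compression, retention, and tiered storage.

```python
def replication_tradeoff(replication_factor: int, data_gb: float) -> dict:
    """Quorum size, tolerated failures, and cost for one partition under Raft-style replication."""
    majority = replication_factor // 2 + 1
    return {
        "majority": majority,
        # Brokers that can fail while a majority (and so a leader) remains.
        "failures_tolerated": replication_factor - majority,
        # Every byte is stored once per replica.
        "disk_gb": data_gb * replication_factor,
        # The leader ships each write to every follower.
        "follower_copies_per_write": replication_factor - 1,
    }


for rf in (1, 3, 4, 5):
    print(rf, replication_tradeoff(rf, data_gb=100))
```

Note that a factor of 4 tolerates no more failures than a factor of 3 while costing more disk and network, which is why odd replication factors are the usual choice.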
The key to effective configuration lies in understanding your workload. High-throughput topics need more partitions; critical data where downtime is unacceptable needs a higher replication factor. A replication factor of 3 is generally a good starting point for production environments. Note that Redpanda's cluster-wide default for new topics, the `default_topic_replications` property, is 1 unless you change it, which is why the command above passes `--replication-factor 3` explicitly.
Choosing the right number of partitions is a crucial, and often debated, part of Kafka and Redpanda topic design. You can increase the partition count later, but the change is one-way: you cannot reduce it without creating a new topic and re-ingesting the data. Adding partitions also changes which partition many keys map to, so messages with the same key can end up split across old and new partitions, breaking per-key ordering. The number of partitions also caps consumer parallelism within a single consumer group: each partition is consumed by at most one member of the group, so with 10 partitions at most 10 consumers can process messages in parallel, and any additional consumers sit idle.
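A hypothetical round-robin assignment shows why extra consumers sit idle: each partition goes to exactly one consumer in the group, so consumers beyond the partition count receive nothing. Real clients use configurable assignment strategies (for example range, round-robin, and sticky), but the ceiling is the same; `assign_partitions` is an illustrative helper, not a client API.

```python
def assign_partitions(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Round-robin assignment: each partition goes to exactly one consumer in the group."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


group = [f"consumer-{i}" for i in range(1, 13)]  # 12 consumers, 10 partitions
result = assign_partitions(10, group)
idle = [c for c, parts in result.items() if not parts]
print(idle)  # ['consumer-11', 'consumer-12']
```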
A common misconception is that more partitions always mean better performance. More partitions allow higher throughput, but they also add overhead: each partition replica has its own log segment files on disk and its own memory and replication state on the broker, and every partition adds metadata that brokers and clients must track. A topic with 100,000 partitions would leave your brokers struggling with the sheer number of open files and the associated metadata. A better approach is to size from per-partition throughput. If a single partition can sustain about 1 MB/s of writes, a single consumer can process about 1 MB/s, and you need 100 MB/s in total, you'd aim for roughly 100 partitions.
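That estimate generalizes to partitions = max(T / p, T / c), where T is the target throughput, p the throughput one partition can sustain on the produce side, and c the throughput one consumer can process. A minimal sketch, with a hypothetical helper name and example numbers you would replace with your own measurements:

```python
import math


def estimate_partitions(target_mb_s: float, producer_mb_s_per_partition: float,
                        consumer_mb_s_per_partition: float) -> int:
    """Partitions needed so neither producers nor consumers bottleneck, rounded up."""
    return max(
        math.ceil(target_mb_s / producer_mb_s_per_partition),
        math.ceil(target_mb_s / consumer_mb_s_per_partition),
    )


print(estimate_partitions(100, 1, 1))   # 100, the example from the text
print(estimate_partitions(100, 10, 2))  # 50: the slower consumers set the count
```

Treat the result as a starting point and leave some headroom for growth, since partitions can be added later but not removed.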
Redpanda manages where partition leaders land across brokers in order to balance the load. You can influence placement to some extent, for example by enabling rack awareness so that a partition's replicas are spread across racks or availability zones, but Redpanda's built-in balancing is generally quite effective.
The next logical step in optimizing your Redpanda topics is understanding consumer group rebalancing.