Pulsar’s topic partitioning is how you break a single logical topic into multiple physical partitions, letting you scale throughput and parallelism way beyond what a single broker can handle.

Let’s see this in action with a topic called public/default/my-partitioned-topic.

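# Create a partitioned topic with 16 partitions
bin/pulsar-admin topics create-partitioned-topic public/default/my-partitioned-topic --partitions 16

# Produce 1,000 messages spread over 10 keys (key0..key9).
# Each pulsar-client call starts a new JVM, so this loop is slow; it is only for illustration.
for i in {1..1000}; do
  bin/pulsar-client produce public/default/my-partitioned-topic \
    -m "Message $i" \
    -k "key$((i % 10))"
done

# Consume 100 messages from a single partition (e.g., partition 3) by subscribing
# to its internal topic name. -p Earliest starts a new subscription at the oldest
# retained message instead of only at messages produced from now on.
bin/pulsar-client consume public/default/my-partitioned-topic-partition-3 \
  -s "SubscriptionName" \
  -p Earliest \
  -n 100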

Here’s the mental model:

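When you create a partitioned topic, Pulsar doesn’t create a single, monolithic storage unit. Instead, it stores partitioned-topic metadata for my-partitioned-topic, chiefly its partition count. That metadata stands for a set of individual internal topics, one per partition (my-partitioned-topic-partition-0, my-partitioned-topic-partition-1, …, my-partitioned-topic-partition-15). Each of these behaves like an ordinary non-partitioned topic: it is owned by one broker at a time and has its own storage in BookKeeper, so different partitions of the same topic can be served by different brokers.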
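To make the naming concrete, here is a small POSIX shell sketch that prints the internal topic names for a partitioned topic. The function name `partition_topics` is hypothetical, and the script only formats strings; it does not talk to Pulsar. On a live cluster, `bin/pulsar-admin topics get-partitioned-topic-metadata public/default/my-partitioned-topic` reports the partition count, and `bin/pulsar-admin topics list public/default` should show the per-partition names once they exist.

```sh
# Hypothetical helper: print the internal per-partition topic names for a
# partitioned topic, following the "<topic>-partition-<index>" convention.
partition_topics() {
  pt_topic=$1   # full or short topic name
  pt_count=$2   # number of partitions
  pt_i=0
  while [ "$pt_i" -lt "$pt_count" ]; do
    echo "${pt_topic}-partition-${pt_i}"
    pt_i=$((pt_i + 1))
  done
}

partition_topics persistent://public/default/my-partitioned-topic 16
```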

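Producers and consumers use the partitioned topic name. The Pulsar client library, not the brokers and not the BookKeeper client, fetches the partition metadata and routes each message to a physical partition. For producers, routing is based on the message key. If you set a key, the client hashes it and takes the result modulo the number of partitions, so the same key always maps to the same partition as long as the partition count stays the same. This is a plain hash-and-modulo scheme, not consistent hashing, which matters when you add partitions. If you don’t set a key, the client’s default routing mode spreads messages round-robin across partitions. On the consumer side, a consumer on a partitioned topic subscribes to every partition under the same subscription name, and the subscription type decides how messages are divided among multiple consumers. With Failover, for example, the broker picks a different active consumer for each partition, so consumers on the same subscription (Pulsar’s counterpart to a consumer group) read from different partitions concurrently.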
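The keyed routing rule can be sketched in a few lines of POSIX shell. This is a simplified model, not Pulsar code. It assumes the Java client’s default `JavaStringHash` scheme, which is Java’s `String.hashCode()` with the sign bit cleared. It handles ASCII keys only and uses hypothetical helper names (`java_string_hash`, `partition_for_key`). Other client libraries may default to a different hash function.

```sh
# Simplified model of keyed routing. Assumption: the Java client's default
# JavaStringHash scheme; other clients may use another hash.

# Java's String.hashCode() masked to a non-negative int (ASCII keys only).
java_string_hash() {
  jsh_rest=$1
  jsh_h=0
  while [ -n "$jsh_rest" ]; do
    jsh_char=${jsh_rest%"${jsh_rest#?}"}             # first character
    jsh_rest=${jsh_rest#?}                           # drop it from the rest
    jsh_code=$(printf '%d' "'$jsh_char")             # its character code
    jsh_h=$(( (jsh_h * 31 + jsh_code) & 0xFFFFFFFF )) # wrap like a 32-bit int
  done
  echo $(( jsh_h & 0x7FFFFFFF ))                     # hashCode() & Integer.MAX_VALUE
}

# Hypothetical helper: a keyed message goes to hash(key) % number_of_partitions.
partition_for_key() {
  echo $(( $(java_string_hash "$1") % $2 ))
}

# Where the ten keys from the produce loop land on a 16-partition topic.
for n in 0 1 2 3 4 5 6 7 8 9; do
  echo "key$n -> partition $(partition_for_key "key$n" 16)"
done
```

If the `pulsar-client` CLI routes with that default, key0 through key9 land on partitions 1 through 10. Partition 3 then holds exactly the 100 messages keyed key2, and six partitions receive nothing from the produce loop. Ten distinct keys can never occupy more than ten partitions, which is worth keeping in mind when choosing keys.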

The key levers you control are:

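  • Number of Partitions: This is the most critical setting. More partitions mean more potential for parallel processing and higher throughput. However, each partition is a full topic with its own broker-side state, storage ledgers, and client-side producer and consumer instances. Too many partitions therefore increase memory use and metadata load, and they spread traffic into smaller, less efficient batches. The optimal number depends on your expected load, the number of consumers, and your broker/BookKeeper cluster capacity. You can increase the partition count after creation, but you cannot decrease it. Because keys are mapped with a hash modulo the partition count, increasing it sends many keys to different partitions.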
  • Partition Key: Producers use this to ensure messages with the same key always land on the same partition, as long as the partition count doesn’t change. This is crucial for ordered processing within a key’s context. For example, all messages related to a specific user ID should go to the same partition.
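  • Subscription Type: The four subscription types (Exclusive, Failover, Shared, Key_Shared) interact with partitioning differently. Exclusive allows a single consumer for the whole topic. Failover gives each partition one active consumer at a time, spread across the connected consumers. Shared and Key_Shared let many consumers read the same partition at once, which is where you see the most benefit for scaling consumers beyond the partition count. Key_Shared is particularly interesting because it divides messages among consumers by key rather than by partition, so per-key ordering is preserved even when several consumers share a partition.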
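Because keyed routing is hash modulo the partition count, raising the count remaps keys. The POSIX shell sketch below compares a key’s partition before and after resizing. It assumes Java’s `String.hashCode()` as the hash and ASCII keys, and `compare_placement` is a hypothetical helper. The `pulsar-admin topics update-partitioned-topic` command in the comment is the real way to increase the count, but the script itself does not run it.

```sh
# Sketch: why adding partitions can move keys to different partitions.
# On a real cluster the count is raised (never lowered) with:
#   bin/pulsar-admin topics update-partitioned-topic public/default/my-partitioned-topic --partitions 32
# Assumption: keyed routing is hash(key) % partitions using Java's String.hashCode().

# Java's String.hashCode() masked to a non-negative int (ASCII keys only).
java_string_hash() {
  jsh_rest=$1
  jsh_h=0
  while [ -n "$jsh_rest" ]; do
    jsh_char=${jsh_rest%"${jsh_rest#?}"}             # first character
    jsh_rest=${jsh_rest#?}                           # drop it from the rest
    jsh_code=$(printf '%d' "'$jsh_char")             # its character code
    jsh_h=$(( (jsh_h * 31 + jsh_code) & 0xFFFFFFFF )) # wrap like a 32-bit int
  done
  echo $(( jsh_h & 0x7FFFFFFF ))
}

# Hypothetical helper: print each key's partition before and after resizing.
# Usage: compare_placement OLD_COUNT NEW_COUNT KEY...
compare_placement() {
  cp_old=$1
  cp_new=$2
  shift 2
  for cp_key in "$@"; do
    cp_hash=$(java_string_hash "$cp_key")
    cp_before=$((cp_hash % cp_old))
    cp_after=$((cp_hash % cp_new))
    if [ "$cp_before" -eq "$cp_after" ]; then cp_status=unchanged; else cp_status=moved; fi
    echo "$cp_key: partition $cp_before -> $cp_after ($cp_status)"
  done
}

compare_placement 16 32 key0 key1 a
```

Under those assumptions, key0 moves from partition 1 to 17 when going from 16 to 32 partitions, while a key whose hash modulo 32 is below 16 stays put. Existing messages for a moved key remain on its old partition, so per-key ordering across the resize is not guaranteed.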

When a producer sends messages to a partitioned topic without a key, the round-robin distribution happens in the client library, not in the brokers. The client keeps a view of the topic’s partitions and cycles through them. In the Java client, when batching is enabled, the default router stays on one partition for a short time window instead of switching on every message, so that batches can fill up. Each partition is owned by exactly one broker at a time, and the client sends that partition’s messages to its owner. The owning broker writes them to the partition’s managed ledger, which is stored as a series of BookKeeper ledgers, each written to an ensemble of bookies. The broker’s job here is to own the partition and coordinate its writes to BookKeeper, not to decide which partition a message belongs to.
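Here is a simplified POSIX shell model of the two unkeyed behaviors just described. The names `round_robin` and `windowed_partition` are hypothetical. The start offset is fixed for readability, whereas the Java client picks a random starting partition. The windowed variant only approximates how the Java client’s default router stays on one partition per time window when batching is enabled, and the 10 ms window used below is arbitrary.

```sh
# Simplified model of round-robin routing for messages without a key.

# Batching disabled: each message goes to the next partition in turn.
# Usage: round_robin MESSAGE_COUNT PARTITIONS [START_OFFSET]
round_robin() {
  rr_count=$1
  rr_partitions=$2
  rr_start=${3:-0}
  rr_i=0
  rr_out=""
  while [ "$rr_i" -lt "$rr_count" ]; do
    rr_out="$rr_out${rr_out:+ }$(( (rr_start + rr_i) % rr_partitions ))"
    rr_i=$((rr_i + 1))
  done
  echo "$rr_out"
}

# Batching enabled (approximation): the partition only changes when the clock
# crosses into a new window, so messages sent in one window share a partition.
# Usage: windowed_partition NOW_MS WINDOW_MS PARTITIONS [START_OFFSET]
windowed_partition() {
  echo $(( ($1 / $2 + ${4:-0}) % $3 ))
}

round_robin 7 5                 # 0 1 2 3 4 0 1
windowed_partition 1000 10 16   # window 100 -> partition 4
windowed_partition 1009 10 16   # same window -> partition 4
windowed_partition 1010 10 16   # next window -> partition 5
```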

The choice of subscription type profoundly impacts how consumers scale with partitioned topics. With a Shared subscription, each partition’s messages are dispatched across all connected consumers, roughly round-robin and subject to each consumer’s available permits. You can run more consumers than partitions, but ordering is not guaranteed. Key_Shared subscriptions, introduced later, add key awareness to this model. In Key_Shared, Pulsar ensures that all messages for a particular key (e.g., user_123) are delivered to the same consumer. The broker hashes each key into a fixed hash space (65,536 slots by default) and assigns ranges of that space to consumers. In the default auto-split mode, the broker divides the space among the connected consumers; in sticky mode, each consumer declares the hash ranges it wants to own. Because assignment follows the key hash rather than the partition, you get predictable per-key ordering and can still spread a single partition’s traffic across several consumers.
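The sticky-mode assignment can be modeled in a few lines of POSIX shell. There are two caveats. First, the consumer names and ranges are hypothetical. Second, the sketch reuses Java’s `String.hashCode()` as the key hash for brevity, whereas Pulsar hashes Key_Shared keys with Murmur3, so the slot numbers here will not match a real broker. In the Java client, sticky ranges are declared through `KeySharedPolicy.stickyHashRange()`.

```sh
# Simplified model of Key_Shared dispatch in sticky mode: each consumer owns
# inclusive ranges of a 65536-slot hash space, and a message goes to the
# consumer whose range contains hash(key) % 65536.
# Swapped-in hash: Java's String.hashCode() stands in for Pulsar's Murmur3.

HASH_RANGE_SIZE=65536

# Hypothetical consumers and the ranges they declared, as "name:start:end".
RANGES="consumer-a:0:32767 consumer-b:32768:65535"

# Java's String.hashCode() masked to a non-negative int (ASCII keys only).
java_string_hash() {
  jsh_rest=$1
  jsh_h=0
  while [ -n "$jsh_rest" ]; do
    jsh_char=${jsh_rest%"${jsh_rest#?}"}             # first character
    jsh_rest=${jsh_rest#?}                           # drop it from the rest
    jsh_code=$(printf '%d' "'$jsh_char")             # its character code
    jsh_h=$(( (jsh_h * 31 + jsh_code) & 0xFFFFFFFF )) # wrap like a 32-bit int
  done
  echo $(( jsh_h & 0x7FFFFFFF ))
}

# Hypothetical helper: find the consumer whose declared range holds the key's slot.
consumer_for_key() {
  ck_slot=$(( $(java_string_hash "$1") % HASH_RANGE_SIZE ))
  for ck_entry in $RANGES; do
    ck_name=${ck_entry%%:*}       # text before the first ':'
    ck_bounds=${ck_entry#*:}      # "start:end"
    ck_start=${ck_bounds%%:*}
    ck_end=${ck_bounds#*:}
    if [ "$ck_slot" -ge "$ck_start" ] && [ "$ck_slot" -le "$ck_end" ]; then
      echo "$ck_name"
      return 0
    fi
  done
  echo "unassigned"   # no consumer declared a range covering this slot
  return 1
}

for key in user_123 key0 user_123; do
  echo "$key -> $(consumer_for_key "$key")"
done
```

With these ranges, key0 falls in slot 11697 and goes to consumer-a. user_123 falls in slot 64446 and goes to consumer-b every time it is looked up, which is the per-key stickiness Key_Shared provides.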

The next concept to explore is how Pulsar handles schema evolution with partitioned topics.
