Pulsar’s performance tuning is less about tweaking knobs and more about understanding how your application’s access patterns interact with Pulsar’s tiered architecture.

Let’s benchmark producer and consumer throughput and latency against a local Pulsar cluster, using the load generator that ships with Pulsar.

# Start a local Pulsar cluster (e.g., using Docker Compose)
# Ensure ZooKeeper and BookKeeper are also running.

# Build the Pulsar perf tool (or download pre-built binaries)
git clone https://github.com/apache/pulsar.git
cd pulsar
mvn clean install -DskipTests

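# Run the producer benchmark: 10 producers, 1,000,000 messages of 1024 bytes,
# 10 ms batching window. --rate caps the publish rate in msg/s and defaults to
# a low value, so it is raised here (100000 is an arbitrary example).
# Flag names differ between Pulsar releases; confirm with `bin/pulsar-perf produce --help`.
bin/pulsar-perf produce my_topic --num-messages 1000000 --batch-time-window 10 --size 1024 --num-producers 10 --rate 100000

# Run the consumer benchmark: 5 consumers on one subscription.
# The subscription must be Shared, because the default Exclusive type allows only one consumer.
bin/pulsar-perf consume my_topic --num-messages 1000000 --num-consumers 5 --subscriptions my_perf_sub --subscription-type Shared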

The pulsar-perf tool, bundled with Pulsar, is your primary instrument for this. It allows you to simulate both producers and consumers, generating realistic load against your topics. The key parameters you’ll observe and adjust are:

  • --num-messages (-m): The total number of messages to send/receive.
  • --batch-time-window (-b): For producers, the maximum time in milliseconds the client holds messages to form a batch (batchingMaxPublishDelay in the client API). A lower value means more frequent, smaller batches, which adds per-request overhead but reduces latency. A higher value allows larger batches, improving throughput at the cost of latency.
  • --size (-s): The payload size of each message in bytes. Larger messages raise data throughput by amortizing per-message overhead, but each message takes longer to transmit and persist.
  • --num-producers / --num-consumers (-n): The number of concurrent producers or consumers. More clients mean more parallelism and potentially higher aggregate throughput, but also more contention on brokers and BookKeeper.
  • --rate (-r): The target publish rate in messages per second. The producer is throttled to this rate, so set it above the throughput you expect when looking for the ceiling.
  • Durability: pulsar-perf has no --acks flag, and Pulsar producers have no Kafka-style acks setting. The broker acknowledges a publish once the entry has been confirmed by the configured ack quorum of bookies. Ensemble size, write quorum, and ack quorum are set per namespace (or through broker defaults), not per producer.

The output of pulsar-perf will show you metrics like:

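  • Throughput: Messages per second (msg/s) and the corresponding data rate, which pulsar-perf prints in Mbit/s. This tells you how much data you can push through the system.
  • Latency: Typically reported as mean, median, 95th, 99th, 99.9th, and 99.99th percentile, plus the maximum. The producer reports publish latency (send to broker acknowledgment); the consumer reports end-to-end latency. The tail percentiles matter most for understanding the responsiveness of your application.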
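To relate the two throughput figures, multiply the message rate by the payload size. The sketch below does that conversion in shell. The function name msgs_to_mbit is made up for illustration, it uses decimal megabits, and it counts payload bytes only, so the figure pulsar-perf prints may differ slightly.

```shell
# Convert a message rate and payload size into a payload data rate.
# msgs_to_mbit is a hypothetical helper, not part of pulsar-perf.
msgs_to_mbit() {
  local rate_msgs_per_s=$1   # messages per second
  local size_bytes=$2        # payload bytes per message
  # bytes/s * 8 bits / 1,000,000 = Mbit/s (decimal megabits)
  awk -v r="$rate_msgs_per_s" -v s="$size_bytes" \
    'BEGIN { printf "%.2f\n", r * s * 8 / 1000000 }'
}

msgs_to_mbit 10000 1024   # 10,000 msg/s of 1024-byte payloads, prints 81.92
```

At 1024 bytes per message, a 1 Gbit/s link is therefore already a limiting factor at a little over 100,000 msg/s, before replication traffic is counted.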

Understanding the Pulsar architecture is key to interpreting these numbers. Producers send messages to brokers. Brokers, in turn, write these messages to BookKeeper (the durable storage layer) and acknowledge the producer once enough bookies have confirmed the write. Consumers read from brokers, which serve recently written messages from an in-memory cache and fall back to BookKeeper when a consumer has fallen behind. Each step introduces potential latency and throughput bottlenecks.

When tuning, you’re often balancing throughput against latency. For instance, raising the batching window (--batch-time-window) can boost throughput by sending larger batches, but the first message in each batch may wait up to the full window before it is sent. Conversely, setting it very low (1 ms) reduces that wait, but overall throughput may drop because each request carries fewer messages and the per-request network and processing overhead is amortized over less data.
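A rough way to reason about the batching window is to estimate how many messages one producer accumulates before the window closes. The sketch below is back-of-the-envelope arithmetic only. The helper est_batch_msgs is hypothetical, it assumes the load is spread evenly across producers, and the cap of 1000 messages per batch is an example value passed in as an argument.

```shell
# Estimate how many messages land in one batch window per producer.
# est_batch_msgs is a hypothetical helper for rough sizing.
est_batch_msgs() {
  local total_rate=$1   # total publish rate, msg/s
  local producers=$2    # producers sharing that rate evenly (assumption)
  local window_ms=$3    # batching window in milliseconds
  local max_batch=$4    # cap on messages per batch
  awk -v r="$total_rate" -v p="$producers" -v w="$window_ms" -v m="$max_batch" 'BEGIN {
    n = (r * w) / (p * 1000)   # messages arriving per producer during one window
    if (n > m) n = m           # a full batch is flushed before the window ends
    if (n < 1) n = 1           # a send always carries at least one message
    printf "%d\n", n
  }'
}

est_batch_msgs 50000 10 10 1000   # 10 ms window, prints 50
est_batch_msgs 50000 10 1 1000    # 1 ms window, prints 5
```

With the same load, shrinking the window from 10 ms to 1 ms cuts the estimated batch from 50 messages to 5, so the broker handles roughly ten times as many publish requests for the same data.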

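Durability is a direct trade-off with performance, but in Pulsar it is configured on the storage side rather than through a producer acks flag. Each entry is written to a write quorum of bookies, and the broker acknowledges the producer once an ack quorum of them has confirmed it. A smaller ack quorum is faster because the broker waits for fewer bookies. A larger ack quorum (up to the write quorum) is slower but guarantees more persisted copies before the producer sees success, making the message more resilient to bookie failures.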
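As a sketch of how the storage-side durability settings can be changed, the wrapper below calls pulsar-admin namespaces set-persistence. The function name set_durability and the PULSAR_ADMIN variable are illustrative, and the example values assume a cluster with at least three bookies; a single-bookie local setup cannot satisfy an ensemble of 3.

```shell
# Path to pulsar-admin; override PULSAR_ADMIN if it lives elsewhere.
PULSAR_ADMIN=${PULSAR_ADMIN:-bin/pulsar-admin}

# set_durability NAMESPACE ENSEMBLE WRITE_QUORUM ACK_QUORUM
# Hypothetical wrapper around `pulsar-admin namespaces set-persistence`.
set_durability() {
  local ns=$1 ensemble=$2 write_quorum=$3 ack_quorum=$4
  # ensemble: bookies a ledger is striped over
  # write_quorum: bookies that receive each entry
  # ack_quorum: confirmations required before the broker acks the producer
  "$PULSAR_ADMIN" namespaces set-persistence "$ns" \
    --bookkeeper-ensemble "$ensemble" \
    --bookkeeper-write-quorum "$write_quorum" \
    --bookkeeper-ack-quorum "$ack_quorum" \
    --ml-mark-delete-max-rate 0
}
```

For example, set_durability public/default 3 3 2 writes every entry to three bookies and acknowledges after two confirmations. Rerunning the producer benchmark with an ack quorum of 2 and then 3 shows what the extra confirmation costs in publish latency. The current policy can be checked with bin/pulsar-admin namespaces get-persistence public/default.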

A common misconception is that more producers/consumers always linearly increase throughput. While parallelism is essential, too many clients can saturate broker network interfaces, CPU, or BookKeeper’s disk I/O, leading to diminishing returns or even performance degradation due to increased contention and coordination overhead.
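One way to find where added parallelism stops helping is to repeat the producer benchmark with an increasing number of producers and compare the reported msg/s. The loop below is a minimal sketch: sweep_producers and PULSAR_PERF are illustrative names, the 60-second duration and the rate of 1,000,000 msg/s are arbitrary example values, and the flag names should be checked against bin/pulsar-perf produce --help for your release.

```shell
# Path to pulsar-perf; override PULSAR_PERF if it lives elsewhere.
PULSAR_PERF=${PULSAR_PERF:-bin/pulsar-perf}

# sweep_producers TOPIC COUNT...
# Runs one fixed-duration producer benchmark per producer count.
sweep_producers() {
  local topic=$1; shift
  local n
  for n in "$@"; do
    echo "=== $n producer(s) ==="
    # --rate is set high so the client-side rate limit is not the bottleneck
    "$PULSAR_PERF" produce "$topic" \
      --num-producers "$n" \
      --size 1024 \
      --rate 1000000 \
      --test-duration 60
  done
}
```

Calling sweep_producers my_topic 1 2 4 8 16 runs five benchmarks back to back. The producer count at which msg/s stops growing, or at which the 99th percentile latency climbs sharply, marks the point of saturation for that cluster.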

The performance characteristics of your underlying hardware (network bandwidth, CPU, disk IOPS, memory) will heavily influence achievable throughput and latency. BookKeeper’s performance, in particular, is often bound by disk I/O, especially for write-heavy workloads.

The next logical step after benchmarking throughput and latency is to analyze the behavior of individual components like brokers and bookies under load, often using metrics exposed by Pulsar itself.
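As a starting point for that analysis, brokers publish Prometheus-format metrics over HTTP. The sketch below fetches them and keeps only the ingest/egress rates and storage write latency series. The helper names and the BROKER_HTTP variable are illustrative, and it assumes the broker's web service listens on the default port 8080 on localhost.

```shell
# Broker HTTP endpoint; 8080 is the default web service port (assumed here).
BROKER_HTTP=${BROKER_HTTP:-http://localhost:8080}

# Keep only rate, throughput, and storage write latency series from stdin.
filter_pulsar_metrics() {
  grep -E '^pulsar_(rate_in|rate_out|throughput_in|throughput_out|storage_write_latency)'
}

# Fetch the broker's Prometheus text output and filter it.
broker_metrics() {
  curl -s "$BROKER_HTTP/metrics/" | filter_pulsar_metrics
}
```

Running broker_metrics while pulsar-perf is generating load lets you compare the broker's view of throughput and write latency with what the client reports. Bookies expose their own metrics endpoint separately.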
