Redpanda P99 Latency: Optimize Produce and Consume (2026)

Redpanda’s P99 latency for produce and consume operations is a surprisingly good indicator of overall cluster health and network saturation, not just individual broker performance.

Let’s dive into how to keep those tail latencies snappy.

Imagine a client sending a message (produce) or fetching messages (consume). Redpanda, being Kafka-compatible, orchestrates this using a distributed log. When you send data, it’s written to the partition leader, replicated to followers, and then acknowledged back to the client. For consumption, the client fetches from the leader by default. P99 latency is the value that 99% of operations complete within; the slowest 1% take longer. A high P99 means that a small but significant fraction of your requests are experiencing unusually long delays, often due to bottlenecks elsewhere in the system.
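
To make the definition concrete, here is a minimal sketch of computing P50 and P99 from raw latency samples with the nearest-rank method. The sample values are made up for illustration, and the helper name percentile is not part of any Redpanda tooling.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

# 98 fast requests and 2 slow ones, in milliseconds (made-up values).
latencies_ms = [2.0] * 98 + [40.0, 250.0]

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
print(f"p50={p50}ms p99={p99}ms")
```

Only two requests in a hundred were slow, yet P99 lands far above the median, which is why tail latency exposes problems that averages hide.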

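Here’s a quick command-line spot check of produce latency. Treat the rpk command as a simplified illustration: flag names vary across rpk versions (recent versions read the record value from stdin), so confirm them with rpk topic produce --help. In a real scenario, you’d be monitoring this continuously.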

rpk topic produce my-topic --value "test message" --print-latency

The output might look something like this (values are illustrative):

sent message to partition 0
produced in 3.2ms (0.0032s)

Now, let’s break down what influences those milliseconds and how to optimize.

1. Network Bandwidth and Congestion

This is the most common culprit. Redpanda is a network-intensive application. Replication, client communication, and inter-broker heartbeats all consume bandwidth.

Diagnosis: Use iperf3 between Redpanda nodes and between clients and brokers. Monitor network interface statistics on your nodes (e.g., sar -n DEV 1 or cloud provider network monitoring). Look for high utilization or packet drops.
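Fix: Increase network bandwidth (e.g., upgrade NICs, cloud instance types, or network links). If bandwidth is fixed, reduce the rate of data ingress/egress, for example by enabling producer compression or trimming topics and consumers you don’t need. Replication traffic scales with the topic’s replication factor, not with acks or min.insync.replicas: if a topic takes in 100MB/s with a replication factor of 3, the leaders forward every byte to two followers, so replication adds about 200MB/s on top of the 100MB/s of producer ingress, roughly 300MB/s in total before any consumer traffic.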
Why it works: More bandwidth means data can move faster, reducing queueing and delays. Reducing load on fixed bandwidth prevents saturation.
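
The bandwidth arithmetic can be sketched as a small estimator. It assumes each leader forwards every byte to replication_factor - 1 followers, each consumer group reads the full stream once, and compression and protocol overhead are ignored; the function and field names are hypothetical.

```python
def cluster_network_mb_per_s(ingress_mb_s, replication_factor, consumer_groups):
    """Rough aggregate network load for one workload, in MB/s.

    Assumptions (not measured values): leaders forward each byte to
    (replication_factor - 1) followers, and every consumer group reads
    the whole stream exactly once.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    replication = ingress_mb_s * (replication_factor - 1)
    consumer_egress = ingress_mb_s * consumer_groups
    return {
        "producer_ingress": ingress_mb_s,
        "replication": replication,
        "consumer_egress": consumer_egress,
        "total": ingress_mb_s + replication + consumer_egress,
    }

estimate = cluster_network_mb_per_s(100, replication_factor=3, consumer_groups=1)
print(estimate)
```

With 100MB/s of ingress, a replication factor of 3, and one consumer group, the estimate is 200MB/s of replication traffic and 400MB/s in total across the cluster.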

2. Disk I/O Performance

Redpanda writes data to disk for persistence. Slow disks can create a backlog, making producers wait and consumers fetch data slowly.

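Diagnosis: Use iostat -xz 1 on Redpanda nodes. Look for high r_await and w_await (average time, in milliseconds, for read and write requests to complete, including queueing), a growing queue depth (aqu-sz, or avgqu-sz in older sysstat versions), and sustained high %util. Treat %util with care on SSDs and NVMe drives, which serve requests in parallel and can report 100% well before they are saturated; svctm is deprecated and no longer printed by recent sysstat versions. Redpanda’s documentation often recommends specific IOPS and throughput for different workloads.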
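Fix: Use faster storage (SSDs, NVMe). Check whether a configured throughput limit is artificially capping writes; verify the exact property name against the configuration reference for your Redpanda version, since storage_max_throughput_mb_per_sec may not exist there. Ensure your filesystem is optimized (e.g., XFS with the noatime mount option).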
Why it works: Faster disks can service read/write requests more quickly, reducing the time data spends waiting to be persisted or read.
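
For a quick feel of how long a durable write takes on a given volume, here is a rough probe that times small write-plus-fsync calls. It is a sketch, not a substitute for a proper benchmark such as fio, and the function name is hypothetical. The example runs against the system temp directory, which may be memory-backed; point it at a directory on the volume you actually want to test.

```python
import os
import tempfile
import time

def fsync_latencies_ms(directory, writes=50, size=4096):
    """Time `writes` small write+fsync calls in `directory`.

    Returns the per-call latencies in milliseconds, sorted ascending.
    """
    payload = b"\0" * size
    samples = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the data to stable storage
            samples.append((time.perf_counter() - start) * 1000)
    finally:
        os.close(fd)
        os.remove(path)
    return sorted(samples)

if __name__ == "__main__":
    results = fsync_latencies_ms(tempfile.gettempdir())
    print(f"median={results[len(results) // 2]:.2f}ms worst={results[-1]:.2f}ms")
```

A large gap between the median and the worst sample is the disk-level version of a P99 problem.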

3. CPU Saturation

While Redpanda is efficient, high request rates, complex configurations, or other processes on the same node can saturate CPUs, leading to delays in request processing and network I/O.

Diagnosis: Use top or htop to check CPU usage. Look for processes consuming high CPU, especially redpanda or kernel threads related to networking (e.g., ksoftirqd). Also check the iowait percentage in top: high iowait means CPUs are idle while waiting on block I/O, which points to the disk rather than the CPU or the network.
Fix: Scale up CPU cores on your Redpanda nodes. Offload other applications from broker nodes. Ensure Redpanda is configured to use available cores effectively (default is usually good).
Why it works: More CPU cycles mean the Redpanda process and its underlying kernel threads can handle incoming requests and outgoing data more rapidly.
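
The percentages that top reports come from the counters on the aggregate cpu line of /proc/stat on Linux. As a sketch, the function below turns two snapshots of that line into percentage shares; the snapshot strings are made-up values, and on a real node you would read the first line of /proc/stat twice, a second or so apart.

```python
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]

def cpu_shares(before, after):
    """Percentage of CPU time per state between two 'cpu ...' lines
    taken from /proc/stat (Linux)."""
    start = [int(x) for x in before.split()[1:1 + len(FIELDS)]]
    end = [int(x) for x in after.split()[1:1 + len(FIELDS)]]
    delta = [e - s for s, e in zip(start, end)]
    total = sum(delta)
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    return {name: 100 * d / total for name, d in zip(FIELDS, delta)}

# Made-up snapshots for illustration.
snapshot_a = "cpu 1000 0 500 8000 100 0 50 0 0 0"
snapshot_b = "cpu 1600 0 700 8100 180 0 70 0 0 0"

shares = cpu_shares(snapshot_a, snapshot_b)
print(shares)
```

In this made-up interval the node spent 60% of its time in user code, 20% in the kernel, and 8% in iowait, with only 10% idle.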

4. min.insync.replicas and Producer acks

These settings directly control how many replicas must acknowledge a write before it’s considered successful.

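Diagnosis: Check your producer client configuration for acks and your topic configuration for min.insync.replicas (an integer; it has no all value). In Apache Kafka, acks=all waits for every replica currently in the in-sync set, and min.insync.replicas only sets the minimum size of that set for a write to be accepted. Redpanda replicates through Raft, so with acks=all a write is acknowledged once a majority of replicas have persisted it; check the documentation for your version to see whether min.insync.replicas has any effect.
Fix: If latency is critical and data durability can tolerate slightly less stringent guarantees, consider setting acks=1 on the producer for less critical topics, so the write is acknowledged by the leader without waiting for followers. Lowering min.insync.replicas does not by itself shorten the wait. For a topic with 3 replicas, a majority is 2, so an acks=all write in Redpanda is confirmed once the leader and one follower have it.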
Why it works: Reducing the number of acknowledgments required from other brokers shortens the critical path for a successful produce request.
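
The effect of acknowledgment rules on the critical path can be illustrated with a toy model. It assumes the leader persists locally while followers replicate in parallel, and that each follower time already includes the round trip and the follower’s own write. The wait_for parameter is hypothetical: "majority" models a Raft-style quorum, and "all_followers" models waiting for every in-sync follower.

```python
def produce_ack_latency_ms(leader_ms, follower_ms, acks, wait_for="majority"):
    """Toy model of when a produce request can be acknowledged.

    leader_ms:   time for the leader to persist the write locally
    follower_ms: per-follower replication times, round trip included
    acks:        1 or "all"
    """
    if acks == 1:
        return leader_ms  # the leader alone decides
    if acks != "all":
        raise ValueError("acks must be 1 or 'all'")
    ordered = sorted(follower_ms)
    if wait_for == "majority":
        replicas = 1 + len(ordered)
        needed = (replicas // 2 + 1) - 1  # followers needed besides the leader
    elif wait_for == "all_followers":
        needed = len(ordered)
    else:
        raise ValueError("unknown wait_for mode")
    if needed == 0:
        return leader_ms
    return max(leader_ms, ordered[needed - 1])

followers = [2.0, 30.0]  # one healthy follower, one slow (made-up values)
leader_only = produce_ack_latency_ms(1.0, followers, acks=1)
quorum = produce_ack_latency_ms(1.0, followers, acks="all")
everyone = produce_ack_latency_ms(1.0, followers, acks="all", wait_for="all_followers")
print(leader_only, quorum, everyone)
```

In this model a single slow follower does not hurt a majority-quorum write, but it dominates the latency whenever every follower must respond.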

5. Topic and Partition Count

A large number of partitions, especially if many are underutilized or spread across a slow network, can increase overhead.

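Diagnosis: Use rpk topic list to see partition counts per topic, and rpk topic describe <topic> -p for per-partition detail. Monitor per-partition metrics in Redpanda’s metrics endpoint (e.g., using Prometheus/Grafana) for throughput and latency.
Fix: Consolidate low-throughput topics where possible. Partition counts can only be increased on an existing topic, so reducing them means creating a new topic with fewer partitions and migrating producers and consumers to it. Ensure partitions and their leaders are evenly distributed across brokers. Avoid excessive partitioning for low-throughput topics.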
Why it works: Fewer active endpoints to manage reduces internal overhead, and better distribution ensures load is spread evenly, preventing hot spots.
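
A simple way to quantify uneven distribution is to count partition leaders per broker and compare the busiest broker with an even split. The sketch below assumes you have already collected a partition-to-leader mapping (for example from per-partition topic output); the mapping and broker IDs shown are made up.

```python
from collections import Counter

def leader_skew(partition_leaders, brokers):
    """Return (leaders per broker, skew ratio).

    The skew ratio is the busiest broker's leader count divided by the
    count each broker would hold under a perfectly even spread;
    1.0 means perfectly balanced.
    """
    if not partition_leaders or not brokers:
        raise ValueError("need at least one partition and one broker")
    counts = Counter(partition_leaders.values())
    per_broker = {broker: counts.get(broker, 0) for broker in brokers}
    ideal = len(partition_leaders) / len(brokers)
    return per_broker, max(per_broker.values()) / ideal

# partition id -> leader broker id (made-up values)
leaders = {0: 1, 1: 1, 2: 1, 3: 2, 4: 1, 5: 3}
per_broker, skew = leader_skew(leaders, brokers=[1, 2, 3])
print(per_broker, skew)
```

Here broker 1 leads four of six partitions, twice its fair share, so it is the likely hot spot.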

6. fetch.max.bytes and max.poll.records (Consumer Side)

For consumers, fetching too little data per request or returning too few records per poll adds round trips and loop overhead, which can lead to higher P99s.

Diagnosis: Monitor consumer lag and per-fetch request latency. Check client configuration for fetch.max.bytes and max.poll.records.
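Fix: Make sure fetch.max.bytes is large enough (1MB or more, depending on your network and data size) to keep the number of network round trips low. The Java client already defaults to 50MB, with max.partition.fetch.bytes capping each partition at 1MB, so check both before raising either. Increase max.poll.records (default 500 in the Java client) to process more data in a single poll() call, amortizing the cost of the poll, while keeping processing time per poll well under max.poll.interval.ms.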
Why it works: Larger fetches reduce the overhead of frequent network requests. Processing more records per poll reduces the frequency of the polling loop and its associated state management.
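
The round-trip savings from a larger fetch size can be estimated with simple arithmetic. This sketch assumes every fetch response is filled to fetch.max.bytes and ignores per-partition caps and broker-side limits, so treat the result as a lower bound; the function name is hypothetical.

```python
import math

def fetch_round_trips(backlog_bytes, fetch_max_bytes):
    """Minimum number of fetch requests needed to drain a backlog,
    assuming each response is filled to fetch_max_bytes."""
    if fetch_max_bytes <= 0:
        raise ValueError("fetch_max_bytes must be positive")
    return math.ceil(backlog_bytes / fetch_max_bytes)

MIB = 1024 * 1024
backlog = 64 * MIB

small_fetch_trips = fetch_round_trips(backlog, 1 * MIB)
large_fetch_trips = fetch_round_trips(backlog, 50 * MIB)
print(small_fetch_trips, large_fetch_trips)
```

Draining a 64MiB backlog takes at least 64 requests at 1MiB per fetch but only 2 at 50MiB, and each request avoided is one fewer chance to land in the slow tail.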

The next issue you’re likely to hit after optimizing P99 latency is an increase in average latency due to the trade-offs made, or consumer rebalances if a larger max.poll.records pushes the processing time per poll past max.poll.interval.ms.