Redpanda’s internal data structures are row-oriented, so it might surprise you that optimizing for disk I/O often involves tuning settings that look like they belong in a column-oriented database.
Let’s see Redpanda in action. Imagine a simple topic my-topic with 3 partitions, and we’re sending data to it.
# Start a local three-broker cluster (--replicas 3 below needs at least three brokers)
rpk container start -n 3
# Create a topic
rpk topic create my-topic --partitions 3 --replicas 3
# Produce some data
echo "hello world" | rpk topic produce my-topic --key hello
echo "another message" | rpk topic produce my-topic --key another
# Consume the data
rpk topic consume my-topic
This basic setup, while functional, leaves a lot of performance on the table. Redpanda, like Kafka, is fundamentally a distributed commit log. Its performance is heavily dictated by how efficiently it can write data to disk and serve reads from it, while also managing replication and internal state. The key to tuning lies in understanding the trade-offs between latency, throughput, and resource utilization, particularly CPU and disk I/O.
The core of Redpanda’s performance tuning revolves around its redpanda.yaml configuration file. Many of the critical settings directly impact how data is batched, buffered, and flushed to disk.
Batching and Flushing
redpanda.log.flush_interval_ms and redpanda.log.max_batch_size are your primary levers for controlling disk I/O. flush_interval_ms dictates how often Redpanda attempts to flush buffered records to disk. A lower value means more frequent flushes, which increases the number of I/O operations but reduces the amount of data that can be lost in a crash. max_batch_size controls the maximum number of bytes that can be buffered before a flush is triggered, regardless of the interval.
Diagnosis: Monitor disk I/O using iostat -xz 1 on your Redpanda nodes. Look for consistently high %util or await times. Also, observe your message production rate and latency in Redpanda’s metrics.
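If you want to automate that check, here is a minimal Python sketch that scans captured iostat -x output and flags devices that look saturated. The function name, the thresholds (80% utilization, 10 ms await), and the sample text are illustrative assumptions, not Redpanda recommendations. The sample is made up and trimmed to a few columns; real output has many more.

```python
def busy_devices(iostat_text, util_threshold=80.0, await_threshold_ms=10.0):
    """Return {device: (util_pct, worst_await_ms)} for devices over either threshold.

    Thresholds are illustrative assumptions; pick values that match your hardware.
    """
    flagged = {}
    columns = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            columns = None  # a blank line ends the current device table
            continue
        if fields[0] in ("Device", "Device:"):
            columns = fields  # header row names the columns that follow
            continue
        if columns is None or len(fields) != len(columns):
            continue  # skip the avg-cpu section and anything outside a device table
        row = dict(zip(columns[1:], fields[1:]))
        try:
            util = float(row["%util"])
            # Newer sysstat prints r_await/w_await; older versions print a single await.
            awaits = [float(row[c]) for c in ("await", "r_await", "w_await") if c in row]
        except (KeyError, ValueError):
            continue
        worst_await = max(awaits, default=0.0)
        if util >= util_threshold or worst_await >= await_threshold_ms:
            flagged[fields[0]] = (util, worst_await)
    return flagged


# Made-up sample, trimmed to a handful of columns for readability.
SAMPLE = """\
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.00    0.00    2.00   30.00    0.00   63.00

Device   r/s     w/s   r_await  w_await  %util
nvme0n1  12.0  4500.0     0.40    18.50  97.20
nvme1n1   3.0   200.0     0.30     0.80  11.00
"""

print(busy_devices(SAMPLE))  # {'nvme0n1': (97.2, 18.5)}
```

Columns are looked up by header name rather than position because the column layout differs between sysstat versions.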
Fix:
- Increase redpanda.log.flush_interval_ms: If your latency is acceptable and you see frequent small flushes, try increasing this, for example from 100 ms to 500 ms. This allows more records to accumulate into larger, more efficient disk writes.
- Increase redpanda.log.max_batch_size: If you have high throughput and small batches are causing frequent flushes, increase this, for example from 1048576 (1 MiB) to 4194304 (4 MiB). This ensures that flushes are triggered by larger amounts of data, leading to fewer, bigger writes.
Why it works: Larger, sequential disk writes are significantly more efficient than many small, random writes. By allowing more data to buffer, you create these larger write operations, reducing the overhead of disk seeks and write acknowledgments.
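To get a feel for how the two settings interact, here is a back-of-the-envelope Python model. It assumes a steady ingest rate and that a flush fires as soon as either the interval elapses or the size limit is reached, whichever comes first. That is a simplification of the description above, not Redpanda's actual flush logic, and the function name and ingest rates are made up for illustration.

```python
def flush_profile(ingest_bytes_per_s, flush_interval_ms, max_batch_bytes):
    """Return (flushes_per_second, bytes_per_flush) under a simplified steady-state model."""
    if ingest_bytes_per_s <= 0:
        return 0.0, 0.0
    # Bytes that accumulate during one timer interval.
    bytes_per_interval = ingest_bytes_per_s * flush_interval_ms / 1000.0
    if bytes_per_interval >= max_batch_bytes:
        # The size limit trips before the timer does.
        bytes_per_flush = float(max_batch_bytes)
    else:
        # The timer fires first, so each flush carries one interval's worth of data.
        bytes_per_flush = bytes_per_interval
    return ingest_bytes_per_s / bytes_per_flush, bytes_per_flush


# 2 MB/s of ingest: the timer is the limiting factor.
print(flush_profile(2_000_000, 100, 1_048_576))  # (10.0, 200000.0)
print(flush_profile(2_000_000, 500, 1_048_576))  # (2.0, 1000000.0)

# 20 MB/s of ingest: the size limit is the limiting factor.
small, _ = flush_profile(20_000_000, 500, 1_048_576)
large, _ = flush_profile(20_000_000, 500, 4_194_304)
print(round(small, 1), round(large, 1))
```

At low ingest rates only the interval matters, and at high rates only the batch size matters, so it pays to work out which regime you are in before changing either value.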
Memory Buffering and Caching
redpanda.memory.max_cache_memory_mb governs the total memory available for caching data segments and other internal structures. A larger cache can reduce disk reads for frequently accessed data. However, it also competes with other memory consumers.
Diagnosis: Monitor RAM usage on your Redpanda nodes. If free memory is consistently low and the system is swapping, your cache might be too large, or you have insufficient RAM. Conversely, if disk reads are high and free memory is abundant, the cache might be too small.
Fix:
- Increase redpanda.memory.max_cache_memory_mb: If you have spare RAM and high disk read latency, consider increasing this, for example from 2048 to 4096. This allows Redpanda to keep more of its working set in memory, reducing the need to hit the disk.
- Decrease redpanda.memory.max_cache_memory_mb: If your system is swapping, reduce this to free up RAM for the OS and other processes.
Why it works: Caching frequently accessed data in RAM dramatically speeds up read operations. By tuning the cache size, you balance the benefits of in-memory access against the risk of memory pressure and swapping.
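The diagnosis rules above boil down to a small decision table. The Python sketch below encodes it. The function names and the cut-offs (20% available memory, 5 ms read latency) are hypothetical placeholders to replace with numbers from your own baseline.

```python
def available_fraction(meminfo_text):
    """Fraction of RAM the kernel reports as available, parsed from /proc/meminfo text."""
    values = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])  # values are reported in kB
    return values["MemAvailable"] / values["MemTotal"]


def cache_advice(swap_pages_per_s, mem_available_fraction, disk_read_await_ms,
                 min_free_fraction=0.20, slow_read_ms=5.0):
    """Return 'decrease', 'increase', or 'keep' for the cache size.

    The two threshold defaults are assumptions for illustration only.
    """
    if swap_pages_per_s > 0:
        return "decrease"  # swapping: the cache is competing with everything else
    if mem_available_fraction >= min_free_fraction and disk_read_await_ms >= slow_read_ms:
        return "increase"  # spare RAM and slow reads: a bigger cache may help
    return "keep"


# Inline sample instead of reading /proc/meminfo, so the sketch runs anywhere.
meminfo = "MemTotal:       16000000 kB\nMemAvailable:    8000000 kB\n"
print(cache_advice(0, available_fraction(meminfo), disk_read_await_ms=12.0))  # increase
```

On a real node you would feed it the contents of /proc/meminfo, the swap-in/swap-out rates from vmstat, and the read latency from iostat.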
Network and I/O Threads
Redpanda uses a thread-per-core model for network and I/O operations. The number of threads is generally determined by the number of CPU cores available. However, certain settings can influence how these threads operate.
redpanda.io.max_write_batch_bytes controls the maximum size of a batch of records that can be sent over the network to followers during replication.
Diagnosis: Monitor network throughput and latency. If you’re seeing network saturation or high replication lag, this setting might be a factor.
Fix:
- Increase redpanda.io.max_write_batch_bytes: If you have high network bandwidth and are experiencing replication lag, increasing this can allow for larger, more efficient network transfers, for example from 1048576 (1 MiB) to 4194304 (4 MiB).
Why it works: Similar to disk writes, larger network batches can improve throughput by reducing the overhead per byte transferred.
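A rough way to see the per-batch overhead is to model replication as a series of batches that each pay one network round trip. The Python sketch below does that. It deliberately assumes no pipelining (each batch waits for the previous acknowledgment), which is a worst-case simplification and not how Redpanda necessarily behaves. The function name and the link numbers are made up.

```python
import math


def replication_time_s(total_bytes, batch_bytes, rtt_ms, bandwidth_bytes_per_s):
    """Seconds to ship total_bytes to a follower, one acknowledged batch at a time.

    Assumes no pipelining: every batch pays a full round trip on top of transfer time.
    """
    batches = math.ceil(total_bytes / batch_bytes)
    round_trip_cost = batches * rtt_ms / 1000.0
    transfer_cost = total_bytes / bandwidth_bytes_per_s
    return round_trip_cost + transfer_cost


backlog = 64 * 1024 * 1024   # 64 MiB waiting to be replicated (illustrative)
link = 1_250_000_000         # roughly 10 Gbit/s expressed in bytes per second

for batch in (1_048_576, 4_194_304):
    seconds = replication_time_s(backlog, batch, rtt_ms=1.0, bandwidth_bytes_per_s=link)
    print(f"batch={batch} bytes -> {seconds * 1000:.1f} ms")
```

The transfer term is the same in both runs; only the round-trip term shrinks as batches grow, which is why larger batches help most on links with higher latency.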
Topic and Partition Settings
While not strictly "cluster config tuning," the number of partitions and replication factor for your topics have a significant impact. More partitions allow for higher parallelism for producers and consumers, but also increase the overhead of metadata management and inter-broker communication.
Diagnosis: Monitor consumer lag with rpk group describe <group>, which reports per-partition lag for a consumer group. If lag is consistently high and you have CPU/disk headroom, more partitions might help. Conversely, if you have many topics with few partitions, you might be underutilizing your cluster.
Fix:
- Increase partitions: If a single consumer group is bottlenecked by a single partition’s throughput, and the broker serving that partition has headroom, consider increasing the number of partitions for that topic. This distributes the load across more brokers and consumers.
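- Decrease partitions: If you have a very large number of partitions and are seeing high CPU usage on brokers simply managing metadata or network connections, consider consolidating topics or reducing partitions if throughput allows. The partition count of an existing topic cannot be lowered in place, so in practice this means creating a new topic with fewer partitions and migrating producers and consumers to it.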
Why it works: Distributing data and load across more partitions allows for greater horizontal scaling of producer and consumer throughput.
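Consumer lag itself is just arithmetic: for each partition, the latest offset in the log minus the offset the group has committed. The Python sketch below computes it and picks out the partition that is furthest behind, which is a quick way to tell a single hot partition apart from uniformly slow consumers. The function names and offsets are illustrative, and treating a missing committed offset as 0 is an assumption made for simplicity.

```python
def partition_lag(high_watermarks, committed_offsets):
    """Return {partition: lag}, where lag = latest offset - committed offset."""
    lag = {}
    for partition, end_offset in high_watermarks.items():
        # Assumption: a partition with no committed offset is treated as unread from 0.
        committed = committed_offsets.get(partition, 0)
        lag[partition] = max(end_offset - committed, 0)
    return lag


def hottest_partition(lag):
    """Partition with the largest lag."""
    return max(lag, key=lag.get)


# Illustrative offsets for the three partitions of my-topic.
lag = partition_lag(
    high_watermarks={0: 1000, 1: 1000, 2: 5000},
    committed_offsets={0: 990, 1: 1000, 2: 1200},
)
print(lag)                     # {0: 10, 1: 0, 2: 3800}
print(hottest_partition(lag))  # 2
```

If one partition dominates the lag, as partition 2 does here, the likelier culprit is a skewed key distribution, and adding partitions only helps if the keys actually spread out.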
The Often-Overlooked io_uring
Redpanda leverages io_uring for asynchronous I/O on Linux, which is generally much more efficient than traditional epoll or select. Its effectiveness depends on how the submission and completion queues are used, but queue depth is not exposed as a simple redpanda.yaml knob, and Redpanda manages its io_uring usage internally. What you can control is the environment around it: if you’re seeing high disk latency and have verified other settings, check that your kernel is recent enough to support io_uring well and that io_uring has not been restricted (newer kernels expose a kernel.io_uring_disabled sysctl for this). The key takeaway is that Redpanda wants to use io_uring efficiently. If it can’t, the cause is often external: very slow disks, network saturation, or insufficient CPU to process the io_uring completions quickly.
The next error you’ll likely encounter after optimizing for throughput is related to controller.state_update_interval_ms and controller.id_update_interval_ms, where aggressively fast updates can lead to controller churn if not carefully balanced.