Redpanda’s log segment retention is what keeps your disk from filling up, and it’s smarter than you think.
Let’s see it in action. Imagine you have a topic named my-topic with a single partition. You’re writing data to it, and Redpanda is creating log segments – files on disk that hold your topic’s data. By default, Redpanda keeps segments for 7 days and applies no size limit. For this walkthrough, assume my-topic also has a 1 GiB size cap, so segments are removed after 7 days or once the partition grows past 1 GiB, whichever comes first.
# Check current retention settings for 'my-topic'
rpk topic describe my-topic -c
# Output might look like this (abridged; the exact layout depends on your rpk version):
# KEY              VALUE       SOURCE
# cleanup.policy   delete      DEFAULT_CONFIG
# retention.bytes  1073741824  DYNAMIC_TOPIC_CONFIG   # 1 GiB in bytes
# retention.ms     604800000   DEFAULT_CONFIG         # 7 days in milliseconds
If you write a lot of data very quickly, the retention.bytes limit is hit first, and Redpanda deletes the oldest segments to bring the partition back under 1 GiB. If you write data slowly, the age limit (the retention.ms topic property) is hit first, and segments older than 7 days are deleted.
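To make "quickly" and "slowly" concrete, here is a small back-of-the-envelope sketch. It assumes a steady write rate and the 7-day / 1 GiB limits used in this walkthrough. The function name first_limit_hit is made up for illustration and is not part of Redpanda or rpk.

```python
SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000   # 604800000
ONE_GIB = 1024 ** 3                       # 1073741824


def first_limit_hit(write_rate_bytes_per_sec, retention_ms, retention_bytes):
    """Return which retention limit a steadily written partition reaches first.

    Hypothetical helper: it only compares how many bytes arrive during one
    retention window against the size limit.
    """
    bytes_written_in_window = write_rate_bytes_per_sec * (retention_ms / 1000)
    if bytes_written_in_window > retention_bytes:
        return "retention.bytes"
    return "retention.ms"


# A partition receiving 1 MB/s fills 1 GiB long before 7 days pass.
print(first_limit_hit(1_000_000, SEVEN_DAYS_MS, ONE_GIB))
# A partition receiving 100 B/s never gets near 1 GiB within 7 days.
print(first_limit_hit(100, SEVEN_DAYS_MS, ONE_GIB))
```

With these numbers the break-even point is roughly 1.8 KB/s per partition: above that rate the size limit governs, below it the age limit does.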
Now, what if you want to change this? You can do it per topic. Let’s say you want to keep data for my-topic for 30 days, but only up to 10GB.
# Set retention to 30 days (2592000000 ms) and 10 GiB (10737418240 bytes)
rpk topic alter-config my-topic --set retention.ms=2592000000 --set retention.bytes=10737418240
After applying this, rpk topic describe my-topic -c will reflect the new values. Redpanda now monitors both conditions for each partition’s log and deletes the oldest segments once either the 30-day age limit or the 10 GiB size limit is exceeded.
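Both settings take raw milliseconds and bytes, so it is easy to drop a zero. A minimal sketch for deriving the values instead of typing them by hand (the helper names are illustrative only):

```python
def days_to_ms(days):
    """Convert whole days to milliseconds."""
    return days * 24 * 60 * 60 * 1000


def gib_to_bytes(gib):
    """Convert GiB (powers of 1024) to bytes."""
    return gib * 1024 ** 3


print(days_to_ms(30))    # 2592000000
print(gib_to_bytes(10))  # 10737418240
```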
The cleanup.policy is crucial. If it’s set to compact, Redpanda doesn’t delete segments based on age or size directly. Instead, it keeps the latest version of each key. This is great for stateful data where you only care about the current value, but disk usage still grows with the number of unique keys, because at least one record per key is always kept. For most streaming use cases, delete is the correct policy.
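The default age limit is 604800000 ms (7 days). Redpanda itself applies no size limit unless retention.bytes is configured; this walkthrough uses 1073741824 (1 GiB). The two limits are evaluated independently, and retention.bytes is measured against the partition’s total log size, not the size of any single segment. If a closed segment’s newest record is 8 days old, the segment is deleted because it is past the 7-day limit, whatever the partition’s size. If every segment is under 7 days old but the partition holds 1.1 GiB, the oldest segments are deleted until the partition is back under 1 GiB.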
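The interplay of the two limits can be sketched as a toy model. This is a simplification for illustration, not Redpanda’s actual implementation. It assumes segments are listed oldest first, that the last segment is still being written and is never removed, and that size-based cleanup removes the oldest closed segments until the partition total is at or below the limit. The Segment class and segments_to_delete function are hypothetical names.

```python
from dataclasses import dataclass

MIB = 1024 ** 2
GIB = 1024 ** 3


@dataclass
class Segment:
    name: str
    age_days: float   # age of the newest record in the segment
    size_bytes: int


def segments_to_delete(segments, retention_days, retention_bytes):
    """Toy model of delete-policy retention for one partition.

    `segments` must be ordered oldest first; the last one is treated as the
    active segment and is never selected.
    """
    closed = segments[:-1]
    total = sum(s.size_bytes for s in segments)
    doomed = []
    for seg in closed:
        too_old = seg.age_days > retention_days
        over_size = total > retention_bytes
        if not (too_old or over_size):
            break  # every later segment is newer, so nothing else qualifies
        doomed.append(seg.name)
        total -= seg.size_bytes
    return doomed


# Age limit: only the 8-day-old segment goes, the partition is far below 1 GiB.
by_age = [
    Segment("s0", 8, 128 * MIB),
    Segment("s1", 6, 128 * MIB),
    Segment("s2", 1, 128 * MIB),
]
print(segments_to_delete(by_age, retention_days=7, retention_bytes=GIB))

# Size limit: ten young 128 MiB segments total 1280 MiB, so the two oldest go.
by_size = [Segment(f"s{i}", 5 - i * 0.5, 128 * MIB) for i in range(10)]
print(segments_to_delete(by_size, retention_days=7, retention_bytes=GIB))
```

The first call selects only s0; the second selects s0 and s1, which brings the modeled partition down to exactly 1 GiB.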
Redpanda doesn’t delete segments immediately when the threshold is crossed. It’s a background process. You might see temporary spikes in disk usage that then get cleaned up. The actual deletion happens when Redpanda’s segment management process checks the segments.
When you set retention.bytes, it’s a per-partition limit. If a topic has multiple partitions, each partition’s log is managed independently against this limit. This matters for total disk usage: a topic can occupy up to roughly number_of_partitions * retention.bytes, and each replica of a partition stores its own copy.
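A rough capacity estimate follows directly from the per-partition limit. The sketch below adds a replication_factor parameter, which is an assumption layered on top of the formula above: every replica keeps its own copy of the log, so the cluster-wide total scales with it. The function name is illustrative, and the result is an approximate upper bound because cleanup runs in the background.

```python
GIB = 1024 ** 3


def estimated_topic_disk_bytes(partitions, retention_bytes, replication_factor=1):
    """Approximate cluster-wide ceiling on a topic's retained log data."""
    return partitions * retention_bytes * replication_factor


total = estimated_topic_disk_bytes(
    partitions=12,
    retention_bytes=10 * GIB,
    replication_factor=3,
)
print(total / GIB)  # 360.0
```

Twelve partitions at 10 GiB each, replicated three times, can therefore retain about 360 GiB across the cluster.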
Most people think of retention as just "how long do I keep data?" but the retention.bytes setting is a critical safeguard against runaway disk consumption, especially in high-throughput scenarios where a partition can reach the size limit long before any of its data ages out. It bounds how much disk space a single partition’s log occupies, although usage can briefly exceed the limit until the background cleanup runs.
You can also set these at the cluster level, where they act as the defaults for any topic that does not set its own values.
# Set cluster-wide defaults (requires cluster admin privileges)
# In current Redpanda versions these are cluster properties, set with rpk rather than in redpanda.yaml;
# the exact mechanism can differ in older versions.
rpk cluster config set log_retention_ms 1209600000 # 14 days
rpk cluster config set retention_bytes 5368709120 # 5 GiB
The next thing you’ll likely encounter is how to manage retention for specific topics when you have a mix of short-lived and long-lived data streams, and how to monitor disk usage effectively.