Redpanda’s tiered storage lets you retain an effectively unlimited amount of data by offloading older segments to S3, but it is more than a simple cache.
Let’s watch it in action. Imagine you have a Redpanda cluster with tiered storage enabled and pointed at an S3 bucket. You’re writing to a topic, my_topic, and as segments roll over and age past your local retention policy, Redpanda starts moving them to the bucket.
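Here’s a sketch of the configuration in your redpanda.yaml. The layout is illustrative: shipping Redpanda releases expose the equivalent settings as cloud_storage_* cluster properties such as cloud_storage_enabled, cloud_storage_bucket, and cloud_storage_region: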
# redpanda.yaml
# ...
tiered_storage:
  enabled: true
  s3:
    bucket: "my-redpanda-tiered-storage-bucket"
    region: "us-east-1"
    endpoint_url: "https://s3.us-east-1.amazonaws.com"  # Or your S3-compatible endpoint
    access_key: "YOUR_ACCESS_KEY_ID"
    secret_key: "YOUR_SECRET_ACCESS_KEY"
    # Optional: If you need to specify a profile or role ARN
    # profile: "redpanda-profile"
    # role_arn: "arn:aws:iam::123456789012:role/RedpandaTieredStorageRole"
  # Optional: Customize upload settings
  upload_max_retries: 5
  upload_timeout_ms: 30000
# ...
When a segment is no longer needed locally (based on its age and the configured retention), Redpanda marks it for upload. A background process then reads the segment and uploads it to your S3 bucket as a single object. The object key typically encodes the topic, the partition, and the segment’s base offset, for example my_topic/0/1234567890.log.
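The naming scheme is easy to express in code. The following is a minimal sketch that builds and parses keys in the topic/partition/base-offset layout used in the example above; the function names are hypothetical, and the layout Redpanda actually writes may differ between versions.

```python
def segment_object_key(topic: str, partition: int, base_offset: int) -> str:
    """Build a key in the illustrative <topic>/<partition>/<base_offset>.log layout."""
    if partition < 0 or base_offset < 0:
        raise ValueError("partition and base_offset must be non-negative")
    return f"{topic}/{partition}/{base_offset}.log"


def parse_segment_object_key(key: str) -> tuple[str, int, int]:
    """Invert segment_object_key, returning (topic, partition, base_offset)."""
    # Split from the right so a topic name containing "/" is kept intact.
    topic, partition, filename = key.rsplit("/", 2)
    base_offset, ext = filename.split(".", 1)
    if ext != "log":
        raise ValueError(f"unexpected extension in {key!r}")
    return topic, int(partition), int(base_offset)
```

Parsing the key back into its parts is handy when you list a bucket and want to group objects by topic and partition.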
Once the upload completes, Redpanda updates its internal metadata to point to the S3 object. When a consumer or another broker needs data from that segment, Redpanda first checks whether the segment is still on local storage. If it is not, Redpanda fetches the segment from S3, caches it locally for a limited time (if configured to do so), and serves the read. Clients see none of this: they issue the same fetch requests either way.
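The read path described here is a read-through cache with a time limit. The sketch below models that idea in plain Python. It is not Redpanda’s implementation: the class name, the fetch_remote callback (a stand-in for an S3 GET), and the ttl_seconds parameter are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class ReadThroughSegmentCache:
    """Serve segments from a local cache, falling back to a remote fetch."""

    def __init__(
        self,
        fetch_remote: Callable[[str], bytes],  # hypothetical stand-in for an S3 GET
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_remote = fetch_remote
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}  # key -> (expiry, data)

    def read(self, key: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # local hit: no remote call
        data = self._fetch_remote(key)  # miss or expired: go to object storage
        self._entries[key] = (now + self._ttl, data)
        return data

    def evict_expired(self) -> int:
        """Drop expired entries and return how many were removed."""
        now = self._clock()
        expired = [k for k, (expiry, _) in self._entries.items() if expiry <= now]
        for k in expired:
            del self._entries[k]
        return len(expired)
```

The caller only ever calls read, which is what makes the fallback invisible: whether the bytes came from local disk or from the bucket is decided inside the cache.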
The core problem tiered storage solves is the ever-increasing cost and operational complexity of storing massive amounts of historical data on local, high-performance storage. By moving older, less frequently accessed data to cheaper object storage, you reduce the footprint and I/O demands on your Redpanda nodes, allowing them to focus on serving live traffic.
Internally, Redpanda manages this by maintaining an index of all segments, both local and tiered. When a read request arrives for a specific offset range within a partition, Redpanda first consults this index. If the segment containing that offset is marked as tiered, Redpanda initiates a download from S3. The cache_ttl_ms configuration (under tiered_storage) controls how long downloaded segments are kept on local disk before being eligible for eviction, balancing read performance against local storage usage.
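To make the index lookup concrete, here is a minimal sketch of an offset-to-segment index that records whether each segment is local or tiered. The SegmentEntry and SegmentIndex names and their fields are hypothetical; Redpanda’s real metadata structures are more involved.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SegmentEntry:
    base_offset: int  # first offset stored in the segment
    last_offset: int  # last offset stored in the segment (inclusive)
    tiered: bool      # True if the segment lives only in object storage


class SegmentIndex:
    """Map an offset to the segment that contains it, using binary search."""

    def __init__(self, entries: List[SegmentEntry]) -> None:
        self._entries = sorted(entries, key=lambda e: e.base_offset)
        self._bases = [e.base_offset for e in self._entries]

    def find(self, offset: int) -> Optional[SegmentEntry]:
        # Rightmost segment whose base offset is <= the requested offset.
        i = bisect_right(self._bases, offset) - 1
        if i < 0:
            return None
        entry = self._entries[i]
        return entry if offset <= entry.last_offset else None
```

A reader would call find for the requested offset and trigger a download only when the returned entry has tiered set to True.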
The main levers you control live in the tiered_storage and s3 sub-sections of your redpanda.yaml: you can enable or disable the feature, specify your S3 credentials and endpoint, and tune upload behavior. Beyond that, each topic’s segment size (the segment.bytes topic property, which falls back to the cluster-wide log_segment_size, 128 MiB by default) determines how often segments roll over and become candidates for tiering. Smaller segments mean more frequent rollovers and more S3 requests for the same volume of data.
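A quick back-of-the-envelope calculation shows how segment size drives rollover frequency. The sketch below assumes a steady write rate per partition and ignores time-based rolls and compression; the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400
MIB = 1024 * 1024
GIB = 1024 * MIB


def segments_per_day(write_rate_bytes_per_sec: float, segment_size_bytes: int) -> float:
    """Estimate how many segments one partition fills (and later uploads) per day."""
    if write_rate_bytes_per_sec < 0 or segment_size_bytes <= 0:
        raise ValueError("write rate must be >= 0 and segment size must be > 0")
    return write_rate_bytes_per_sec * SECONDS_PER_DAY / segment_size_bytes


# One partition receiving 1 MiB/s:
print(segments_per_day(1 * MIB, 1 * GIB))    # 84.375 segments per day
print(segments_per_day(1 * MIB, 128 * MIB))  # 675.0 segments per day
```

At the same write rate, shrinking segments from 1 GiB to 128 MiB multiplies the number of objects written per partition by eight, which matters if your object store bills per request.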
What most people don’t realize is that tiered storage isn’t just about cost savings; it’s also a critical component for disaster recovery and multi-cluster setups. If a Redpanda node fails, the data it had already uploaded can be restored by fetching segments from S3. You can also configure additional Redpanda clusters to read from the same S3 bucket, which enables efficient data sharing and migration between clusters.
The next concept you’ll likely explore is how to monitor the health and performance of your tiered storage, particularly the upload and download latencies.