Redpanda’s snapshot and restore mechanism is less about copying files than about managing the state of your Kafka-compatible cluster.
Let’s see it in action. Imagine you have a topic my-topic with a few messages.
# Produce some messages (one key:value record per input line)
printf 'hello:world\nfoo:bar\n' | rpk topic produce my-topic --format '%k:%v\n'
# Check the topic
rpk topic consume my-topic --offset start --num 2 --format '%k:%v\n'
# hello:world
# foo:bar
Now, let’s back it up. Redpanda stores its data in data/ by default, and snapshots are written to data/snapshots/.
# Create a snapshot (this will be written to data/snapshots/<topic_id>/<snapshot_id>)
# The snapshot ID is a timestamp in nanoseconds.
rpk snapshot create --topic my-topic
You’ll find a new directory in data/snapshots/ corresponding to my-topic’s internal ID. Inside, there’s a directory named with a nanosecond timestamp (e.g., 1678886400000000000). This directory contains files like 00000000000000000000.log (the segment file) and meta.json. The meta.json file is crucial; it records the offset and term of the snapshot.
// Example meta.json
{
  "version": 1,
  "last_offset": 1,
  "last_term": 0,
  "ancestor_snapshot_id": "000000000000000000000000000000000000",
  "compression": "none",
  "size": 1234
}
This last_offset is the key. When you restore, Redpanda doesn’t just overwrite existing data. It uses this last_offset to determine where to resume replication from.
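To check which offset a snapshot covers before restoring it, you can read last_offset straight out of meta.json. The following is a minimal sketch, assuming meta.json is laid out as in the example above (one "key": value pair per line). The function name meta_field is hypothetical, and the sed expression is not a general JSON parser; a tool such as jq is the safer choice where it is installed.

```shell
#!/bin/sh
# meta_field META_FILE FIELD
# Hypothetical helper: prints the value of a numeric field from a meta.json
# file that has one "key": value pair per line, as in the example above.
# Prints nothing if the field is absent or not numeric.
meta_field() {
    meta_file=$1
    field=$2
    sed -n "s/^[[:space:]]*\"$field\"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p" "$meta_file" | head -n 1
}
```

For example, `meta_field data/snapshots/<topic_id>/<snapshot_id>/meta.json last_offset` prints the offset recorded in that snapshot.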
To restore, you first need to stop Redpanda. Then, you can copy your snapshot files into the data/snapshots/ directory of the target Redpanda instance. Let’s say you want to restore to a clean data directory.
- Stop Redpanda.
- Locate your snapshot directory. For our example, assume it’s /path/to/your/snapshot/data/snapshots/bafybeig.../1678886400000000000.
- Copy the snapshot files into the new Redpanda’s data/snapshots/ directory. For a clean restore, you might create a new topic directory if one doesn’t exist:

      # On the target Redpanda instance
      mkdir -p /new/redpanda/data/snapshots/bafybeig...
      cp /path/to/your/snapshot/data/snapshots/bafybeig.../1678886400000000000/* /new/redpanda/data/snapshots/bafybeig.../

- Start Redpanda.
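The copy step can be wrapped in a small function so that it refuses to run against a directory that has no meta.json. This is a sketch under stated assumptions: the function name copy_snapshot is hypothetical, the topic ID is taken from the parent directory name of the snapshot, and the files land in the target's data/snapshots/<topic_id>/ directory exactly as the mkdir and cp commands above place them. Stopping and starting Redpanda is deliberately left out, since that depends on how the service is managed on your host.

```shell
#!/bin/sh
# copy_snapshot SNAPSHOT_DIR TARGET_DATA_DIR
#   SNAPSHOT_DIR     a directory of the form .../snapshots/<topic_id>/<snapshot_id>
#   TARGET_DATA_DIR  the data directory of the target instance
# Hypothetical helper: mirrors the manual mkdir/cp steps and prints the
# destination directory on success.
copy_snapshot() {
    snapshot_dir=${1%/}
    target_data_dir=${2%/}

    # Refuse to copy something that does not look like a snapshot
    if [ ! -f "$snapshot_dir/meta.json" ]; then
        echo "no meta.json in $snapshot_dir" >&2
        return 1
    fi

    # The topic ID is the name of the snapshot's parent directory
    topic_id=$(basename "$(dirname "$snapshot_dir")")
    dest="$target_data_dir/snapshots/$topic_id"

    mkdir -p "$dest" || return 1
    # -p preserves timestamps and permissions on the copied files
    cp -p "$snapshot_dir"/* "$dest"/ || return 1
    echo "$dest"
}
```

With Redpanda stopped, a call would look like `copy_snapshot /path/to/your/snapshot/data/snapshots/<topic_id>/<snapshot_id> /new/redpanda/data`, followed by starting Redpanda again.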
When Redpanda starts, it scans data/snapshots/. It sees the snapshot for bafybeig... and notices its last_offset is 1. If the topic my-topic already exists and has data up to offset 0, Redpanda will effectively "catch up" by replaying the snapshot’s data, bringing the topic’s offset to 1. If the topic doesn’t exist, Redpanda will create it and populate it with the snapshot data.
The real power here is in incremental backups and distributed restores. If you have multiple partitions for a topic, Redpanda snapshots each partition independently. When you restore, you can restore individual partitions. Redpanda uses the meta.json to ensure that the restored partition’s offset is correctly set, allowing it to seamlessly rejoin the replication log. You can even restore to a later point in time if you have multiple snapshots, by selecting the snapshot with the highest last_offset for that partition.
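Selecting the snapshot with the highest last_offset is easy to script. The sketch below assumes each snapshot sits in its own subdirectory of a topic's snapshot directory and that meta.json has one "key": value pair per line, as in the earlier example. The function name latest_snapshot is hypothetical.

```shell
#!/bin/sh
# latest_snapshot TOPIC_SNAPSHOT_DIR
# Hypothetical helper: prints the snapshot subdirectory whose meta.json
# records the highest last_offset. Returns non-zero if none is found.
latest_snapshot() {
    topic_dir=${1%/}
    best_dir=
    best_offset=-1
    for meta in "$topic_dir"/*/meta.json; do
        # Skip the literal pattern when the glob matches nothing
        [ -f "$meta" ] || continue
        offset=$(sed -n 's/^[[:space:]]*"last_offset"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$meta" | head -n 1)
        # Skip metadata files without a readable numeric offset
        [ -n "$offset" ] || continue
        if [ "$offset" -gt "$best_offset" ]; then
            best_offset=$offset
            best_dir=$(dirname "$meta")
        fi
    done
    [ -n "$best_dir" ] || return 1
    echo "$best_dir"
}
```

Running `latest_snapshot data/snapshots/<topic_id>` prints the directory to restore from. Comparing last_offset rather than the directory name avoids relying on the timestamp ordering of snapshot IDs.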
Crucially, restoring a snapshot does not always require a restart. If you’re only adding a snapshot to an existing data/snapshots/ directory on a running cluster, Redpanda will pick it up on its own. The stop/start sequence above is mainly for setting up a fresh data directory or for replacing the data directory itself.
The next step after mastering snapshots is understanding how to integrate this into a disaster recovery plan, which involves managing snapshot retention and potentially offsite storage.