The most surprising thing about re-sharding a Redis cluster is that it’s less about copying data in bulk and more about transferring ownership of hash slots, with the keys following one at a time.
Let’s look at a simple Redis cluster. We have three master nodes, each responsible for a range of hash slots.
127.0.0.1:7000 master
127.0.0.1:7001 master
127.0.0.1:7002 master
Let’s say node 7000 owns slots 0-5460, 7001 owns 5461-10922, and 7002 owns 10923-16383. When you SET mykey value, Redis calculates CRC16("mykey") % 16384. If that number falls into 7000’s range, the key goes there. Simple.
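The slot calculation above can be sketched in a few lines of Python. This is a minimal illustration, not Redis’s own code: it assumes the CRC16 in question is the XMODEM variant (which Python’s binascii.crc_hqx computes when seeded with 0), it ignores hash tags, and the names key_slot, owner_of, and SLOT_RANGES are made up for this example.

```python
import binascii

TOTAL_SLOTS = 16384

# Slot ranges from the example above, keyed by node port (hypothetical layout).
SLOT_RANGES = {
    7000: range(0, 5461),       # slots 0-5460
    7001: range(5461, 10923),   # slots 5461-10922
    7002: range(10923, 16384),  # slots 10923-16383
}


def key_slot(key: bytes) -> int:
    """CRC16 (XMODEM) of the key, modulo 16384. Hash tags are not handled."""
    return binascii.crc_hqx(key, 0) % TOTAL_SLOTS


def owner_of(slot: int) -> int:
    """Return the port of the node whose range contains the slot."""
    for port, slots in SLOT_RANGES.items():
        if slot in slots:
            return port
    raise ValueError(f"slot {slot} is not assigned to any node")


for key in (b"mykey", b"foo", b"hello"):
    slot = key_slot(key)
    print(key.decode(), "-> slot", slot, "-> node", owner_of(slot))
```

On a live cluster, CLUSTER KEYSLOT <key> reports the slot for a key, which is a handy way to check a sketch like this.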
Now, imagine you need to add a new node, 7003, to handle more load. You bring it up, and it’s empty. Redis knows about it, but it doesn’t own any slots yet.
127.0.0.1:7000 master (0-5460)
127.0.0.1:7001 master (5461-10922)
127.0.0.1:7002 master (10923-16383)
127.0.0.1:7003 master (no slots)
To rebalance, we don’t just copy data. We tell Redis to transfer ownership of slots. We might tell 7000 to give up slots 0-2000 to 7003.
This is where the magic happens. Redis doesn’t immediately move data. Instead, each affected slot is marked MIGRATING on 7000 and IMPORTING on 7003, while 7000 remains the slot’s official owner until the migration finishes.
On 7000:
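- If a client tries to GET a key in a migrating slot and the key still exists on 7000, 7000 serves the request as usual.
- If the key is not on 7000 (it has already been moved, or it never existed), 7000 does not forward anything. It replies to the client with an ASK redirection that names 7003.
- The client then sends ASKING to 7003, followed by the original command. Unlike MOVED, an ASK redirection applies to that single request only, so the client leaves its slot map unchanged.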
On 7003:
- When 7003 receives a command for an importing slot, it serves it only if the client sent ASKING immediately beforehand.
- Without ASKING, 7003 replies with a MOVED redirection back to 7000, because 7000 still officially owns the slot.
- New keys in the slot end up on 7003 (7000 answers ASK for any key it does not hold), so the set of keys left on 7000 can only shrink as the migration proceeds.
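The redirection rules for a slot in transit can be modelled with a toy class. This is a sketch of the behaviour described in the Redis Cluster specification, not of Redis internals: the Node class, its fields, and the string replies ("VALUE ...", "NIL") are invented for illustration, and only reads are modelled.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    name: str                      # used as the node's address, e.g. "127.0.0.1:7000"
    slot_map: Dict[int, str]       # this node's view of slot -> owner address
    data: Dict[str, str] = field(default_factory=dict)
    migrating_to: Dict[int, str] = field(default_factory=dict)    # slot -> target
    importing_from: Dict[int, str] = field(default_factory=dict)  # slot -> source

    def get(self, slot: int, key: str, asking: bool = False) -> str:
        owner = self.slot_map[slot]
        if owner == self.name:
            if key in self.data:
                return f"VALUE {self.data[key]}"      # key still here: serve it
            if slot in self.migrating_to:
                # Key is absent and the slot is migrating: send the client onward.
                return f"ASK {slot} {self.migrating_to[slot]}"
            return "NIL"
        if asking and slot in self.importing_from:
            # Importing node serves the slot only for requests preceded by ASKING.
            return f"VALUE {self.data[key]}" if key in self.data else "NIL"
        return f"MOVED {slot} {owner}"


SLOT = 100
view = {SLOT: "127.0.0.1:7000"}  # 7000 is still the official owner
source = Node("127.0.0.1:7000", dict(view), data={"a": "1"},
              migrating_to={SLOT: "127.0.0.1:7003"})
target = Node("127.0.0.1:7003", dict(view), data={"b": "2"},
              importing_from={SLOT: "127.0.0.1:7000"})

print(source.get(SLOT, "a"))               # key not yet moved
print(source.get(SLOT, "b"))               # key already moved
print(target.get(SLOT, "b"))               # no ASKING sent
print(target.get(SLOT, "b", asking=True))  # ASKING sent first
```

Here key "a" is still on the source and key "b" has already been moved, so the four calls exercise each branch: a normal reply, an ASK, a MOVED, and a reply served after ASKING.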
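Meanwhile, the keys themselves are moved from 7000 to 7003 with the MIGRATE command, issued by whatever tool is driving the re-shard (redis-cli --cluster reshard, for example). The tool lists the keys in a slot with CLUSTER GETKEYSINSLOT and migrates them in batches. Each MIGRATE is atomic: the key is serialized, sent to 7003, and deleted from 7000 only once 7003 confirms it, so a key lives on exactly one node at any moment. This is not replication, and 7003 never becomes a replica of 7000. If a key with the same name already exists on 7003, MIGRATE fails with a BUSYKEY error unless the REPLACE option is given.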
Once every key in a slot has been moved, the tool finalizes the transfer with CLUSTER SETSLOT <slot> NODE <target_node_id>. From then on 7003 owns the slot. 7000 stops serving it and answers with MOVED, clients update their slot maps, and subsequent requests go straight to 7003.
This "migrating" and "importing" dance allows clients to keep reading and writing while the slot is in transit. The data moves one key (or one batch of keys) at a time, and the cluster remains available throughout, although migrating a very large key can briefly block both nodes. The key is that slot state decides where each request is redirected, while MIGRATE moves the data underneath.
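Under the hood, this is all driven by the CLUSTER command. For each slot, you run CLUSTER SETSLOT <slot> IMPORTING <source_node_id> on the receiving node and CLUSTER SETSLOT <slot> MIGRATING <target_node_id> on the source node, move the keys with CLUSTER GETKEYSINSLOT and MIGRATE, and finish with CLUSTER SETSLOT <slot> NODE <target_node_id>. CLUSTER ADDSLOTS and CLUSTER DELSLOTS assign and unassign slots directly and are mainly useful for slots that hold no data, such as when first creating a cluster. In practice, redis-cli --cluster reshard runs the whole sequence for you.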
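To make the order of operations concrete, here is a small planner that lists the commands for moving one slot. It is a sketch under assumptions: plan_slot_migration and its parameters are hypothetical, the key list stands in for what CLUSTER GETKEYSINSLOT would return on a live node, and nothing is sent to Redis. Each command is an argument list of the kind most clients accept for raw commands.

```python
from typing import List, Tuple

Step = Tuple[str, List[str]]  # (which node to send to, command arguments)


def plan_slot_migration(slot: int, source_id: str, target_id: str,
                        target_host: str, target_port: int,
                        keys: List[str], batch_size: int = 10,
                        timeout_ms: int = 5000) -> List[Step]:
    s = str(slot)
    steps: List[Step] = [
        # 1. Target first, so it is ready to accept ASKING requests.
        ("target", ["CLUSTER", "SETSLOT", s, "IMPORTING", source_id]),
        # 2. Then the source starts answering ASK for keys it no longer holds.
        ("source", ["CLUSTER", "SETSLOT", s, "MIGRATING", target_id]),
    ]
    # 3. Move keys in batches. A real tool repeats GETKEYSINSLOT until the
    #    slot is empty; here the key list is supplied up front.
    for i in range(0, len(keys), batch_size):
        batch = keys[i:i + batch_size]
        steps.append(("source", ["CLUSTER", "GETKEYSINSLOT", s, str(batch_size)]))
        steps.append(("source", ["MIGRATE", target_host, str(target_port), "",
                                 "0", str(timeout_ms), "KEYS", *batch]))
    # 4. Finalize ownership, target before source.
    steps.append(("target", ["CLUSTER", "SETSLOT", s, "NODE", target_id]))
    steps.append(("source", ["CLUSTER", "SETSLOT", s, "NODE", target_id]))
    return steps


for node, args in plan_slot_migration(100, "<source_node_id>", "<target_node_id>",
                                      "127.0.0.1", 7003, ["a", "b", "c"],
                                      batch_size=2):
    print(node, " ".join(arg if arg else '""' for arg in args))
```

The empty string in the MIGRATE arguments is the placeholder for the single-key position when the KEYS form is used, and 0 is the destination database, which is the only database available in cluster mode.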
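The crucial part is that your client library must be cluster-aware: it has to follow MOVED redirections (and update its slot map) and ASK redirections (sending ASKING before retrying on the target). If your client doesn’t support cluster mode, it will see these redirections as plain errors, and re-sharding will break your application.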
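What "handling MOVED and ASK" means for a client can be sketched as follows. This is an illustrative fragment, not any particular library’s implementation: Redirect, parse_redirect, and next_action are hypothetical names, and the error text is assumed to arrive without the leading "-" of the wire protocol.

```python
from typing import Dict, NamedTuple, Optional, Tuple

Address = Tuple[str, int]


class Redirect(NamedTuple):
    kind: str   # "MOVED" or "ASK"
    slot: int
    host: str
    port: int


def parse_redirect(error: str) -> Optional[Redirect]:
    """Parse 'MOVED <slot> <host>:<port>' or 'ASK <slot> <host>:<port>'."""
    parts = error.split()
    if len(parts) != 3 or parts[0] not in ("MOVED", "ASK"):
        return None  # some other error; not a redirection
    host, _, port = parts[2].rpartition(":")
    return Redirect(parts[0], int(parts[1]), host, int(port))


def next_action(redirect: Redirect,
                slot_map: Dict[int, Address]) -> Tuple[Address, bool]:
    """Return (node to retry on, whether to send ASKING first)."""
    target = (redirect.host, redirect.port)
    if redirect.kind == "MOVED":
        slot_map[redirect.slot] = target  # permanent: remember the new owner
        return target, False
    return target, True                   # ASK: one-off, slot map untouched


slot_map: Dict[int, Address] = {3999: ("127.0.0.1", 7000)}
ask = parse_redirect("ASK 3999 127.0.0.1:7003")
print(next_action(ask, slot_map), slot_map[3999])
moved = parse_redirect("MOVED 3999 127.0.0.1:7003")
print(next_action(moved, slot_map), slot_map[3999])
```

The asymmetry is the point: MOVED changes where the client sends every later request for that slot, while ASK affects only the retry of the current command.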
The next thing you’ll likely wrestle with is how to manage slot distribution for optimal performance across nodes, not just availability.