Moving data between shards without taking your application offline feels like magic, but it’s a core capability in many distributed databases. The real trick isn’t just moving bytes; it’s ensuring that at the exact moment the data switches from being served by the old shard to the new one, your application sees a consistent, up-to-date view, and that no writes get lost or duplicated.

Let’s see this in action with a simplified example. Imagine a sharded key-value store where each key is hashed and the hash value determines its shard. We want to move one range of hash values, 0x1000-0x2000, from shard-1 to shard-2.
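To make the routing idea concrete, here is a minimal sketch of hash-based range routing. Everything in it is an assumption made for illustration: the 16-bit hash space, the use of SHA-256, the half-open range boundaries, and the `shard-0` name are not taken from any particular database.

```python
import hashlib
from bisect import bisect_right


def key_hash(key: str) -> int:
    """Map a key to a 16-bit slot (0x0000-0xFFFF). The slot width is an assumption."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big")


class RoutingTable:
    """Maps half-open hash ranges [start, end) to shard names."""

    def __init__(self, ranges):
        # ranges: non-overlapping (start, end, shard) tuples
        self.ranges = sorted(ranges)
        self.starts = [start for start, _, _ in self.ranges]

    def shard_for_hash(self, h: int) -> str:
        # Find the last range whose start is <= h, then check its upper bound.
        i = bisect_right(self.starts, h) - 1
        if i < 0:
            raise KeyError(f"no range covers hash {h:#06x}")
        start, end, shard = self.ranges[i]
        if not (start <= h < end):
            raise KeyError(f"no range covers hash {h:#06x}")
        return shard

    def shard_for_key(self, key: str) -> str:
        return self.shard_for_hash(key_hash(key))


# Hypothetical layout before the migration: shard-1 owns 0x1000-0x2000.
table = RoutingTable([
    (0x0000, 0x1000, "shard-0"),
    (0x1000, 0x2000, "shard-1"),
    (0x2000, 0x10000, "shard-0"),
])
```

Under this model, the migration amounts to changing the owner of the middle entry from shard-1 to shard-2 without the application noticing.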

First, we’d provision shard-2 and ensure it’s ready to receive data.

# Create the new shard and configure it to accept writes
kubectl create -f new-shard-config.yaml
# ... wait for shard-2 to be healthy and registered ...

Then, we initiate the migration process. The database internally starts a background process that reads data from shard-1 and writes it to shard-2. Crucially, it also sets up a replication stream (for example, logical replication) from shard-1 to shard-2 to capture any changes that happen while the bulk copy is running.

# Initiate the data copy and replication
db-cli --migrate shard-1 --to shard-2 --keyspace 'my_app_keyspace' --range '0x1000-0x2000'

While this is happening, shard-1 continues serving reads and writes for the data being migrated. The new writes are mirrored to shard-2 via the replication stream.
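The copy-then-catch-up pattern can be sketched in a few lines. This is an in-memory toy, not how any specific database implements it: the `Shard` class, the list-based change log, and the prefix-based `in_range` check (a stand-in for the hash range) are all hypothetical, and deletes are not modeled.

```python
class Shard:
    """Toy shard: a dict of data plus an append-only change log."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.log = []  # (key, value) for every write, in arrival order

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


def start_migration(source, in_range):
    """Record the log position, then snapshot the in-range keys."""
    log_position = len(source.log)
    snapshot = {k: v for k, v in source.data.items() if in_range(k)}
    return snapshot, log_position


def apply_snapshot(target, snapshot):
    """Bulk-copy step: load the snapshot into the target shard."""
    target.data.update(snapshot)


def replay_changes(source, target, in_range, log_position):
    """Catch-up step: apply in-range writes logged after log_position."""
    for key, value in source.log[log_position:]:
        if in_range(key):
            target.data[key] = value
    return len(source.log)  # new position for the next catch-up round


shard1, shard2 = Shard("shard-1"), Shard("shard-2")
in_range = lambda key: key.startswith("key-")  # stand-in for the hash range

shard1.write("key-abc", "value-0")
shard1.write("other", "x")
snapshot, pos = start_migration(shard1, in_range)

# Writes that arrive while the bulk copy is still running:
shard1.write("key-abc", "value-1")
shard1.write("key-def", "value-2")

apply_snapshot(shard2, snapshot)
pos = replay_changes(shard1, shard2, in_range, pos)
```

The ordering matters: the snapshot is applied first and the log is replayed afterwards, so a newer value from the log is never overwritten by an older value from the snapshot.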

Once the initial data copy is complete, the system enters a "dual-write" or "synchronization" phase. All writes to the affected key range are now sent to both shard-1 and shard-2. Reads are typically still served by shard-1 while the system verifies that shard-2 has caught up.

# Application writes 'key-abc' = 'value-1'
# DB:
#   - Writes 'key-abc' = 'value-1' to shard-1
#   - Writes 'key-abc' = 'value-1' to shard-2 (via replication/dual-write)
#   - Verifies checksums/hashes match between shards for this key range
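A rough sketch of the dual-write and verification steps listed above, using plain dicts as shards. The `DualWriter` class and the prefix-based `in_range` check are hypothetical, and the sketch ignores what happens when one of the two writes fails, which a real system has to handle.

```python
import hashlib


def range_checksum(data, in_range):
    """Digest of all in-range key/value pairs, independent of insertion order."""
    h = hashlib.sha256()
    for key in sorted(k for k in data if in_range(k)):
        h.update(repr((key, data[key])).encode("utf-8"))
    return h.hexdigest()


class DualWriter:
    """Sends every write to the old shard and mirrors in-range writes to the new one."""

    def __init__(self, old, new, in_range):
        self.old = old
        self.new = new
        self.in_range = in_range

    def write(self, key, value):
        self.old[key] = value
        if self.in_range(key):
            self.new[key] = value

    def in_sync(self):
        """True when both shards hold identical data for the migrating range."""
        return (range_checksum(self.old, self.in_range)
                == range_checksum(self.new, self.in_range))


shard1 = {"key-abc": "value-0", "other": "x"}
shard2 = {"key-abc": "value-0"}  # state after the bulk copy
writer = DualWriter(shard1, shard2, lambda k: k.startswith("key-"))

writer.write("key-abc", "value-1")
synced = writer.in_sync()
```

Comparing a checksum over the whole range is the simplest possible check; it is cheap to write but expensive to run on large ranges, which is why the sketch should not be read as a recommendation for production use.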

All of this preparation exists to solve one problem: the "cutover." How do you switch the application’s pointer from shard-1 to shard-2 without dropping transactions? The system coordinates the switch through a global transaction log or a distributed consensus mechanism. When it determines that shard-2 has fully caught up and is consistent with shard-1 for the entire migration range, it atomically updates the routing information.

# DB:
#   - Confirms shard-2 is fully synchronized for keys 0x1000-0x2000
#   - Atomically updates routing table: Keys 0x1000-0x2000 now map to shard-2
#   - Stops dual-writes to shard-1 for this range
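The key property of the cutover is that the "is shard-2 caught up?" check and the routing update happen as one indivisible step. The single-process sketch below uses a lock to get that property; in a real cluster the lock would be replaced by whatever coordination mechanism the database uses. The `Router` class, its version counter, and the range label are assumptions made for illustration.

```python
import threading


class Router:
    """Toy routing table whose updates are atomic with respect to lookups."""

    def __init__(self, routes):
        self._lock = threading.Lock()
        self._version = 1
        self._routes = dict(routes)  # range label -> shard name

    def lookup(self, range_label):
        """Return (shard, routing version) for a range."""
        with self._lock:
            return self._routes[range_label], self._version

    def cutover(self, range_label, new_shard, is_synchronized):
        """Switch ownership only if the target is caught up.

        The sync check and the switch run under the same lock, so no
        lookup can observe a half-applied update.
        """
        with self._lock:
            if not is_synchronized():
                return False
            updated = dict(self._routes)
            updated[range_label] = new_shard
            self._routes = updated
            self._version += 1
            return True


router = Router({"0x1000-0x2000": "shard-1"})

# shard-2 still lagging: the cutover is refused and routing is unchanged.
refused = router.cutover("0x1000-0x2000", "shard-2", lambda: False)
before = router.lookup("0x1000-0x2000")

# shard-2 caught up: the range now maps to shard-2 under a new version.
switched = router.cutover("0x1000-0x2000", "shard-2", lambda: True)
after = router.lookup("0x1000-0x2000")
```

The version number gives callers a cheap way to tell whether their cached view of the routing table is out of date.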

After the atomic switch, shard-1 no longer serves requests for that key range, and shard-2 becomes the sole source of truth. The application, unaware of the internal switch, continues to make requests, which are now transparently routed to the new shard. The old shard then starts a cleanup process, removing the migrated data.

The mental model here is one of layered consistency. First, you copy the bulk data. Then, you capture the delta. While the delta is being applied, you might enter a dual-write phase to ensure no writes are missed. Finally, a carefully orchestrated, atomic switch of the routing layer ensures that the application sees a seamless transition, with no lost data and minimal (ideally zero) downtime. The "magic" is in the database’s internal coordination to ensure that at the point of cutover, both the data and the application’s view of it are consistent.

What most people don’t realize is that the "atomic switch" isn’t just a DNS change or a configuration update. It often involves the database’s internal metadata, which is itself managed by a consensus protocol (like Raft or Paxos). This ensures that the routing table update is propagated consistently across all nodes in the cluster, preventing split-brain scenarios where different nodes might think the data lives on different shards.

The next challenge is handling hot spots that emerge after a migration, where a single new shard becomes overloaded.
