CockroachDB’s automatic range sharding is less about distributing data and more about distributing responsibility.
Imagine your database is a massive library. Without sharding, one librarian (a single node) would be responsible for shelving, retrieving, and managing every single book (data row). This quickly becomes impossible. Range sharding, in CockroachDB, is how the library system automatically breaks down the shelves into manageable sections and assigns different librarians (nodes) to oversee specific sections.
Here’s a simplified view of how it works in practice. Let’s say you have a table users with an integer primary key id.
CREATE TABLE users (
id INT PRIMARY KEY,
name VARCHAR,
email VARCHAR
);
When you insert your first few rows:
INSERT INTO users (id, name, email) VALUES
(1, 'Alice', 'alice@example.com'),
(1000, 'Bob', 'bob@example.com'),
(2000, 'Charlie', 'charlie@example.com'),
(3000, 'David', 'david@example.com');
CockroachDB doesn’t immediately split these into different locations. Initially, it might create a single "range" covering all possible integer IDs from negative infinity to positive infinity. This range is assigned to a leaseholder, which is a specific node in your cluster responsible for managing that range’s data and accepting writes.
Now, let’s add more data, and importantly, data that starts to spread out:
INSERT INTO users (id, name, email) VALUES
(50000, 'Eve', 'eve@example.com'),
(100000, 'Frank', 'frank@example.com'),
(150000, 'Grace', 'grace@example.com');
As the data grows and becomes more spread out, CockroachDB’s internal algorithms detect that the single, massive range is becoming too large or too hot (receiving too many requests). At this point, it initiates a "split."
A split isn’t a random event. CockroachDB looks for a natural dividing point within the key space of the range. For an integer primary key, this is often near the middle of the current data. Let’s say the system decides to split the range that contains IDs 1 through 150000. It might choose a split point around 75000.
The original range, let’s call it R1 (keys MIN to MAX), is now split into two new ranges:
R2: keysMINto75000R3: keys75000toMAX
CockroachDB then assigns new leaseholders to these ranges. One node might become the leaseholder for R2, and another node for R3. This distribution is what allows different nodes to handle different sets of keys concurrently. When a query comes in for id = 25000, the query router knows (or can quickly discover) that R2 is responsible for this key and directs the request to the node holding the lease for R2.
The magic is that this happens automatically. You don’t define shard boundaries. CockroachDB monitors range sizes (in terms of bytes and number of key-value pairs) and the load on range leaseholders. When a range exceeds configured thresholds (e.g., 512 MB or 100,000 key-value pairs), or if a leaseholder is overloaded, CockroachDB will trigger a split. It also performs merges if adjacent ranges become too small, consolidating them back under a single leaseholder to avoid excessive overhead.
The crucial levers you can pull are often indirect. The primary key choice is paramount. A monotonically increasing primary key (like a simple SERIAL or INT) can lead to hot spots on the last range, as all new writes go there. This is because new data is always appended, and the system will keep splitting the "end" of the key space.
Consider a range that spans 1 to 1000000. If your primary key is a UUID, the keys are inherently distributed across the entire possible UUID space, leading to more even distribution from the start. If it’s a sequential integer, all new inserts will fall into the highest-valued range, potentially causing that range’s leaseholder to become a bottleneck.
Here’s how you might inspect your ranges using the cockroach sql client:
SHOW RANGES FROM TABLE users;
This command will output information about each range, including its start key, end key, the ID of the node currently holding the lease for that range, and its approximate size.
range_id | start_key | end_key | lease_location | ...
-----------+-----------+---------+----------------+----
540982345 | (NULL) | 1 | {node 1,1} | ...
540982346 | 1 | 75000 | {node 2,2} | ...
540982347 | 75000 | 150000 | {node 3,3} | ...
540982348 | 150000 | (NULL) | {node 1,1} | ...
Notice how the end_key of one range becomes the start_key of the next. The (NULL) represents the absolute minimum and maximum possible values for the key type. The lease_location shows which node is currently the "leader" for that range, meaning it handles all reads and writes for that specific key span.
One aspect often overlooked is how CockroachDB handles the split transaction itself. When a range split is initiated, the leaseholder for the original range coordinates the split. It first creates the two new ranges, then it has to tell the rest of the cluster about these new ranges and update its own metadata. If the leaseholder crashes during this process, or if there’s a network partition, the cluster can end up with inconsistent range metadata. This is why you might see errors like "range does not exist" or "lease not found" even if the data technically still exists on some nodes. The cluster needs to re-reconcile its view of the ranges, which can involve electing new leaseholders and ensuring all nodes agree on the current range set.
The next challenge you’ll encounter is understanding how these ranges, and their leaseholders, move around the cluster during rebalancing or node failures.