Sharding your database by customer is the most direct way to achieve true tenant isolation, but it’s also the most complex.

Let’s see it in action. Imagine a system that needs to store data for multiple clients, each with their own set of users, orders, and products. Without sharding, all this data lives in the same database tables.

-- Table: users
CREATE TABLE users (
    user_id BIGSERIAL PRIMARY KEY,
    tenant_id INT, -- This is what we'll shard on
    username VARCHAR(50),
    email VARCHAR(100)
);

-- Table: orders
CREATE TABLE orders (
    order_id BIGSERIAL PRIMARY KEY,
    tenant_id INT, -- This is what we'll shard on
    user_id BIGINT,
    order_date TIMESTAMP,
    total_amount DECIMAL(10, 2)
);

When a request comes in for tenant_id = 123, the application needs to know which database shard holds that tenant’s data. This is where the sharding logic comes in. Typically, you’ll have a routing layer or a lookup service that maps tenant_id to a specific shard.

Here’s a simplified view of how the application might decide where to send a query:

# In your application code
def get_shard_connection(tenant_id):
    # This logic would be more sophisticated in production,
    # likely involving a configuration service or a database lookup.
    if tenant_id % 4 == 0:
        return connect_to_shard("shard_0")
    elif tenant_id % 4 == 1:
        return connect_to_shard("shard_1")
    elif tenant_id % 4 == 2:
        return connect_to_shard("shard_2")
    else:
        return connect_to_shard("shard_3")

def get_user_data(tenant_id, user_id):
    conn = get_shard_connection(tenant_id)
    cursor = conn.cursor()
    cursor.execute(f"SELECT * FROM users WHERE user_id = {user_id} AND tenant_id = {tenant_id}")
    return cursor.fetchone()

In this setup, tenant_id is the sharding key. All data belonging to a specific tenant_id (or a range of tenant_ids) is stored on a single shard. This means tenant_id = 123 might live entirely on shard_2, while tenant_id = 456 lives on shard_0.

The primary problem this solves is noisy neighbor issues. If one tenant experiences a massive surge in traffic or data volume, it only impacts the resources allocated to their specific shard. Other tenants remain unaffected. It also simplifies security and compliance; you can isolate sensitive data to specific shards, potentially even geographically.

Internally, the sharding mechanism involves several components:

  1. Sharding Key: The attribute used to distribute data (in our example, tenant_id).
  2. Sharding Strategy: How the key is mapped to shards (e.g., range-based, hash-based, directory-based). We used a simple modulo (% 4) which is a form of hash-based sharding.
  3. Routing Layer: The intelligence that directs queries to the correct shard. This can be in the application, a dedicated proxy (like Citus, ProxySQL), or built into the database itself.
  4. Shard Management: The operational aspect of provisioning, monitoring, and balancing shards.

When you sharding by tenant_id, you’re essentially creating independent databases for logical groups of customers. This means queries that involve joins across tables for a single tenant are highly efficient because all that data resides on the same physical shard. Cross-shard operations, however, become significantly more complex and expensive, often requiring application-level aggregation or specialized tools.

The most surprising aspect of multi-tenant sharding by customer is how it completely flips the problem of data distribution. Instead of distributing data across tables in a single database, you’re distributing tables (and all their data) across multiple independent databases, each dedicated to a subset of your tenants. This makes the concept of a "database" itself fluid; your entire database cluster becomes a collection of tenant-specific data stores.

Moving a tenant to a new shard, whether for rebalancing or scaling, is a non-trivial operation. It typically involves:

  1. Provisioning a new, empty shard.
  2. Copying all the data for the target tenant from their current shard to the new shard.
  3. Updating the routing layer to point the tenant’s tenant_id to the new shard.
  4. Performing a final sync of any writes that occurred during the copy.
  5. Decommissioning the old shard (or at least stopping writes to it).

This process is often managed by specialized tooling or requires careful orchestration to minimize downtime and data inconsistency.

The next hurdle you’ll face is managing cross-tenant analytics queries.

Want structured learning?

Take the full Sharding course →