Vitess isn’t just a sharding layer for MySQL; it’s a distributed SQL database system that uses MySQL as its storage engine, fundamentally changing how you think about database availability and scalability.

Let’s see Vitess in action. Imagine we have a simple users table and we want to scale it horizontally.

CREATE TABLE users (
    user_id BIGINT AUTO_INCREMENT,
    username VARCHAR(255) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    PRIMARY KEY (user_id)
);

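In a traditional MySQL setup, all writes for this table go to a single primary server. With Vitess, we can shard this table. Let’s say we shard it by user_id. Vitess maps each user_id to a keyspace ID through a vindex (commonly a hash), and each shard owns a range of keyspace IDs. Every shard is backed by its own MySQL instances. Two caveats apply once the table is sharded: AUTO_INCREMENT is typically replaced by a Vitess sequence so that IDs stay unique across shards, and MySQL enforces the UNIQUE constraint on email only within each shard.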
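To make the idea of assigning rows to shards concrete, here is a minimal Python sketch. It assumes a hypothetical four-shard layout in which each shard owns a contiguous range of a 64-bit key space, and it uses SHA-256 as a stand-in hash. Neither the layout nor the hash function is taken from Vitess itself; the names `SHARDS`, `keyspace_id`, and `shard_for` are invented for illustration.

```python
import hashlib

# Hypothetical layout: each shard owns the half-open range [start, end)
# of a 64-bit key space. The names imitate range-style shard labels.
SHARDS = [
    ("-40", 0x0000000000000000, 0x4000000000000000),
    ("40-80", 0x4000000000000000, 0x8000000000000000),
    ("80-c0", 0x8000000000000000, 0xC000000000000000),
    ("c0-", 0xC000000000000000, 0x10000000000000000),
]


def keyspace_id(user_id: int) -> int:
    """Map a user_id to a 64-bit key.

    SHA-256 is a stand-in; it is NOT the hash that Vitess uses.
    """
    raw = user_id.to_bytes(8, "big", signed=True)
    digest = hashlib.sha256(raw).digest()
    return int.from_bytes(digest[:8], "big")


def shard_for(user_id: int) -> str:
    """Return the name of the shard whose range contains the key."""
    key = keyspace_id(user_id)
    for name, start, end in SHARDS:
        if start <= key < end:
            return name
    raise ValueError(f"no shard covers key {key:#x}")
```

Because the ranges cover the whole key space without overlapping, every user_id lands on exactly one shard, and hashing first spreads consecutive IDs across shards instead of filling one shard at a time.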

Here’s a simplified look at a Vitess topology:

+-------------------+       +-------------------+       +-------------------+
|      vtgate       |       |      vtgate       |       |      vtgate       |
+-------------------+       +-------------------+       +-------------------+
        |                           |                           |
        | (Query Routing)           | (Query Routing)           | (Query Routing)
        v                           v                           v
+-------------------+       +-------------------+       +-------------------+
|      vttablet     |       |      vttablet     |       |      vttablet     |
+-------------------+       +-------------------+       +-------------------+
        |                           |                           |
        | (MySQL Instances)         | (MySQL Instances)         | (MySQL Instances)
        v                           v                           v
+-------------------+       +-------------------+       +-------------------+
|    MySQL (Shard 0)|       |    MySQL (Shard 1)|       |    MySQL (Shard N)|
+-------------------+       +-------------------+       +-------------------+

+-------------------+       +-------------------+       +-------------------+
|     vtctld        | ----> |   Topology Server | <---- |      vtworker     |
+-------------------+       +-------------------+       +-------------------+
  (Control Plane)             (e.g., etcd, ZooKeeper)     (Resharding jobs)

Any vtgate can route to any vttablet; the columns are drawn one-to-one only to keep the picture simple. vtgate and vttablet also read cluster metadata from the Topology Server.

vtgate: This is the entry point for your application. It receives SQL queries, figures out which shards the data resides on, and routes the query to the appropriate vttablet instances. It also aggregates results from multiple shards if needed.

vttablet: This is the stateful component that fronts a MySQL instance. Each vttablet runs alongside a single MySQL instance and acts as a proxy to it; a shard typically consists of several such vttablet/MySQL pairs (one primary plus replicas for high availability). It handles connection pooling, query execution, and transaction management for its instance.

vtctld: This is the control plane. It manages the overall Vitess cluster and orchestrates actions such as schema changes, resharding, and backups. It communicates with vttablets and the Topology Server.

Topology Server: This is a distributed coordination service (like etcd or ZooKeeper) that stores the Vitess cluster’s metadata. It holds information about shards, vttablets, their addresses, and the overall cluster configuration.

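vtworker: In older Vitess releases, this component ran long-lived background jobs such as the data copy during resharding and data integrity checks. It has since been deprecated and removed; recent releases handle resharding through VReplication workflows instead.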

The problems Vitess solves are the inherent limitations of a single relational database instance:

  1. Scalability: A single MySQL instance has finite CPU, memory, and I/O capacity.
  2. Availability: A single instance is a single point of failure.

Vitess addresses these by:

  • Sharding: Horizontally partitioning your data across multiple independent database instances. This distributes the read and write load.
  • Replication: Each shard can have its own primary-replica setup, providing high availability within that shard.
  • Cross-Shard Transactions: By default, Vitess commits a transaction that spans shards on a best-effort basis, one shard at a time. For atomic commits across shards, it offers an opt-in two-phase commit mode.
  • Online Schema Changes: It allows you to make schema changes without taking your database offline, a critical feature for large-scale applications.
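The two-phase commit protocol mentioned in the list above can be sketched generically. The following is a toy, in-memory illustration of the protocol's shape, not Vitess's implementation: `Participant` and `two_phase_commit` are hypothetical names, and a real coordinator would also have to persist its decision and recover from crashes.

```python
from typing import List


class Participant:
    """Toy stand-in for one shard taking part in a transaction."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self) -> bool:
        # Phase 1: vote on whether a later commit is guaranteed to succeed.
        if self.can_commit:
            self.state = "prepared"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Commit on every participant, or on none of them."""
    # Phase 1: collect votes; stop at the first "no".
    for p in participants:
        if not p.prepare():
            # Any "no" vote aborts the transaction everywhere.
            for q in participants:
                q.rollback()
            return False
    # Phase 2: everyone voted yes, so the commit is applied everywhere.
    for p in participants:
        p.commit()
    return True
```

The point of the first phase is that no participant makes its changes visible until every participant has promised that it can, which is what makes the outcome all-or-nothing.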

When an application sends a query like SELECT * FROM users WHERE user_id = 12345;, vtgate uses the keyspace’s sharding metadata (the VSchema, which it loads from the Topology Server and caches) to determine which shard user_id = 12345 belongs to. It does not contact the Topology Server on every query. It then forwards the query to a vttablet serving that shard. If the query involves multiple shards (e.g., SELECT COUNT(*) FROM users;), vtgate fans out the query to all relevant vttablets and aggregates the results.
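The two routing paths described above, single-shard lookup and scatter-gather, can be sketched in a few lines of Python. This is a simplified model under stated assumptions: each shard is an in-memory dict standing in for a vttablet and its MySQL instance, routing uses a plain modulo rather than Vitess's real sharding logic, and every name (`shards`, `shard_index`, `insert_user`, `select_by_id`, `count_users`) is hypothetical.

```python
from typing import Dict, List, Optional

NUM_SHARDS = 4

# Each dict stands in for one shard (a vttablet plus its MySQL instance).
shards: List[Dict[int, dict]] = [dict() for _ in range(NUM_SHARDS)]


def shard_index(user_id: int) -> int:
    """Stand-in routing rule (modulo); not how Vitess picks a shard."""
    return user_id % NUM_SHARDS


def insert_user(user_id: int, username: str) -> None:
    """Write the row to the single shard that owns this user_id."""
    shards[shard_index(user_id)][user_id] = {
        "user_id": user_id,
        "username": username,
    }


def select_by_id(user_id: int) -> Optional[dict]:
    """Single-shard route: only the owning shard is queried."""
    return shards[shard_index(user_id)].get(user_id)


def count_users() -> int:
    """Scatter-gather: ask every shard for a local count, then sum."""
    partial_counts = [len(shard) for shard in shards]
    return sum(partial_counts)
```

A lookup by the sharding key touches one shard regardless of cluster size, whereas the count touches all of them, which is why queries that filter on the sharding key tend to scale better.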

The most surprising thing is how Vitess manages connections. Instead of connecting directly to MySQL, your application connects to vtgate. vtgate talks to the relevant vttablets over gRPC, and each vttablet maintains a pool of connections to its underlying MySQL instance. This pooling and multiplexing lets a large number of application connections share a much smaller number of MySQL connections, which is crucial for performance and resource usage in a distributed environment. It also means your application doesn’t need to know about shards or vttablets; it just talks to vtgate as if it were a single, massive database.
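Connection pooling itself is a general technique, and a small sketch shows why it bounds resource usage. The code below is a generic, hypothetical pool built on Python's standard `queue` module; `FakeConnection` and `ConnectionPool` are invented names, and nothing here reflects Vitess's actual pool implementation.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Stand-in for a real database connection."""

    def __init__(self, conn_id: int) -> None:
        self.conn_id = conn_id

    def execute(self, sql: str) -> str:
        return f"conn {self.conn_id} ran: {sql}"


class ConnectionPool:
    """Fixed-size pool: many callers share a few connections."""

    def __init__(self, size: int, timeout: float = 1.0) -> None:
        self._idle: queue.Queue = queue.Queue(maxsize=size)
        self._timeout = timeout
        for i in range(size):
            self._idle.put(FakeConnection(i))

    def idle_count(self) -> int:
        return self._idle.qsize()

    @contextmanager
    def connection(self):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=self._timeout)
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the caller failed.
            self._idle.put(conn)
```

A caller would write `with pool.connection() as conn: conn.execute("SELECT 1")`. However many requests arrive, the backing database never sees more than `size` connections, and excess requests wait or time out instead of opening new ones.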

The next step in understanding Vitess is exploring how it handles resharding and the implications of different sharding strategies.
