ProxySQL doesn’t just blindly forward queries; it’s a smart traffic cop for your database. Its query rules decide which backend server gets which query, and rule-based sharding is built directly on them.

Let’s see it in action. Imagine you have a users table that you want to shard by user_id. ProxySQL can be configured to send every query for a user_id between 1 and 1000 to the first shard (db-shard-1.example.com) and every query for a user_id between 1001 and 2000 to the second (db-shard-2.example.com).

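Here’s a snippet of what that looks like in ProxySQL’s configuration, entered with the mysql client connected to ProxySQL’s admin interface (port 6032 by default). The rules use match_pattern, which is tested against the raw query text, because match_digest only sees the normalized digest, in which literals such as 42 are replaced with ?. Regular expressions can’t compare numbers, so each user_id range is spelled out as a pattern, and the INSERT rules assume user_id is the first value in the VALUES list: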

INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply) VALUES
-- user_id 1-1000 -> hostgroup 1; user_id 1001-2000 -> hostgroup 2. \b stops "= 1" from matching "= 1001".
(1, 1, '^SELECT.*FROM users WHERE user_id *= *([1-9][0-9]{0,2}|1000)\b', 1, 1),
(2, 1, '^SELECT.*FROM users WHERE user_id *= *(100[1-9]|10[1-9][0-9]|1[1-9][0-9]{2}|2000)\b', 2, 1),
(3, 1, '^INSERT.*INTO users.*VALUES *\( *([1-9][0-9]{0,2}|1000) *,', 1, 1),
(4, 1, '^INSERT.*INTO users.*VALUES *\( *(100[1-9]|10[1-9][0-9]|1[1-9][0-9]{2}|2000) *,', 2, 1),
(5, 1, '^UPDATE users SET .* WHERE user_id *= *([1-9][0-9]{0,2}|1000)\b', 1, 1),
(6, 1, '^UPDATE users SET .* WHERE user_id *= *(100[1-9]|10[1-9][0-9]|1[1-9][0-9]{2}|2000)\b', 2, 1);

And here’s how you’d define the hostgroups that these rules point to. ProxySQL has no separate hostgroup table: a hostgroup is simply the set of rows in mysql_servers that share a hostgroup_id, and the comment column is a convenient place for a human-readable shard name.

INSERT INTO mysql_servers (hostgroup_id, hostname, port, comment) VALUES
(1, 'db-shard-1.example.com', 3306, 'users_shard_1'),
(2, 'db-shard-2.example.com', 3306, 'users_shard_2');

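After loading the configuration (LOAD MYSQL SERVERS TO RUNTIME; LOAD MYSQL QUERY RULES TO RUNTIME;), ProxySQL intercepts incoming SQL statements and walks through the active rules in ascending rule_id order. For each rule, it tests match_pattern against the raw query text (or match_digest against the normalized digest). When a rule with apply = 1 matches, as all of these do, ProxySQL sends the query to that rule’s destination_hostgroup, picking one of the servers in it; with a single server per hostgroup, as here, that is the shard’s database. A query that matches no rule goes to the default_hostgroup of the user it arrived on, as set in mysql_users. If each shard also has replicas, you can pair its writer and reader hostgroups in mysql_replication_hostgroups; ProxySQL then keeps the server with read_only = 0, the shard’s primary, in the writer hostgroup, which is the one your rules should target for writes.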
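The LOAD commands only change the running configuration. A minimal sketch of the full sequence in the admin interface also persists the changes and then checks where queries actually went, using the runtime_mysql_servers table and the stats_mysql_query_rules and stats_mysql_query_digest stats tables:

```sql
-- Activate the new configuration.
LOAD MYSQL SERVERS TO RUNTIME;
LOAD MYSQL QUERY RULES TO RUNTIME;

-- Persist it so it survives a ProxySQL restart.
SAVE MYSQL SERVERS TO DISK;
SAVE MYSQL QUERY RULES TO DISK;

-- Confirm which servers are live in each hostgroup.
SELECT hostgroup_id, hostname, port, status FROM runtime_mysql_servers;

-- How many times each rule has matched.
SELECT rule_id, hits FROM stats_mysql_query_rules ORDER BY rule_id;

-- Which hostgroup each normalized query was actually sent to.
SELECT hostgroup, digest_text, count_star
FROM stats_mysql_query_digest
ORDER BY count_star DESC;
```

If a rule’s hits stay at zero while traffic is flowing, its pattern is usually the culprit; comparing it with the digest_text rows is a quick way to see what the query really looks like.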

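The real power here is the flexibility. A rule can match on statement type (SELECT, INSERT, UPDATE, DELETE), table and column names, or literal values in the query text via match_pattern, and it can also filter on connection properties such as username, schemaname, or client_addr. This gives you granular control over data distribution without modifying your application code, as long as the shard key appears as a literal in the SQL text rather than as a bound parameter of a server-side prepared statement. For example, you could route all queries for a specific tenant_id to a dedicated set of servers.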
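As a sketch of the tenant_id idea, assume (hypothetically) that tenant 42 gets its own server in hostgroup 10 and that reporting traffic arrives under a user named reporting; the hostname, the user, and rule IDs 10 and 11 are all illustrative. The first rule matches a literal value, so it uses match_pattern against the raw query text (match_digest would only see tenant_id = ?). The second matches on the connection’s username alone and needs no regular expression.

```sql
-- Hypothetical dedicated server for tenant 42.
INSERT INTO mysql_servers (hostgroup_id, hostname, port, comment) VALUES
(10, 'db-tenant-42.example.com', 3306, 'tenant_42_dedicated');

INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply) VALUES
-- \b on both sides keeps this from matching tenant_id = 420 or sub_tenant_id = 42.
(10, 1, '\btenant_id *= *42\b', 10, 1);

-- Hypothetical 'reporting' user: whatever it sends goes to hostgroup 3.
INSERT INTO mysql_query_rules (rule_id, active, username, destination_hostgroup, apply) VALUES
(11, 1, 'reporting', 3, 1);

LOAD MYSQL SERVERS TO RUNTIME;
LOAD MYSQL QUERY RULES TO RUNTIME;
```

Because rule 11 sits after rules 1-6, a reporting query that carries a sharded user_id is still routed to its shard; give the username rule a lower rule_id if reporting traffic should always bypass the shards.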

ProxySQL evaluates rules in ascending rule_id order, so lower rule_id values are checked first. If a rule matches and its apply flag is set to 1, ProxySQL executes the actions defined by the rule and stops processing further rules for that query. In practice, that means specific rules need low rule_id values and broad, catch-all rules need high ones; in the opposite order, the broad rule claims the traffic first and the specific rules never fire. Leaving gaps between rule_id values makes it easier to slot new rules in later.
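Because evaluation order is determined entirely by rule_id, it helps to look at the rules as ProxySQL actually runs them. A minimal check against runtime_mysql_query_rules (the loaded copy of mysql_query_rules), followed by a hypothetical fix for a catch-all that was created as rule 0:

```sql
-- Rules in the order ProxySQL evaluates them.
SELECT rule_id, active, match_digest, match_pattern, destination_hostgroup, apply
FROM runtime_mysql_query_rules
ORDER BY rule_id;

-- Hypothetical: move a catch-all created as rule 0 behind the specific rules,
-- then reload so the runtime copy picks up the change.
UPDATE mysql_query_rules SET rule_id = 1000 WHERE rule_id = 0;
LOAD MYSQL QUERY RULES TO RUNTIME;
```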

The match_digest and match_pattern fields hold regular expressions (PCRE by default, and case-insensitive unless you change re_modifiers). This is where the magic happens: you can construct complex patterns to identify specific queries. For instance, to route the SELECTs on users that none of the shard rules claimed, such as queries that don’t specify a user_id (perhaps administrative tasks or queries that scan across shards), to a default hostgroup 3, you might use a rule like this:

INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply) VALUES
(100, 1, '^SELECT.*FROM users\b', 3, 1); -- Fallback: only sees SELECTs on users that rules 1-6 did not route

This rule needs a high rule_id (like 100) so that it’s evaluated after the user_id rules. A query that one of those rules matched has already been routed and never reaches it, so the fallback only sees what’s left. With a low rule_id such as 0, the same pattern would be checked first and would swallow every SELECT on users, including the ones that carry a user_id. Because this pattern contains no literal values, match_digest works fine here.
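The fallback only helps if hostgroup 3 actually contains a server. A sketch, assuming a hypothetical server db-default.example.com and a hypothetical application user app_user; the UPDATE also makes hostgroup 3 the destination for queries that match no rule at all, since those go to the user’s default_hostgroup.

```sql
-- Hypothetical server backing hostgroup 3 (admin and cross-shard queries).
INSERT INTO mysql_servers (hostgroup_id, hostname, port, comment) VALUES
(3, 'db-default.example.com', 3306, 'users_default');

-- Queries from app_user that match no rule at all also land in hostgroup 3.
UPDATE mysql_users SET default_hostgroup = 3 WHERE username = 'app_user';

LOAD MYSQL SERVERS TO RUNTIME;
LOAD MYSQL USERS TO RUNTIME;
```

Whether hostgroup 3 is the right default for unmatched traffic depends on what else the application sends; an unmatched INSERT for a user_id outside the sharded ranges would land there too.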

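The apply column is crucial. If apply is 1, ProxySQL applies the rule and stops. If apply is 0, the rule’s actions still take effect (for example a destination_hostgroup, a cache_ttl, or log = 1 to write matching queries to the events log), but ProxySQL keeps evaluating subsequent rules, and a later matching rule can override what an earlier one set.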
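A minimal sketch of apply = 0 in practice, using hypothetical rule IDs 90-92 to cover DELETEs on users, which the shard rules above don’t handle. Rule 90 sets a default destination and turns on logging but lets evaluation continue; rules 91 and 92 override the destination for sharded user_id values and stop. Logging through the log column also assumes the events log is enabled via the mysql-eventslog_filename variable.

```sql
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, log, apply) VALUES
-- Any DELETE on users: default to hostgroup 3 and log it, but keep evaluating.
(90, 1, '^DELETE FROM users\b', 3, 1, 0),
-- DELETEs with a sharded user_id: override the destination and stop.
-- log is NULL here, so the log = 1 set by rule 90 is left in place.
(91, 1, '^DELETE FROM users WHERE user_id *= *([1-9][0-9]{0,2}|1000)\b', 1, NULL, 1),
(92, 1, '^DELETE FROM users WHERE user_id *= *(100[1-9]|10[1-9][0-9]|1[1-9][0-9]{2}|2000)\b', 2, NULL, 1);

LOAD MYSQL QUERY RULES TO RUNTIME;
```

A DELETE for user 5 therefore ends up logged and sent to hostgroup 1, while a DELETE without a sharded user_id keeps both settings from rule 90.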

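What most people don’t realize is that query rules can also block specific queries. Hostgroup 0 is not special (it’s an ordinary hostgroup ID like any other), so pointing a rule at it won’t do this. Instead, set the rule’s error_msg column: when the rule matches, ProxySQL returns that message to the client as an error without sending the query to any backend, effectively null-routing problematic or unauthorized queries.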
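A minimal sketch of such a blocking rule, assuming you want to refuse TRUNCATE and DROP TABLE on users from every client. It sets error_msg and no destination_hostgroup, because the query never reaches a backend. Rule ID 7 is an arbitrary choice; any ID works as long as no earlier rule with apply = 1 matches these statements first.

```sql
-- Refuse destructive DDL on users; ProxySQL answers the client with this error itself.
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, error_msg, apply) VALUES
(7, 1, '^(TRUNCATE( TABLE)?|DROP TABLE( IF EXISTS)?) users\b', 'DDL on users is blocked by ProxySQL', 1);

LOAD MYSQL QUERY RULES TO RUNTIME;
```

The same column works for any pattern you can write, so a rule can just as easily refuse unbounded scans or statements from a specific username.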

The next step after mastering rule-based routing is understanding how to combine it with Query Cache to significantly reduce database load.
