Sharding an index strategy can feel like you’re building a distributed database where every shard is its own world, but they all need to agree on how to find things.
Let’s see this in action. Imagine you have a users collection sharded by user_id. Each shard has its own users collection.
// On shard 0 (e.g., server A)
db.users.insertOne({ _id: 1001, name: "Alice", email: "alice@example.com" })
db.users.insertOne({ _id: 1002, name: "Bob", email: "bob@example.com" })
// On shard 1 (e.g., server B)
db.users.insertOne({ _id: 2001, name: "Charlie", email: "charlie@example.com" })
db.users.insertOne({ _id: 2002, name: "David", email: "david@example.com" })
Now, you want to find a user by their email. If you just create an index on email on each shard independently, you’ve created local indexes.
// On shard 0
db.users.createIndex({ email: 1 })
// On shard 1
db.users.createIndex({ email: 1 })
When you query db.users.find({ email: "alice@example.com" }), the query router (mongos) knows user_id is the shard key. It can’t route this query to a specific shard because email isn’t the shard key. So, it has to broadcast the query to all shards. Each shard then uses its local index to find "alice@example.com". This is inefficient because every shard does work it doesn’t need to.
This is where global indexes come in. A global index is an index on a field that is not the shard key, but you want to query it efficiently across all shards. To make an index global, you create it on the mongos process, not directly on a shard.
// On mongos
db.users.createIndex({ email: 1 })
When you create an index on mongos, it tells the sharding system to build that index on all shards. The sharding system coordinates this, ensuring the index is created consistently everywhere. Now, when you query db.users.find({ email: "alice@example.com" }), mongos can use the global index metadata to determine which shard(s) might contain "alice@example.com". In this case, since there’s no compound index involving user_id and email, it still might have to query multiple shards, but the index lookup itself is now more intelligent.
The real power of global indexes shines when you have a compound index that includes the shard key. If you have db.users.createIndex({ user_id: 1, email: 1 }) on mongos, this index is built on all shards as { user_id: 1, email: 1 }. Now, if you query db.users.find({ user_id: 1001, email: "alice@example.com" }), mongos can use the user_id part of the index to route the query directly to the shard containing user_id: 1001. The email part of the index is then used on that single shard for a super-fast lookup. This is the ideal scenario.
The problem this solves is efficient querying on non-shard key fields in a sharded environment. Without global indexes, queries on non-shard key fields require scatter-gather operations, where the query is sent to every shard, and each shard scans its local data or local index. This is slow and resource-intensive. Global indexes allow the query router to target specific shards or at least leverage index information across shards more effectively.
Internally, when you create a global index via mongos, mongos communicates with the config servers. The config servers then instruct each shard server to create the specified index on its local copy of the collection. This process is managed by the sh.status() command, which shows the shard key and any indexes that are present on all shards. If an index is missing on any shard, sh.status() will indicate this, and you’ll see it listed as a "multi-key index" or a "local index" if it’s not present on all shards.
The key difference is where the index is created and how the query planner uses it. Local indexes are created directly on a shard and are only known to that shard. Global indexes are created via mongos and are known to the sharding system, allowing for more intelligent query routing or at least more informed scatter-gather operations.
A common pitfall is creating an index on a non-shard key field directly on a shard. This results in a local index. For example, if you connect directly to shard A and run db.users.createIndex({ email: 1 }), this index will only exist on shard A. Queries for email will still be broadcast to all shards, and only shard A will efficiently use its local index, while others will perform a collection scan. You must create global indexes via mongos to ensure they are distributed and recognized by the query router.
The next thing you’ll run into is understanding how to optimize queries when you have multiple global indexes or when your query involves fields that aren’t indexed at all.