S3 Performance: Prefix Sharding for High Request Rates (2026)

Prefix sharding is the most effective way to scale S3 performance beyond the limits of a single prefix, but it’s not about spreading data evenly; it’s about spreading request rate evenly.

Let’s see it in action. Imagine you have an application that needs to read 10,000 objects per second from S3. If all these objects live under a single prefix, say s3://my-bucket/my-data/, you’re going to hit S3’s request rate limits for that prefix. S3 can handle a lot, but there’s a practical ceiling.

Here’s what a high-throughput read operation might look like in Python:

import boto3
import threading
from queue import Queue

s3 = boto3.client('s3')
bucket_name = 'my-bucket'
base_key_prefix = 'my-data/'
num_threads = 10
objects_per_thread = 1000 # Each thread will try to read 1000 objects

def read_object(obj_key):
    try:
        response = s3.get_object(Bucket=bucket_name, Key=obj_key)
        # Process the object content if needed
        # print(f"Successfully read {obj_key}")
    except Exception as e:
        print(f"Error reading {obj_key}: {e}")

def worker(thread_id):
    for i in range(objects_per_thread):
        # This is where sharding comes in
        # If we just did f'{base_key_prefix}{i}', we'd hit limits.
        # Instead, we shard by prefix.
        shard_id = i % num_threads # Distribute across shards
        obj_key = f'{base_key_prefix}shard_{shard_id}/{i}.dat'
        read_object(obj_key)

threads = []
for i in range(num_threads):
    thread = threading.Thread(target=worker, args=(i,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

print("All threads finished.")

In this example, shard_id = i % num_threads is the core of prefix sharding. Instead of all requests going to s3://my-bucket/my-data/, they are distributed across s3://my-bucket/my-data/shard_0/, s3://my-bucket/my-data/shard_1/, and so on. Each of these sub-prefixes can then scale independently.

The problem this solves is the implicit scaling limit S3 imposes on individual prefixes. While S3 itself is massively scalable, the performance per prefix has practical limits. When your application’s request rate for objects within a specific logical group (like a single prefix) exceeds these limits, you start seeing throttling errors, increased latency, and unpredictable performance. Prefix sharding effectively creates multiple independent scaling points within your bucket, allowing you to aggregate higher throughput.

Internally, S3 partitions its storage and request handling capacity based on prefixes. When you use a single, long, and consistently structured prefix, you are essentially concentrating all your requests onto a single partition. By introducing a sharding component into your prefix (like shard_X in the example), you instruct S3 to distribute those requests across different internal partitions, each capable of handling its own request rate. The key is that the sharded part of the prefix should be the most significant differentiator for the requests you’re making. For high-volume reads or writes, this usually means putting it early in the key path.

The exact levers you control are the structure of your S3 object keys. You don’t configure anything in S3 to enable this; it’s purely a client-side strategy applied to how you name your objects. The most common and effective sharding patterns involve adding a sequential or pseudo-random component near the beginning of the key. Examples include:

Sequential Sharding: data/0000/file.txt, data/0001/file.txt
Timestamp Sharding: logs/2023/10/27/14/log.txt (if you have high write rates within a specific minute)
Hashed Sharding: users/a3/user_id.json, users/f8/another_user_id.json (where a3 and f8 are derived from a hash of user_id)

The goal is to ensure that the traffic pattern of your application maps to these different sharded prefixes, thereby distributing the load. If you have 10,000 requests per second and want to scale, you might aim to put 1,000 requests per second onto each of 10 different sharded prefixes.

A common pitfall is sharding based on something that doesn’t distribute requests evenly. For instance, if you shard by the last part of a key, and most of your requests target a small subset of those last parts, you haven’t effectively distributed the load. The sharding component needs to be the primary determinant of which internal S3 partition the request hits, and your application’s access patterns must spread across these shards.

The most surprising thing about prefix sharding is that S3’s "request rate" limits are not a single global number for a bucket, but rather a per-prefix limit that S3 dynamically manages internally. You can’t see these limits, but you can infer them from observed throttling.

If you’ve implemented prefix sharding correctly and are still seeing performance issues, the next problem you’ll likely encounter is S3 request throttling errors, often manifesting as SlowDown or 503 Service Unavailable responses.