Distributed Rate Limiting with Redis: Consistent at Scale (2026)

Distributed rate limiting is surprisingly difficult to get exactly right, especially when you need it to be consistent across multiple service instances.

Let’s say you’re building a public API. You want to ensure no single user floods your service, but you also want to avoid accidentally blocking legitimate traffic. A common approach is to use Redis to store request counts.

Here’s a simplified version of how it might work. We track each user’s requests over the last minute and reject anything beyond the limit.

import time
import uuid

import redis

r = redis.Redis(host='localhost', port=6379, db=0)
user_id = "user123"
limit = 10  # requests per minute
window_seconds = 60

def is_rate_limited(user_id):
    now = time.time()
    key = f"rate_limit:{user_id}"

    # redis-py pipelines default to transaction=True (MULTI/EXEC),
    # so these commands run on the server without interleaving.
    pipe = r.pipeline()
    # Unique member per request; a bare timestamp member would collapse
    # requests that arrive in the same second into a single entry.
    pipe.zadd(key, {f"{now}:{uuid.uuid4().hex}": now})
    pipe.zremrangebyscore(key, 0, now - window_seconds)  # Remove old requests
    pipe.zcard(key)  # Count requests still inside the window
    pipe.expire(key, window_seconds)  # Let keys of idle users expire
    results = pipe.execute()

    # Denied requests are recorded too, so they count toward the limit.
    current_count = results[2]
    return current_count > limit

if __name__ == "__main__":
    for i in range(15):
        if is_rate_limited(user_id):
            print(f"Request {i+1}: Rate limited!")
        else:
            print(f"Request {i+1}: Allowed.")
        time.sleep(2)  # Simulate requests coming in

This code uses Redis’s Sorted Sets (ZSETs) to store request timestamps. When a new request comes in, we add an entry scored by its timestamp, prune every entry older than the window (here, 60 seconds) with zremrangebyscore, and count what remains with zcard. If the count exceeds the limit, the request is denied. Because redis-py pipelines are transactional by default (the commands are wrapped in MULTI/EXEC), the add, the prune, and the count execute as one atomic unit, so concurrent requests for the same user’s key cannot interleave between them. Note that the request is recorded before the check, so denied requests also count toward the limit; a client that keeps hammering the API while limited can stay limited.
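To make the add, prune, and count sequence easier to follow without a Redis server, here is a minimal in-memory sketch of the same logic. The SlidingWindowLog class and its sequence counter are hypothetical stand-ins for one user's sorted set, not part of redis-py, and the clock is passed in explicitly so the behavior is deterministic.

```python
import bisect
import itertools


class SlidingWindowLog:
    """In-memory stand-in for one user's sorted set (illustration only)."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self._entries = []  # sorted list of (timestamp, sequence number)
        self._seq = itertools.count()

    def is_rate_limited(self, now):
        # ZADD: record this request; the sequence number keeps entries unique.
        bisect.insort(self._entries, (now, next(self._seq)))
        # ZREMRANGEBYSCORE key 0 (now - window): drop entries with score <= cutoff.
        cutoff = now - self.window_seconds
        first_kept = bisect.bisect_right(self._entries, (cutoff, float("inf")))
        del self._entries[:first_kept]
        # ZCARD: count what is left, including the request just added.
        return len(self._entries) > self.limit


log = SlidingWindowLog(limit=3, window_seconds=60)
decisions = [log.is_rate_limited(t) for t in (0, 10, 20, 30, 75)]
print(decisions)  # [False, False, False, True, False]
```

The fourth request is denied because it is the fourth entry inside one 60-second span. By t=75 the entries at 0 and 10 have been pruned, so the window holds three entries again and the request is allowed.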

The core problem this solves is managing shared state (request counts) across potentially many instances of your application. Without a central, fast store like Redis, each instance would only see the requests it handled itself, so a user whose traffic is spread across N instances could get through up to N times the intended limit. Redis provides the single source of truth. You control the limit (how many requests are allowed) and window_seconds (the duration over which the limit is enforced). The user_id is how you partition the limits, so different users or API keys each get their own quota.
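The effect of per-instance state can be shown with a toy simulation. This is only a sketch: Instance, admitted, and LIMIT are hypothetical names, a Counter stands in for Redis, and the time window is ignored so that only the shared-versus-private distinction is visible.

```python
from collections import Counter

LIMIT = 10  # hypothetical per-user limit for this illustration


class Instance:
    """One application instance; `store` is wherever it keeps its counts."""

    def __init__(self, store):
        self.store = store

    def allow(self, user_id):
        self.store[user_id] += 1
        return self.store[user_id] <= LIMIT


def admitted(instances, user_id, total_requests):
    # Round-robin requests across instances, as a load balancer might.
    return sum(
        instances[i % len(instances)].allow(user_id)
        for i in range(total_requests)
    )


# Each instance keeps private counts: nobody sees the global total.
isolated = [Instance(Counter()) for _ in range(3)]
# All instances share one store, standing in for Redis.
shared_store = Counter()
shared = [Instance(shared_store) for _ in range(3)]

isolated_admitted = admitted(isolated, "user123", 30)
shared_admitted = admitted(shared, "user123", 30)
print(isolated_admitted, shared_admitted)  # 30 10
```

With three isolated instances, all 30 requests get through because each instance only ever counts 10. With the shared store, the same traffic is capped at 10.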

A common pitfall is relying on key expiry alone. A TTL applies to the whole key, not to individual entries, so it cannot age out requests one at a time: a counter with a TTL drops its entire count at once when the key expires, which lets a client send a full quota just before the expiry and another full quota just after it. The Sorted Set approach, combined with zremrangebyscore, prunes entries individually by timestamp, so only requests inside the window are counted, independent of any TTL on the key. A TTL is still useful as a backstop for reclaiming the keys of idle users, but it is not what makes the window slide.
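The difference shows up at the moment a TTL-based counter expires. The simulation below is a sketch in plain Python, not Redis code: ttl_counter_admitted models a counter whose TTL is set when the key is created, sliding_window_admitted models the pruned timestamp log, and both function names and the traffic pattern are made up for the illustration.

```python
def ttl_counter_admitted(timestamps, limit, window_seconds):
    """Counter with a TTL set at key creation: the whole count expires at once."""
    count, expires_at = 0, None
    admitted = 0
    for t in timestamps:
        if expires_at is None or t >= expires_at:
            count, expires_at = 0, t + window_seconds  # key expired; start over
        count += 1
        if count <= limit:
            admitted += 1
    return admitted


def sliding_window_admitted(timestamps, limit, window_seconds):
    """Timestamp log pruned on every request, like the sorted-set approach."""
    log = []
    admitted = 0
    for t in timestamps:
        log.append(t)  # denied requests are logged too, as in the Redis version
        log = [s for s in log if s > t - window_seconds]
        if len(log) <= limit:
            admitted += 1
    return admitted


# One request at t=0 starts the counter; 9 more arrive just before it
# expires at t=60, and 10 arrive just after.
traffic = (
    [0.0]
    + [59.0 + i * 0.05 for i in range(9)]
    + [60.0 + i * 0.05 for i in range(10)]
)

ttl_admitted = ttl_counter_admitted(traffic, limit=10, window_seconds=60)
sliding_admitted = sliding_window_admitted(traffic, limit=10, window_seconds=60)
print(ttl_admitted, sliding_admitted)  # 20 11
```

The TTL counter admits all 20 requests, 19 of them within about a second and a half, because the count resets at t=60. The sliding log admits 11: the request at t=0 plus ten inside the most recent 60 seconds.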

The most surprising thing about this approach is how crucial the zremrangebyscore operation is. It’s not just about adding new items; it’s about actively cleaning up the history to maintain the sliding window’s integrity. Without this cleanup, the sorted set would grow indefinitely and consume memory. The count would also be wrong: zcard itself stays cheap (it is an O(1) operation), but it would include requests from long before the window, so any active user would eventually be blocked for good. This proactive removal, tied directly to the request timestamps and the window size, is what makes the rate limiting truly "sliding" and consistent.

If you start seeing redis.exceptions.ConnectionError after implementing this, you’re likely hitting network issues or an overloaded Redis instance.
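Whatever the cause, the limiter needs a defined behavior while Redis is unreachable. One possible shape is a small wrapper that catches the error and applies a fixed policy. This is a sketch under assumptions: check_with_fallback, the fail_open flag, and the StoreDown exception are hypothetical, and in real use you would pass (redis.exceptions.ConnectionError,) as errors and is_rate_limited as check.

```python
import logging

logger = logging.getLogger("rate_limiter")


def check_with_fallback(check, user_id, errors, fail_open=True):
    """Run `check(user_id)`; if the store is unreachable, apply a fixed policy.

    Returns True when the request should be rejected.
    """
    try:
        return check(user_id)
    except errors as exc:
        logger.warning("rate limiter unavailable for %s: %s", user_id, exc)
        # fail_open=True lets traffic through; fail_open=False rejects it.
        return not fail_open


class StoreDown(Exception):
    """Stand-in for redis.exceptions.ConnectionError in this sketch."""


def broken_check(user_id):
    raise StoreDown("connection refused")


open_result = check_with_fallback(broken_check, "user123", (StoreDown,))
closed_result = check_with_fallback(
    broken_check, "user123", (StoreDown,), fail_open=False
)
print(open_result, closed_result)  # False True
```

Failing open keeps the API available during a Redis outage at the cost of temporarily unenforced limits. Failing closed protects the backend but turns a Redis problem into an API outage. Which one is right depends on what the limit is protecting.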