Redis Rate Limiting: Token Bucket and Sliding Window (2026)

Redis has no built-in rate limiter, and the obvious counter-based approach is surprisingly inaccurate; the best approaches use Redis as a backend for a more robust algorithm.

Let’s say you’re building an API and you want to prevent abuse. You decide to limit users to 100 requests per minute. A common first thought is to use Redis to store a counter for each user and increment it on every request. If the counter exceeds 100 within a minute, you reject the request.

Here’s how that might look in Python:

import redis
import time

r = redis.Redis(host='localhost', port=6379, db=0)

def rate_limit(user_id, limit=100, period=60):
    key = f"rate_limit:{user_id}"
    current_count = r.incr(key)
    if current_count == 1:
        r.expire(key, period)
    if current_count > limit:
        return False # Too many requests
    return True

# Example usage
user = "user123"
if rate_limit(user):
    print("Request allowed")
else:
    print("Rate limit exceeded")

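This seems simple, but it has two flaws. First, it is a fixed-window counter: the window opens on the first request, and the count drops back to zero when the key expires. A client can spend most of its quota just before the key expires and a full quota again right after, so close to 200 requests can get through within a few seconds even though the limit is 100 per minute. This is often called the "bursty" problem. Second, INCR and EXPIRE are two separate commands. If the process dies between them, the key never gets a timeout, the counter never resets, and the user stays blocked once the count passes the limit.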
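To make the boundary effect concrete, here is a small in-memory simulation of the INCR + EXPIRE pattern. The FixedWindowCounter class and its explicit now argument are illustrative assumptions rather than part of Redis or redis-py; they only mimic the counter and its expiry so the arithmetic can be checked without a server.

```python
class FixedWindowCounter:
    """In-memory stand-in for the INCR + EXPIRE pattern (illustrative only)."""

    def __init__(self, limit=100, period=60):
        self.limit = limit
        self.period = period
        self.count = 0
        self.expires_at = None  # mirrors the deadline set by EXPIRE

    def allow(self, now):
        # Once the deadline passes, the key is gone and the counter starts over.
        if self.expires_at is not None and now >= self.expires_at:
            self.count = 0
            self.expires_at = None
        self.count += 1                           # INCR
        if self.count == 1:
            self.expires_at = now + self.period   # EXPIRE on the first hit
        return self.count <= self.limit


limiter = FixedWindowCounter(limit=100, period=60)
allowed = 0
allowed += limiter.allow(now=0.0)                             # window opens at t=0
allowed += sum(limiter.allow(now=59.9) for _ in range(99))    # rest of the quota, late
allowed += sum(limiter.allow(now=60.1) for _ in range(100))   # fresh window
print(allowed)  # 200 requests allowed, 199 of them within 0.2 seconds
```

The limit is never exceeded inside either window, yet the client gets nearly twice the intended rate across the boundary.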

To address this, we can move to more sophisticated algorithms.

Token Bucket

The Token Bucket algorithm is a classic for rate limiting. Imagine a bucket that holds tokens, and you can only make a request if there’s a token available. Tokens are added to the bucket at a steady rate.

Here’s how you’d implement it with Redis:

1. Store Token Count: A Redis key stores the current number of tokens.
2. Store Last Refill Time: Another key stores the timestamp of the last time tokens were added.
3. Refill Logic: When a request comes in, calculate how many tokens should have been added since the last refill, add them (up to a maximum capacity), and update the last refill time.
4. Consume Token: If tokens are available, decrement the token count and allow the request. Otherwise, reject it.
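Before bringing Redis into it, the steps above can be sketched in plain Python. The TokenBucket class below is a hypothetical helper written for this illustration; it takes the current time as an argument so the refill arithmetic is easy to follow and does not depend on the wall clock.

```python
class TokenBucket:
    """In-memory token bucket (hypothetical helper, for illustration only)."""

    def __init__(self, capacity, refill_rate, now=0.0):
        self.capacity = capacity
        self.refill_rate = refill_rate   # tokens added per second
        self.tokens = float(capacity)    # a new bucket starts full
        self.last_refill = now

    def allow(self, now):
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        # Consume one token if there is one.
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(capacity=5, refill_rate=1.0)
burst = [bucket.allow(now=0.0) for _ in range(6)]   # five allowed, the sixth rejected
later = bucket.allow(now=2.0)                       # two tokens refilled after 2 seconds
print(burst, later)
```

A burst of up to capacity requests goes through immediately; after that, requests are admitted only as fast as tokens are refilled.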

Let’s look at the Redis commands involved. We’ll use HSET to store multiple fields for a user’s rate limit state.

import redis
import time

r = redis.Redis(host='localhost', port=6379, db=0)

def token_bucket_rate_limit(user_id, capacity=100, refill_rate=10, period=60):
    key = f"token_bucket:{user_id}"
    now = time.time()

    # Load the current state; a missing key means a new, full bucket
    state = r.hgetall(key)
    if state:
        tokens = float(state[b"tokens"])
        last_refill = float(state[b"last_refill"])
    else:
        tokens = float(capacity)
        last_refill = now

    # Refill for the elapsed time, capped at capacity
    elapsed_time = max(0.0, now - last_refill)
    tokens = min(capacity, tokens + elapsed_time * refill_rate)

    # Consume one token if available (this also covers the very first request)
    allowed = tokens >= 1
    if allowed:
        tokens -= 1

    # Persist the new state and refresh the cleanup timeout.
    # Note: this read-modify-write sequence is not atomic, so concurrent
    # requests for the same user can race with each other.
    r.hset(key, mapping={"tokens": tokens, "last_refill": now})
    r.expire(key, period * 2)
    return allowed

# Example usage
user = "user456"
if token_bucket_rate_limit(user):
    print("Request allowed")
else:
    print("Rate limit exceeded")

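The capacity is the maximum number of tokens the bucket can hold, which is also the largest burst a user can send at once. refill_rate is how many tokens are added per second, so it sets the sustained rate: the default of 10 allows 600 requests per minute once the initial burst is spent, and a sustained limit of 100 requests per minute would need refill_rate = 100 / 60. period is only used to derive the EXPIRE that cleans up idle keys. It plays no part in accuracy as long as the timeout is longer than the time a bucket needs to refill completely (capacity / refill_rate seconds), because an expired key simply becomes a fresh, full bucket. Unlike a fixed window, this approach allows short bursts while keeping the long-run rate bounded. One caveat: the read, refill, and write above are separate round trips, so two concurrent requests for the same user can read the same state and both spend the same token.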
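One way to make the refill-and-consume step atomic is to run it as a Lua script inside Redis, so no other client can touch the key halfway through. The following is a minimal sketch under stated assumptions, not a definitive implementation: take_token, TOKEN_BUCKET_LUA, and the default refill_rate of 100 / 60 are names and values chosen for this example, the timestamp comes from the application server's clock (so clock skew between servers is ignored), and multi-field HSET assumes Redis 4.0 or later.

```python
import time

# Lua script: refill, consume, and persist in one atomic step on the server.
TOKEN_BUCKET_LUA = """
local capacity    = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now         = tonumber(ARGV[3])

local state = redis.call('HMGET', KEYS[1], 'tokens', 'last_refill')
local tokens = tonumber(state[1]) or capacity   -- missing key: full bucket
local last_refill = tonumber(state[2]) or now

local elapsed = math.max(0, now - last_refill)
tokens = math.min(capacity, tokens + elapsed * refill_rate)

local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end

redis.call('HSET', KEYS[1], 'tokens', tostring(tokens), 'last_refill', ARGV[3])
redis.call('EXPIRE', KEYS[1], ARGV[4])
return allowed
"""


def take_token(client, user_id, capacity=100, refill_rate=100 / 60, ttl=120, now=None):
    """Return True if the request is allowed.

    `client` is expected to be a redis.Redis instance (assumption of this sketch).
    """
    now = time.time() if now is None else now
    key = f"token_bucket:{user_id}"
    # redis-py signature: eval(script, numkeys, *keys_and_args)
    result = client.eval(TOKEN_BUCKET_LUA, 1, key, capacity, refill_rate, now, ttl)
    return result == 1
```

With a connection such as r = redis.Redis(host='localhost', port=6379, db=0), a call would look like take_token(r, "user456"). Because the whole script runs as a single unit on the server, two concurrent requests can no longer spend the same token.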

Sliding Window

The Sliding Window algorithm is another popular method. Instead of a fixed window (like "per minute"), it uses a "sliding" window of a fixed duration. For example, if the limit is 100 requests per minute, it tracks requests within the last 60 seconds.

Redis can implement this using sorted sets (ZSET). Each request is added to the sorted set with its timestamp as the score.

1. Add Request: On each request, add the current timestamp to a sorted set associated with the user.
2. Remove Old Requests: Remove all entries from the sorted set whose timestamps are older than the window duration (e.g., older than 60 seconds ago).
3. Count Remaining: The number of remaining elements in the sorted set is the count of requests within the sliding window.
4. Enforce Limit: If the count exceeds the limit, reject the request.
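The same four steps can be sketched without Redis, which makes the window behaviour easy to trace by hand. SlidingWindowLog is a hypothetical in-memory helper written for this illustration; a deque of timestamps stands in for the sorted set, and the current time is passed in explicitly.

```python
from collections import deque


class SlidingWindowLog:
    """In-memory sliding window log (hypothetical helper, for illustration only)."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.timestamps = deque()  # request times, oldest first

    def allow(self, now):
        self.timestamps.append(now)                    # 1. add the request
        window_start = now - self.window_seconds
        while self.timestamps and self.timestamps[0] <= window_start:
            self.timestamps.popleft()                  # 2. remove old requests
        count = len(self.timestamps)                   # 3. count what remains
        return count <= self.limit                     # 4. enforce the limit


limiter = SlidingWindowLog(limit=3, window_seconds=60)
results = [limiter.allow(now=t) for t in (0, 1, 2, 3, 61.5)]
print(results)  # [True, True, True, False, True]
```

The fourth request is rejected because three requests already sit inside the last 60 seconds. By t=61.5 the requests from t=0 and t=1 have slid out of the window, so the request is allowed again. Note that the rejected request at t=3 was still recorded and counts toward later windows.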

Here are the Redis commands for this:

import redis
import time

r = redis.Redis(host='localhost', port=6379, db=0)

def sliding_window_rate_limit(user_id, limit=100, window_seconds=60):
    key = f"sliding_window:{user_id}"
    now = time.time()
    window_start = now - window_seconds

    # Queue the commands in one MULTI/EXEC transaction: one round trip, no interleaving
    pipe = r.pipeline(transaction=True)

    # 1. Add the current request timestamp (score is timestamp, member is timestamp for uniqueness)
    pipe.zadd(key, {f"{now}": now})

    # 2. Remove all requests older than the window
    pipe.zremrangebyscore(key, 0, window_start)

    # 3. Count the remaining requests
    pipe.zcard(key)

    # 4. Set an expire on the key to clean up old data
    pipe.expire(key, window_seconds * 2) # Keep for twice the window duration

    results = pipe.execute()

    # The count is the third result from the pipeline (index 2)
    current_count = results[2]

    if current_count > limit:
        return False # Rate limit exceeded
    return True # Request allowed

# Example usage
user = "user789"
if sliding_window_rate_limit(user):
    print("Request allowed")
else:
    print("Rate limit exceeded")

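The key here is using ZADD to add the current request’s timestamp, ZREMRANGEBYSCORE to clean up old entries efficiently, and ZCARD to get the count. In redis-py, a pipeline wraps its commands in MULTI/EXEC by default, so all four commands are sent in one round trip and run without commands from other clients interleaved between them. That prevents race conditions where concurrent requests see a stale count. Note that Redis transactions do not roll back: if one command fails at runtime, the others still take effect. Also note that the request is recorded before the limit is checked, so rejected requests still count toward the window, and a client that keeps retrying while over the limit stays blocked. The EXPIRE is for cleanup.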

The most surprising thing about these Redis-based rate limiting strategies is how much state and computation they require client-side or in application code to achieve accuracy.

When you’re implementing sliding window with sorted sets, remember that the member value in ZADD needs to be unique: ZADD on an existing member only updates its score, so two requests with the same member are counted as one. Using the timestamp itself as the member works well at low request rates, but if two requests can arrive with the same timestamp value, you need to append a unique ID or counter to the timestamp to ensure uniqueness.
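A small sketch of one way to build such members, assuming a random UUID suffix is acceptable. The unique_member helper is hypothetical, and plain dicts stand in for sorted sets here, since both map a member to a single score.

```python
import uuid


def unique_member(now):
    """Build a sorted-set member that stays unique even when timestamps collide."""
    return f"{now}:{uuid.uuid4().hex}"


now = 1700000000.0      # three requests that share the same timestamp
zset_plain = {}         # member -> score, like a sorted set
zset_unique = {}
for _ in range(3):
    zset_plain[f"{now}"] = now               # same member: overwritten each time
    zset_unique[unique_member(now)] = now    # distinct members: all three kept

print(len(zset_plain), len(zset_unique))
```

In the pipeline version, the corresponding change would be pipe.zadd(key, {unique_member(now): now}); the score is still the timestamp, so ZREMRANGEBYSCORE keeps working unchanged.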

The next problem you’ll likely run into is managing rate limits across multiple Redis instances or dealing with distributed systems where a single Redis instance is a single point of failure.