The most surprising thing about sliding window rate limiting is that it doesn’t actually track individual requests.
Let’s say you want to limit a user to 10 requests per minute. With a traditional fixed-window counter, if the user makes 10 requests at 0:59 and then another 10 requests at 1:00, they’ve effectively made 20 requests within a single minute, blowing past your limit.
The sliding window approach solves this by using a time-based window that "slides" forward. Instead of discrete, fixed buckets, it’s more like a continuous timeline. When a request comes in, we look at all requests that have happened in the last minute from this exact moment.
Here’s a simplified view of how it works with Redis, a common choice for this pattern. We use a sorted set (ZSET) where the score is the timestamp of the request and the value is arbitrary (e.g., the request ID or just a placeholder).
# User ID and the current timestamp (e.g., Unix epoch in milliseconds)
USER_ID="user123"
CURRENT_TIMESTAMP=$(date +%s%3N)
WINDOW_SIZE_MS=60000 # 60 seconds
# 1. Remove old entries from the sorted set
# This command removes all members whose score is less than (CURRENT_TIMESTAMP - WINDOW_SIZE_MS)
redis-cli ZREMRANGEBYSCORE rate_limit:${USER_ID} 0 $((${CURRENT_TIMESTAMP} - ${WINDOW_SIZE_MS}))
# 2. Add the new request to the sorted set
# We use the current timestamp as both the score and the member for simplicity here.
redis-cli ZADD rate_limit:${USER_ID} ${CURRENT_TIMESTAMP} ${CURRENT_TIMESTAMP}
# 3. Count the remaining entries
# This gives us the number of requests in the last minute.
REQUEST_COUNT=$(redis-cli ZCARD rate_limit:${USER_ID})
# 4. Check against the limit
MAX_REQUESTS=10
if [ "$REQUEST_COUNT" -gt "$MAX_REQUESTS" ]; then
echo "Rate limit exceeded for ${USER_ID}. Count: ${REQUEST_COUNT}"
# Return 429 Too Many Requests
else
echo "Request allowed for ${USER_ID}. Count: ${REQUEST_COUNT}"
# Process the request
fi
The key here is ZREMRANGEBYSCORE. It efficiently prunes any timestamps that fall outside our defined sliding window. Then, ZADD adds the current request, and ZCARD gives us the precise count of requests within that dynamic window.
This approach ensures that at any given point, we’re only counting requests that actually occurred within the last 60 seconds, providing a much more accurate representation of usage than fixed windows. It avoids the "burst" problem where a user could hit a rate limit boundary and immediately make a burst of requests.
The actual implementation often involves a bit more: using a Redis Lua script for atomicity (to ensure steps 1-3 happen as a single operation, preventing race conditions), handling distributed systems where requests might hit different Redis instances, and choosing appropriate data types if Redis isn’t the backend (like in-memory stores for single-server apps, or more complex distributed caches). The core logic, however, remains the same: prune old, add new, count.
What most people don’t realize is that the ZADD command in Redis can actually return the number of new elements added. If you combine that with the count before adding, you can calculate the new total. However, the ZREMRANGEBYSCORE and then ZCARD approach is often clearer and less prone to off-by-one errors when dealing with concurrent requests. The critical part is that the removal and addition must be atomic relative to the count.
The next challenge you’ll face is efficiently handling extremely high throughput, where even Redis might become a bottleneck, and you’ll start looking at techniques like probabilistic data structures or custom kernel-level rate limiting.