The most surprising thing about sliding window log rate limiting is that it doesn’t keep a running counter at all; it stores a timestamp for every request and derives the count from that log.
Let’s see it in action. Imagine you have a web API endpoint, /api/v1/users. You want to limit clients to 100 requests per minute.
Here’s a simplified view of what happens when a client, say 192.168.1.10, hits your API:
- Request Arrives: GET /api/v1/users from 192.168.1.10.
- Timestamp Recorded: The system notes the exact time this request arrived. Let’s say it’s 1678886400.123.
- Window Check: The system looks at its records for 192.168.1.10. It maintains a log of request timestamps within the current "window."
- Expiry: It removes any timestamps from the log that are older than one minute ago (i.e., older than 1678886400.123 - 60 seconds).
- Count: It counts how many timestamps remain in the log.
- Decision: If the count is less than 100, the request is allowed. The new timestamp (1678886400.123) is added to the log. If the count is already 100 or more, the request is rejected with a 429 Too Many Requests status.
This continues for every request. The "window" is not a fixed, discrete block of time like "every hour on the hour." Instead, it’s a rolling 60-second period ending at the current moment.
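The steps above can be sketched in a few lines of Python. This is a minimal single-process illustration, not a production implementation: the class name SlidingWindowLog and the allow method are hypothetical, the log lives in memory, and the caller is assumed to pass non-decreasing timestamps.

```python
from collections import defaultdict, deque


class SlidingWindowLog:
    """In-memory sliding window log limiter (single process, not thread-safe)."""

    def __init__(self, limit: int = 100, window_seconds: float = 60.0):
        self.limit = limit
        self.window_seconds = window_seconds
        self._logs = defaultdict(deque)  # client key -> timestamps, oldest first

    def allow(self, key: str, now: float) -> bool:
        log = self._logs[key]
        cutoff = now - self.window_seconds
        # Expiry: drop timestamps older than the window.
        while log and log[0] < cutoff:
            log.popleft()
        # Decision: reject without logging if the window is already full.
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True


# Small limit so the behaviour is easy to follow by hand.
limiter = SlidingWindowLog(limit=3, window_seconds=60)
results = [limiter.allow("192.168.1.10", t) for t in (0.0, 1.0, 2.0, 3.0, 61.5)]
print(results)  # [True, True, True, False, True]
```

The fourth call is rejected because three timestamps are still inside the window; by 61.5 the entries at 0.0 and 1.0 have expired, so the request is admitted again.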
The Problem It Solves: Spikes vs. Average
Traditional fixed-window rate limiting (e.g., 100 requests per calendar minute) has a major flaw. If a user sends 100 requests at 00:00:59 and another 100 requests at 00:01:01, they’ve sent 200 requests in 2 seconds, but they’d be allowed because they fell into two different fixed windows. This is a "burst" that can overwhelm your service.
Sliding window log rate limiting solves this by always looking back exactly 60 seconds from the current request. A client can still send its full allowance at once, but it can never exceed 100 requests in any 60-second span, so a burst cannot double up across a window boundary.
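To make the boundary problem concrete, here is a small simulation of a fixed-window counter fed the burst described above. The function names are hypothetical and times are expressed as seconds since midnight, so 00:00:59 is 59.0 and 00:01:01 is 61.0.

```python
from collections import Counter


def fixed_window_accept(timestamps, limit, window_seconds):
    """Return the timestamps a fixed-window counter would accept."""
    counts = Counter()
    accepted = []
    for t in timestamps:
        bucket = int(t // window_seconds)  # index of the fixed window
        if counts[bucket] < limit:
            counts[bucket] += 1
            accepted.append(t)
    return accepted


def max_in_rolling_window(timestamps, window_seconds):
    """Largest number of (sorted) timestamps inside any span of window_seconds."""
    best, start = 0, 0
    for end, t in enumerate(timestamps):
        # Advance the left edge until it is inside the span ending at t.
        while timestamps[start] <= t - window_seconds:
            start += 1
        best = max(best, end - start + 1)
    return best


# 100 requests at second 59 and 100 more at second 61.
burst = [59.0] * 100 + [61.0] * 100
accepted = fixed_window_accept(burst, limit=100, window_seconds=60)
print(len(accepted))                         # 200
print(max_in_rolling_window(accepted, 60))   # 200
```

The fixed-window counter accepts all 200 requests, and all of them fall within a single rolling 60-second span, which is exactly what a sliding window log would refuse.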
Internal Mechanics: The Timestamp Log
At its core, this mechanism relies on a data structure that efficiently stores and queries timestamps. For a single client, it might look something like this in memory (simplified):
{
  "192.168.1.10": {
    "timestamps": [
      1678886400.100, // Request 1
      1678886400.123, // Request 2
      1678886401.500, // Request 3
      // ... up to 100 timestamps
    ],
    "limit": 100,
    "window_seconds": 60
  }
}
When a new request comes in at 1678886402.700:
- Expiry: The system checks timestamps for any values less than 1678886402.700 - 60 = 1678886342.700. If 1678886339.900 was the oldest timestamp, it’s removed.
- Count: It counts the remaining timestamps. If there are 99, the new request is allowed, and 1678886402.700 is added. If there are 100, the request is rejected.
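The expiry arithmetic can be traced on a record shaped like the structure above. This sketch keeps the timestamps in a sorted list and uses binary search to find the cutoff; the function name check_and_record, the sample values, and the assumed arrival time of 1678886402.700 (later than every logged entry) are illustrative.

```python
from bisect import bisect_left

# Hypothetical record for one client; the first entry is already stale.
record = {
    "timestamps": [1678886339.900, 1678886400.100, 1678886400.123, 1678886401.500],
    "limit": 100,
    "window_seconds": 60,
}


def check_and_record(record: dict, now: float) -> bool:
    """Expire old entries, then admit the request if the log has room."""
    timestamps = record["timestamps"]  # kept sorted ascending
    cutoff = now - record["window_seconds"]
    # bisect_left finds the first entry >= cutoff; everything before it has expired.
    del timestamps[:bisect_left(timestamps, cutoff)]
    if len(timestamps) >= record["limit"]:
        return False
    timestamps.append(now)
    return True


allowed = check_and_record(record, now=1678886402.700)
print(allowed, len(record["timestamps"]))  # True 4
```

The cutoff works out to 1678886342.700, so 1678886339.900 is dropped, three entries remain, and the new timestamp brings the log back to four.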
Configuration Levers
You control two primary parameters:
- limit: The maximum number of requests allowed within the window. For example, 100.
- window_seconds: The duration of the sliding window. For example, 60 for one minute.
You might also configure:
- key_extractor: How to identify the client. This could be an IP address, an API key from a header (X-API-Key), a user ID from a JWT, etc.
- storage_backend: Where to store the timestamp logs. For high-throughput systems, this is typically an in-memory data store like Redis, whose sorted sets offer O(log N) insertion, O(1) cardinality lookups, and efficient removal by score range.
This approach provides a much more robust defense against sudden traffic spikes than fixed-window methods.
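The actual implementation often uses a data structure like a Redis sorted set, with one set per client key. When a request comes in, you first use ZREMRANGEBYSCORE to remove all entries with a score older than current_time - window_seconds, then use ZCARD to count what remains. If ZCARD is less than your limit, you allow the request and use ZADD to store the current timestamp as the score of a new, unique member; otherwise, you reject it. Counting before adding keeps the comparison consistent with the steps above and keeps rejected requests out of the log. Because these are separate commands, they are typically run inside a Lua script so that concurrent requests from the same client cannot both pass the check.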
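A minimal sketch of the sorted-set approach, following the expire, count, then add order from the walkthrough earlier. It assumes client is a redis-py redis.Redis instance (or any object exposing the same zremrangebyscore, zcard, zadd, and expire methods); the function name allow_request and the key format are hypothetical. The commands are issued separately here for readability, so this version is not atomic under concurrency.

```python
import time
import uuid
from typing import Optional


def allow_request(client, key: str, limit: int = 100, window_seconds: int = 60,
                  now: Optional[float] = None) -> bool:
    """Sliding window log on a Redis sorted set (non-atomic sketch)."""
    now = time.time() if now is None else now
    cutoff = now - window_seconds
    # Drop entries with a score strictly below the cutoff;
    # the "(" prefix marks an exclusive bound in Redis range syntax.
    client.zremrangebyscore(key, "-inf", f"({cutoff}")
    if client.zcard(key) >= limit:
        return False
    # Sorted-set members must be unique, so pair the timestamp with a random suffix.
    member = f"{now}:{uuid.uuid4().hex}"
    client.zadd(key, {member: now})
    # Let Redis discard the whole key once the client has been quiet for a full window.
    client.expire(key, window_seconds)
    return True
```

With a running Redis server, a call might look like allow_request(redis.Redis(), "ratelimit:192.168.1.10"). The random member suffix matters because two requests with the same timestamp would otherwise collapse into a single sorted-set entry and be undercounted.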
The next challenge you’ll face is managing different rate limits for different endpoints or user tiers.