Rate Limiting Burst Allowance: Handle Traffic Spikes (2026)

A burst allowance in rate limiting isn’t about how much traffic you can send overall, but about how much you can send at once after a period of inactivity.

Let’s say you have a rate limit of 100 requests per minute. Without a burst allowance, a strict limiter spaces requests evenly: 100 requests per minute means one request every 600 ms, so if you send 100 requests in the first second, only the first one or two get through and the rest are rejected. With a burst allowance of, say, 50, the limiter tolerates up to 50 extra requests arriving at once and then makes you earn them back over time, so your long-run average still works out to 100 requests per minute. This allows for natural "spikes" in user activity without immediately hitting a wall.
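A limiter with no burst allowance typically enforces the rate as a minimum gap between accepted requests rather than as a per-minute quota. The Python sketch below models that simplified behavior; the function strict_accepts is hypothetical and works in whole milliseconds, so it approximates rather than reproduces Nginx's internal arithmetic.

```python
def strict_accepts(arrivals_ms: list[int], rate_per_min: int) -> list[int]:
    """Return the arrival times (ms) that a no-burst limiter would accept.

    Hypothetical sketch: a request is accepted only if at least
    60_000 / rate_per_min milliseconds have passed since the last accepted one.
    """
    interval_ms = 60_000 // rate_per_min  # 600 ms for 100 requests/minute
    accepted: list[int] = []
    for t in arrivals_ms:
        if not accepted or t - accepted[-1] >= interval_ms:
            accepted.append(t)
    return accepted


# 100 requests spread evenly over the first second (one every 10 ms).
arrivals = list(range(0, 1000, 10))
print(strict_accepts(arrivals, rate_per_min=100))  # [0, 600]
```

Two of the hundred requests get through; everything else in that first second is rejected.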

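Here’s a simplified Nginx configuration that demonstrates this. We’ll limit each client IP address to 10 requests per minute, with a burst allowance of 20. An empty events block and a placeholder upstream (the 127.0.0.1:8080 address is hypothetical) are included so the file loads on its own.

```nginx
events {}

http {
    # Track clients by IP in a 10 MB zone; average rate 10 requests/minute.
    limit_req_zone $binary_remote_addr zone=mylimit:10m rate=10r/m;

    # Placeholder backend (hypothetical address); replace with your own.
    upstream backend_server {
        server 127.0.0.1:8080;
    }

    server {
        listen 80;

        location / {
            # Allow up to 20 requests above the rate, forwarded without delay.
            limit_req zone=mylimit burst=20 nodelay;
            proxy_pass http://backend_server;
        }
    }
}
```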

In this setup:

limit_req_zone $binary_remote_addr zone=mylimit:10m rate=10r/m; defines the shared memory zone where requests are tracked per client IP address ($binary_remote_addr). The zone mylimit is 10MB (enough to hold state for many thousands of IPs), and the average rate is 10 requests per minute (10r/m).
limit_req zone=mylimit burst=20 nodelay; applies this zone to the / location.
- burst=20 is the key. It lets a client exceed the average rate by up to 20 requests; a request that would go beyond that is rejected (with status 503 by default).
- nodelay is important here. With it, requests that fit within the burst are forwarded immediately, even though they exceed the average rate. Without nodelay, those requests are queued and released at the average rate instead, which absorbs the spike but adds latency: at 10r/m, the last request of a full burst would wait about two minutes.

Imagine a user suddenly hitting your API after being idle for a while. Their burst allowance is full. With a limit of 10r/m and burst=20, they can send 21 requests in rapid succession (the one request the rate allows plus the 20 burst requests), and all of them are accepted immediately. After that their allowance is depleted, and further requests are held to the average rate: one every 6 seconds, or 10 per minute. The allowance refills gradually as time passes, so it only builds back up once the client slows below the rate or stops altogether.
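The scenario above follows a simple envelope: a client that starts with a full allowance can have at most burst + 1 requests accepted at time zero, plus one more for every 60 / rate seconds that pass. A short Python sketch of that upper bound (max_accepted is a hypothetical helper; Nginx's integer rounding can make the real count slightly lower at exact boundaries):

```python
def max_accepted(elapsed_s: int, rate_per_min: int, burst: int) -> int:
    """Upper bound on accepted requests for a client that starts idle and
    then sends requests continuously for elapsed_s seconds.

    Hypothetical helper modelling limit_req with burst and nodelay:
    burst + 1 immediately, then one per (60 / rate_per_min) seconds.
    """
    refilled = (elapsed_s * rate_per_min) // 60  # whole requests earned back
    return burst + 1 + refilled


print(max_accepted(0, rate_per_min=10, burst=20))    # 21
print(max_accepted(60, rate_per_min=10, burst=20))   # 31
print(max_accepted(600, rate_per_min=10, burst=20))  # 121
```

Over ten minutes that is 121 requests against the 100 the average rate alone would allow: the burst adds a fixed 21 on top, so the long-run average still converges to 10 per minute.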

The zone=mylimit:10m part is about memory. Each entry in the zone (one per client IP address) stores how many requests that client is currently in excess of the rate, plus the time of its last request. 10MB is a typical starting point, but you might need to adjust it based on the expected number of unique IP addresses hitting your server. The rate=10r/m is the average rate: the long-term average we’re aiming for.
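How many clients fit in 10MB? The limit_req_zone documentation gives a stored state as 64 bytes on 32-bit platforms and 128 bytes on 64-bit platforms. A rough Python estimate based on those figures (zone_capacity is a hypothetical helper, and it ignores shared-memory and allocator overhead):

```python
def zone_capacity(size_mb: int, state_bytes: int) -> int:
    """Approximate number of per-client states that fit in a limit_req zone.

    Ignores shared-memory and allocator overhead, so the real figure is
    somewhat lower; treat the result as an estimate.
    """
    return size_mb * 1024 * 1024 // state_bytes


print(zone_capacity(1, state_bytes=64))    # 16384 (32-bit platforms)
print(zone_capacity(10, state_bytes=128))  # 81920 (64-bit platforms)
```

On a typical 64-bit server, then, zone=mylimit:10m holds on the order of 80,000 client addresses.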

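The Nginx documentation describes the mechanism behind burst and nodelay as the "leaky bucket" method. For reasoning about bursts, the equivalent token bucket model is often easier to picture: think of a bucket that holds "tokens," where each token represents permission to make one request.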

- The bucket has a maximum capacity, set by your burst value. (In Nginx's accounting, a full bucket admits burst + 1 requests back to back: the one the rate allows plus the burst.)
- Tokens are added to the bucket at a steady rate, defined by your rate setting.
- When a request comes in, the system tries to take a token from the bucket.
- If a token is available, the request is allowed immediately and the token is removed.
- If the bucket is empty, the request is rejected.
The nodelay option, when combined with burst, means that if the bucket has any tokens (i.e., the burst allowance is not fully depleted), the request is served immediately, and a token is consumed. If the bucket is empty, then the request is rejected. This allows for rapid bursts of requests as long as the bucket isn’t empty.
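The bucket described above can be sketched in a few lines of Python. This is an illustrative token bucket, not Nginx's implementation (Nginx documents its limiter as a leaky bucket and uses its own integer arithmetic); the class name and parameters are hypothetical. To mirror rate=10r/m with burst=20, the capacity is set to 21, since Nginx admits the request the rate allows plus 20 burst requests.

```python
class TokenBucket:
    """Minimal token bucket limiter (illustrative; not Nginx's implementation).

    capacity: most tokens the bucket can hold (the burst size).
    refill_per_s: tokens added per second (the average rate).
    Times are passed in explicitly (in seconds) so the behavior is easy to test.
    """

    def __init__(self, capacity: float, refill_per_s: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self.tokens = capacity  # an idle client starts with a full bucket
        self.last = now

    def allow(self, now: float) -> bool:
        # Credit tokens for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # spend one token on this request
            return True
        return False  # bucket empty: reject


# Mirror rate=10r/m, burst=20: capacity 21 (burst + the one request the rate allows).
bucket = TokenBucket(capacity=21, refill_per_s=10 / 60)
first = [bucket.allow(0.0) for _ in range(22)]
print(first.count(True))  # 21
print(bucket.allow(3.0))  # False: only half a token has refilled
print(bucket.allow(7.0))  # True: about 1.17 tokens after 7 s
```

Passing the current time in explicitly, rather than reading a clock inside allow, keeps the sketch deterministic and easy to test.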

The allowance refills over time. A rate of 10r/m works out to exactly one request every 6 seconds (60 / 10). So if you use up your burst of 20, it takes 20 * 6 = 120 seconds (2 minutes) for the allowance to refill completely, assuming no new requests arrive in the meantime.
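The same arithmetic covers partial refills: a client does not have to wait for the whole allowance to come back before bursting again. A small Python sketch (seconds_until is a hypothetical helper), under the same assumption that no requests arrive while waiting:

```python
def seconds_until(tokens_wanted: int, tokens_now: float, rate_per_min: int) -> float:
    """Seconds until the bucket holds tokens_wanted tokens, assuming no new requests.

    Hypothetical helper: tokens refill at rate_per_min / 60 per second.
    """
    missing = max(0.0, tokens_wanted - tokens_now)
    return missing * 60 / rate_per_min


print(seconds_until(20, 0, rate_per_min=10))  # 120.0: a full burst of 20
print(seconds_until(5, 0, rate_per_min=10))   # 30.0: enough for a 5-request burst
print(seconds_until(5, 8, rate_per_min=10))   # 0.0: already available
```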

One aspect that often trips people up is how the average rate and the burst interact with nodelay. With nodelay, Nginx effectively ignores the average rate until the burst allowance is exhausted. So, with rate=10r/m and burst=20 nodelay, a client can send 21 requests in an instant. After that the allowance is gone, and each further request is checked against the rate: it is accepted only if enough time has passed for a slot to free up (6 seconds per slot at 10r/m), and rejected otherwise. For a short period you can exceed the average rate significantly, but your long-term average is still held to the configured rate.
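Nginx's own bookkeeping is easier to follow in "excess" terms: the zone remembers how many requests a client is currently above the average rate, that number drains at the configured rate, and a request is rejected if it would push the excess past burst. A simplified Python model of that behavior with nodelay; the class NodelayMeter is hypothetical, and it uses exact fractions instead of Nginx's integer millisecond arithmetic, so boundary timings in real Nginx can differ slightly.

```python
from fractions import Fraction
from typing import Optional


class NodelayMeter:
    """Simplified model of Nginx's limit_req state with burst and nodelay.

    Hypothetical sketch: tracks "excess" (requests above the average rate)
    using exact fractions instead of Nginx's integer millisecond arithmetic.
    """

    def __init__(self, rate_per_min: int, burst: int) -> None:
        self.rate_per_s = Fraction(rate_per_min, 60)  # 10r/m -> 1/6 request per second
        self.burst = burst
        self.excess = Fraction(0)
        self.last: Optional[Fraction] = None  # no state yet for this client

    def allow(self, now: Fraction) -> bool:
        if self.last is None:
            excess = Fraction(0)  # a client's first request starts with zero excess
        else:
            # Excess drains at the average rate; this request adds one more.
            drained = self.excess - self.rate_per_s * (now - self.last)
            excess = max(Fraction(0), drained + 1)
        if excess > self.burst:
            return False  # beyond the burst: rejected, stored state unchanged
        self.excess, self.last = excess, now
        return True  # nodelay: forwarded immediately, never queued


meter = NodelayMeter(rate_per_min=10, burst=20)
results = [meter.allow(Fraction(0)) for _ in range(22)]
print(results.count(True))       # 21 (the 22nd request is rejected)
print(meter.allow(Fraction(3)))  # False: only half a request has drained
print(meter.allow(Fraction(6)))  # True: one slot freed after 6 s
```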

If you remove nodelay, Nginx no longer forwards burst requests immediately: requests that fall within the burst are delayed and released at the average rate, and only requests beyond the burst are rejected. This queuing adds latency, which is usually not what you want when handling traffic spikes.
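The cost of dropping nodelay is easy to quantify for a burst that arrives all at once: in the leaky-bucket model, each queued request waits one more interval (60 / rate seconds) than the one before it. A Python sketch (delay_schedule is a hypothetical helper; it assumes every request arrives at the same instant and ignores Nginx's integer rounding):

```python
from typing import Optional


def delay_schedule(n_requests: int, rate_per_min: int, burst: int) -> list[Optional[float]]:
    """Delay in seconds for each of n_requests arriving at the same instant
    when limit_req has burst but no nodelay; None means rejected.

    Hypothetical sketch of the leaky-bucket model: the k-th extra request
    waits k * (60 / rate_per_min) seconds; anything past burst is rejected.
    """
    interval = 60 / rate_per_min
    return [k * interval if k <= burst else None for k in range(n_requests)]


schedule = delay_schedule(22, rate_per_min=10, burst=20)
print(schedule[:3])  # [0.0, 6.0, 12.0]
print(schedule[20])  # 120.0: the last queued request waits two minutes
print(schedule[21])  # None: beyond the burst, rejected
```

At 10r/m the tail of a full burst waits two minutes, which is the latency trade-off nodelay avoids.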

The next thing you’ll likely encounter is configuring different rate limits for different endpoints or user groups, or handling the case where the burst allowance is not enough and requests are still being dropped.