Rate limiting is often seen as a crude gatekeeper, but its real power lies in its ability to sculpt traffic into predictable patterns, even under duress.
Let’s watch nginx do its thing. Imagine we have a simple API endpoint, /data, that we want to throttle to 10 requests per minute per IP address.
http {
    limit_req_zone $binary_remote_addr zone=mylimit:10m rate=10r/m;

    server {
        listen 80;
        server_name example.com;

        location /data {
            limit_req zone=mylimit burst=20 nodelay;
            proxy_pass http://backend_server;
        }

        location / {
            # ... other configurations ...
        }
    }

    # ... upstream configurations ...
}
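Here, $binary_remote_addr is the key: nginx keeps separate state for each client IP inside a single shared-memory zone. zone=mylimit:10m names that zone and allocates 10 megabytes for it. According to the nginx documentation, one megabyte holds about 16,000 64-byte states (about 8,000 where a state takes 128 bytes, as on 64-bit platforms), so 10m covers roughly 80,000 to 160,000 IPs. rate=10r/m sets the sustained rate to 10 requests per minute, which nginx enforces as one request every 6 seconds.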
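Zone capacity can be estimated with a little arithmetic. The sketch below gives a rough upper bound, assuming the 64-byte and 128-byte state sizes given in the nginx documentation and ignoring shared-memory allocator overhead. approx_states is a hypothetical helper for illustration, not part of nginx.

```python
def approx_states(zone_megabytes: int, state_bytes: int = 64) -> int:
    """Upper-bound estimate of how many per-key states fit in a zone.

    Assumes every byte of the zone is usable for states, which real
    shared-memory allocators do not quite achieve.
    """
    return zone_megabytes * 1024 * 1024 // state_bytes


print(approx_states(10, 64))   # 163840
print(approx_states(10, 128))  # 81920
```

The real numbers will be somewhat lower, so treat the result as a ceiling when sizing a zone.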
The limit_req zone=mylimit burst=20 nodelay; directive within the /data location is where the magic happens. burst=20 lets up to 20 requests beyond the rate be accepted instead of rejected. nodelay controls what happens to those accepted burst requests: with it, nginx forwards them immediately; without it, nginx queues them and releases them at the configured rate (one every 6 seconds here), which smooths the traffic but can leave API clients waiting a long time. In both cases, requests beyond the burst capacity are rejected right away. The rejection status is 503 Service Unavailable by default; add limit_req_status 429; if you want 429 Too Many Requests instead.
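To make the effect of nodelay concrete, here is a minimal sketch of when nginx would forward each request if a batch arrived at the same instant. It is an idealized model, not nginx code: forward_times is a hypothetical function, and it assumes the bucket starts empty and ignores nginx's millisecond rounding.

```python
def forward_times(n_requests, rate_per_min, burst, nodelay):
    """Forward time in seconds for each of n simultaneous requests.

    None means the request is rejected. Request k (0-based) arrives with
    k requests already counted ahead of it, so its excess is k.
    """
    times = []
    for k in range(n_requests):
        if k > burst:
            times.append(None)                    # beyond the burst: rejected
        elif nodelay:
            times.append(0.0)                     # forwarded immediately
        else:
            times.append(k * 60 / rate_per_min)   # paced at the configured rate
    return times


# Five simultaneous requests against rate=10r/m with a small burst of 3.
print(forward_times(5, 10, 3, nodelay=True))   # [0.0, 0.0, 0.0, 0.0, None]
print(forward_times(5, 10, 3, nodelay=False))  # [0.0, 6.0, 12.0, 18.0, None]
```

Under the same assumptions, burst=20 without nodelay would hold the last accepted request for about 120 seconds, which is why nodelay is the usual choice for APIs.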
To test this, we can use curl and ab (ApacheBench).
First, let’s hit the endpoint rapidly with curl to see the burst in action:
for i in {1..25}; do curl -s -o /dev/null -w "%{http_code}\n" http://example.com/data & sleep 0.1; done; wait
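Starting from an idle state, the first 21 requests (the initial one plus the burst of 20) should get through quickly. The remaining 4 should be rejected, because at 10r/m almost no capacity drains back during the 2.5 seconds the loop takes.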
Now, let’s simulate sustained load with ab to observe the rate limiting:
ab -n 100 -c 5 http://example.com/data
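This command sends 100 requests with a concurrency of 5. The whole run finishes in far less than a minute, so with a limit of 10 requests per minute and burst=20, expect only about 21 requests to succeed (fewer if the earlier test has not drained yet) and the rest to be rejected. ab lists the rejections under "Non-2xx responses", and may also count them as failed requests because the error body differs in length from a successful one. They will be 503s unless limit_req_status is set to 429. Rejections never reach the backend, so the mean time per request will usually drop rather than climb once nginx starts refusing traffic.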
The limit_req_zone directive defines the shared state: the key, the memory zone, and the rate. The limit_req directive applies that state to a specific location. The zone parameter links to the defined zone. The burst parameter is crucial: it’s the maximum number of requests that can be accepted in excess of the defined rate, allowing for short spikes without immediate rejection. nodelay makes nginx forward those burst requests immediately instead of spacing them out at the configured rate; anything beyond the burst is rejected either way.
The nginx documentation describes this mechanism as a leaky bucket. Each accepted request adds to a per-key count of excess requests, and that count drains at the configured rate (rate=10r/m). The burst value is the bucket’s capacity: a request that would push the excess above it is rejected, with or without nodelay. nodelay only decides whether the requests that fit are forwarded immediately or paced. This allows short bursts of traffic to be handled gracefully, as long as the average rate over time stays within the defined limit.
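The accounting can be sketched in a few lines. The function below is a simplified model, assuming integer arithmetic in thousandths of a request and a state that is left unchanged when a request is rejected. That is my reading of how nginx behaves rather than a copy of its source, and simulate is a hypothetical name. It replays the earlier curl loop: 25 requests, 100 ms apart.

```python
def simulate(arrivals_ms, rate_per_min=10, burst=20):
    """Return True (accepted) or False (rejected) for each arrival time."""
    rate = rate_per_min * 1000 // 60   # thousandths of a request per second
    burst_milli = burst * 1000         # bucket capacity in the same unit
    excess = last = None               # no state for this key yet
    decisions = []
    for now in arrivals_ms:
        if excess is None:
            new_excess = 0             # first request creates the state
        else:
            drained = rate * (now - last) // 1000
            new_excess = max(0, excess - drained + 1000)
        if new_excess > burst_milli:
            decisions.append(False)    # over capacity: reject, keep old state
            continue
        excess, last = new_excess, now
        decisions.append(True)
    return decisions


decisions = simulate([i * 100 for i in range(25)])
print(sum(decisions), len(decisions) - sum(decisions))  # 21 4
```

Only about 0.017 of a request drains in each 100 ms gap, so the bucket fills almost as fast as requests arrive: the initial request plus the burst of 20 are accepted and the last 4 are rejected.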
What most people miss is how the zone size interacts with the number of clients. The 10m isn’t a buffer for requests; it’s the memory that stores the state (last request time and current excess) for each unique $binary_remote_addr. When the zone is full, nginx removes the least recently used state to make room for a new one, and if it still cannot create a state, it terminates the request with an error. Eviction means an IP that was being throttled can be forgotten and come back with a fresh allowance. As a sizing rule, the nginx documentation puts 1 MB at about 16,000 64-byte states (about 8,000 at 128 bytes), so size the zone for the number of unique IPs you expect to be active at once rather than relying on the default you copied.
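The forgetting effect is easy to reproduce in miniature. The sketch below models a zone as a fixed-capacity, least-recently-used store. Zone and touch are hypothetical names, and the per-key counter is a stand-in for nginx's real state, so read it as an illustration of the eviction idea rather than of nginx's internals.

```python
from collections import OrderedDict


class Zone:
    """Fixed-capacity store of per-key request counters with LRU eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.states = OrderedDict()   # least recently used key comes first

    def touch(self, key):
        """Count a request for key; return the key evicted to make room, if any."""
        evicted = None
        if key in self.states:
            self.states.move_to_end(key)          # mark as most recently used
        else:
            if len(self.states) >= self.capacity:
                evicted, _ = self.states.popitem(last=False)
            self.states[key] = 0
        self.states[key] += 1
        return evicted


zone = Zone(capacity=2)
for _ in range(3):
    zone.touch("10.0.0.1")            # this client builds up history
zone.touch("10.0.0.2")
print(zone.touch("10.0.0.3"))         # 10.0.0.1
zone.touch("10.0.0.1")                # returns with a brand-new state
print(zone.states["10.0.0.1"])        # 1
```

Once the third address arrives, the busiest client's history is gone, and its next request is counted as if it were its first.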
The next hurdle is understanding how to distribute this rate limiting across multiple nginx instances or behind a load balancer.