Rate limiting by IP address is often the first line of defense against brute-force attacks and scrapers, but it’s surprisingly easy to get wrong, leading to legitimate users being blocked or attackers slipping through.

Let’s see it in action. Imagine a basic Nginx configuration to limit requests to 100 per minute per IP:

http {
    limit_req_zone $binary_remote_addr zone=mylimit:10m rate=100r/m;

    server {
        listen 80;
        location / {
            limit_req zone=mylimit burst=200 nodelay;
            proxy_pass http://backend;
        }
    }
}

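Here, $binary_remote_addr is the key. It’s a compact binary representation of the client’s IP address (4 bytes for IPv4, 16 bytes for IPv6), so each tracked state takes less memory than it would with the string form $remote_addr. The zone=mylimit:10m part allocates 10 megabytes of shared memory for tracking IPs; the Nginx documentation puts capacity at roughly 8,000 states per megabyte on 64-bit platforms, so this zone holds on the order of 80,000 addresses. rate=100r/m sets the target rate to 100 requests per minute, which Nginx enforces as one request every 600 milliseconds rather than as a per-minute counter.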

In the location block, limit_req zone=mylimit applies the zone we defined. burst=200 lets a client run up to 200 requests ahead of the rate; anything beyond that is rejected, by default with 503 Service Unavailable. nodelay controls what happens to the requests inside the burst allowance: with it, they are forwarded to the backend immediately; without it, Nginx holds them and releases them at the configured rate.

Now, consider what happens when a user is behind a proxy or load balancer. Nginx, by default, sees the IP address of the load balancer, not the actual client. If multiple users share the same load balancer IP, they all get lumped together under that single IP’s rate limit. This is where per-IP rate limiting breaks down.

To fix this, you need to configure Nginx to trust the X-Forwarded-For header (or a similar header like X-Real-IP) that your proxy or load balancer adds. This header contains the original client IP address.

http {
    # ... other configurations ...
    set_real_ip_from 192.168.1.0/24; # Trust IPs from your internal network/load balancers
    set_real_ip_from 10.0.0.0/8;
    real_ip_header X-Forwarded-For; # Use X-Forwarded-For to determine the client IP

    limit_req_zone $binary_remote_addr zone=mylimit:10m rate=100r/m;

    server {
        listen 80;
        location / {
            limit_req zone=mylimit burst=200 nodelay;
            proxy_pass http://backend;
        }
    }
}

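With set_real_ip_from and real_ip_header in place (both come from ngx_http_realip_module, which must be compiled in), Nginx replaces the connection address with the one taken from the header, so $binary_remote_addr, and therefore the rate limit, refers to the individual client rather than the shared proxy IP. The set_real_ip_from directive is crucial for security; it tells Nginx which peers are allowed to supply real_ip_header. Trust only addresses under your control, like your load balancers or internal network ranges; otherwise any client can forge the header and choose its own rate-limit bucket. Note that X-Forwarded-For can list several addresses. By default Nginx takes the last one, which is correct behind a single trusted proxy; behind a chain of trusted proxies, real_ip_recursive on makes it use the last address that is not in a trusted range.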
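
One way to check the real-IP setup before relying on it is to log the address Nginx ends up using next to the raw connection peer. The sketch below assumes ngx_http_realip_module is available and that the Nginx version is 1.9.7 or later (for $realip_remote_addr); the format name ratelimit_debug and the log path are placeholders, not part of the original configuration.

```nginx
http {
    set_real_ip_from 10.0.0.0/8;        # trusted load balancer range, as above
    real_ip_header X-Forwarded-For;

    # $remote_addr          -> address after real-IP substitution (what the limit keys on)
    # $realip_remote_addr   -> address of the peer that opened the connection
    # $http_x_forwarded_for -> raw header value as received
    log_format ratelimit_debug '$remote_addr peer=$realip_remote_addr '
                               'xff="$http_x_forwarded_for" status=$status';

    server {
        listen 80;
        # Placeholder path; point this wherever your logs normally go.
        access_log /var/log/nginx/ratelimit_debug.log ratelimit_debug;

        location / {
            proxy_pass http://backend;
        }
    }
}
```

If the first field and peer= are identical for requests that came through the load balancer, the header is not being trusted, most likely because the balancer’s address falls outside the set_real_ip_from ranges.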

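The burst parameter is often misunderstood. It is not a second, higher rate limit; it is the number of requests a client may run ahead of the rate before Nginx starts rejecting. At 100 requests per minute Nginx expects at least 600 ms between requests, so with no burst at all even a second request arriving inside that window is rejected, and a browser loading one page with a handful of assets does exactly that. Setting burst to a value that can accommodate typical flash traffic, like 200 or 300, can prevent legitimate users from being unfairly penalized during brief traffic surges. The allowance refills only at the configured rate, so sustained traffic above the rate is still rejected once it is used up.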

The nodelay option trades smoothing for latency. Requests beyond the burst limit are rejected either way; the difference is in how requests within the burst are handled. With nodelay they are passed to the backend immediately, so the backend can see the whole burst at once. Without it, Nginx queues them and releases them at the configured rate. This smooths out traffic but adds latency: at 100r/m, the 200th queued request would wait about two minutes, which is longer than most client timeouts.
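
To make the difference concrete, here is a sketch of the same zone applied three ways. The location paths are hypothetical and exist only to show the variants side by side; because all three reference one zone, they also share per-IP state. The delay= parameter in the last variant is a standard limit_req option, but it requires Nginx 1.15.7 or later.

```nginx
http {
    limit_req_zone $binary_remote_addr zone=mylimit:10m rate=100r/m;

    server {
        listen 80;

        location /immediate/ {
            # Up to 200 excess requests are forwarded at once; the rest are rejected.
            limit_req zone=mylimit burst=200 nodelay;
            proxy_pass http://backend;
        }

        location /smoothed/ {
            # Up to 200 excess requests are queued and released at 100r/m;
            # the rest are rejected.
            limit_req zone=mylimit burst=200;
            proxy_pass http://backend;
        }

        location /two-stage/ {
            # The first 50 excess requests pass immediately, the next 150 are
            # paced at the rate, and anything past 200 is rejected.
            limit_req zone=mylimit burst=200 delay=50;
            proxy_pass http://backend;
        }
    }
}
```

The two-stage form is often a reasonable middle ground: a page load with a few dozen assets goes through untouched, while a long run of requests gets slowed down before it gets rejected.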

A common pitfall is reading rate=100r/m as "any 100 requests within a minute." Nginx does not count per-minute windows; it spaces requests evenly, while real users send requests in bursts. Changing the unit doesn’t change this: rate=10r/s and rate=600r/m describe the same limit (one request every 100 ms), and the r/m form exists mainly to express rates below one request per second. Pick the rate from the sustained load you want to allow, then combine it with a well-tuned burst value to absorb the spikes.
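
As a rough sketch of a per-second rate paired with a burst, consider the following. The zone name persecond and the value burst=20 are illustrative choices, not recommendations; the right numbers depend on your traffic.

```nginx
http {
    # 10r/s: Nginx expects at least 100 ms between requests from one address.
    limit_req_zone $binary_remote_addr zone=persecond:10m rate=10r/s;

    server {
        listen 80;
        location / {
            # burst=20 lets a client run 20 requests ahead of that pace, about
            # two seconds' worth of traffic; nodelay forwards them without pacing.
            limit_req zone=persecond burst=20 nodelay;
            proxy_pass http://backend;
        }
    }
}
```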

The limit_req_status 429; directive allows you to change the HTTP status code returned when a request is denied. While 503 Service Unavailable is the default and often appropriate, 429 Too Many Requests is more semantically correct for rate limiting and can be better for clients to interpret.

http {
    # ...
    limit_req_zone $binary_remote_addr zone=mylimit:10m rate=60r/m; # 60 requests per minute

    server {
        listen 80;
        location / {
            limit_req zone=mylimit burst=120 nodelay; # Allow a spike up to 120
            limit_req_status 429; # Return 429 instead of 503
            proxy_pass http://backend;
        }
    }
}

This configuration limits each unique IP to 60 requests per minute, allows for an immediate burst of 120 requests, rejects requests beyond that burst without delay, and returns a 429 status code.

When you implement per-IP rate limiting, you’re essentially building a gatekeeper. The real magic is in how you define "per IP." If you’re not correctly handling proxied requests, your gatekeeper is only seeing the doorman’s ID, not the actual guests.

The next problem you’ll likely encounter is differentiating between legitimate API clients that might share IPs and malicious actors, often requiring more granular limits based on API keys or user sessions.
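
A minimal sketch of that direction, assuming clients identify themselves with an X-API-Key request header (a hypothetical header name; substitute whatever your API actually uses). The map falls back to the client address when no key is sent, and the zone name perclient is likewise illustrative.

```nginx
http {
    # Use the API key as the limit key when one is sent, else the client address.
    map $http_x_api_key $limit_key {
        default $http_x_api_key;
        ""      $binary_remote_addr;
    }

    limit_req_zone $limit_key zone=perclient:10m rate=10r/s;

    server {
        listen 80;
        location /api/ {
            limit_req zone=perclient burst=20 nodelay;
            limit_req_status 429;
            proxy_pass http://backend;
        }
    }
}
```

Nginx does not validate the key here, so a client could dodge the limit by sending a different made-up key with each request. Keeping a per-IP limit_req alongside this one (several limit_req directives can apply to the same location) or validating keys before they reach this point closes that gap.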
