Rate limiting is often misunderstood as a simple "too many requests" guardrail, but its real power lies in its ability to starve out attackers by making their brute-force attempts prohibitively slow.

Let’s see it in action with a common Nginx setup for IP-based rate limiting. Imagine we have a login endpoint /login that we want to protect.

http {
    limit_req_zone $binary_remote_addr zone=mylimit:10m rate=5r/s; # 5 requests per second per IP

    server {
        listen 80;
        server_name example.com;

        location /login {
            limit_req zone=mylimit burst=10 nodelay; # Up to 10 requests over the rate pass without delay; the rest are rejected
            proxy_pass http://backend_login_service;
        }

        location / {
            proxy_pass http://frontend_service;
        }
    }
}

In this configuration:

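  • limit_req_zone $binary_remote_addr zone=mylimit:10m rate=5r/s; defines a zone named mylimit. It uses the client’s IP address in its compact binary form ($binary_remote_addr) as the key. 10m is the size of the shared memory zone (10 megabytes); the Nginx documentation puts one megabyte at roughly 8,000 to 16,000 states depending on the platform, so this zone can track on the order of 80,000 to 160,000 IPv4 clients. rate=5r/s sets the core rate limit: 5 requests per second. Nginx tracks requests at millisecond granularity, so this works out to one request every 200 ms rather than five at the start of each second.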
  • location /login { ... } applies this zone to the /login endpoint.
  • limit_req zone=mylimit burst=10 nodelay; tells Nginx to use the mylimit zone. burst=10 lets a client exceed the rate by up to 10 requests; those excess requests are held in the bucket instead of being rejected. nodelay tells Nginx to forward those burst requests immediately instead of spacing them out at the configured rate. Requests beyond the burst are rejected either way, by default with 503 Service Temporarily Unavailable (the limit_req_status directive can change this to 429 Too Many Requests). Without nodelay, Nginx would hold the burst requests and release them at the configured rate, one every 200 ms.
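To see why the key is $binary_remote_addr rather than the text form of the address, compare the sizes. The Nginx documentation states that $binary_remote_addr is always 4 bytes for IPv4 and 16 bytes for IPv6, while the text form varies in length. The sketch below uses Python's standard ipaddress module to show the difference; the helper name key_sizes is hypothetical, and the numbers cover only the address itself, not the full per-key state Nginx stores.

```python
import ipaddress
from typing import Tuple

def key_sizes(addr: str) -> Tuple[int, int]:
    """Return (length of the expanded text form, length of the binary form) in bytes.

    `exploded` writes the address out in full (zero-padded groups for IPv6);
    `packed` is the raw network-order bytes, i.e. what $binary_remote_addr holds.
    """
    ip = ipaddress.ip_address(addr)
    return len(ip.exploded), len(ip.packed)

for addr in ("203.0.113.7", "2001:db8::1"):
    text_len, bin_len = key_sizes(addr)
    print(f"{addr}: text {text_len} bytes, binary {bin_len} bytes")
# 203.0.113.7: text 11 bytes, binary 4 bytes
# 2001:db8::1: text 39 bytes, binary 16 bytes
```

A fixed-size binary key keeps every state the same size, which is what makes the per-megabyte capacity estimate predictable.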

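This setup makes it impossible for an attacker to try thousands of password combinations per second from a single IP. After an initial burst of up to 11 requests (one plus the burst of 10), they are throttled to 5 attempts per second, and anything faster is rejected immediately. Bear in mind that 5 attempts per second still adds up to 18,000 attempts per hour; for a login endpoint, a stricter rate such as rate=10r/m (Nginx also accepts per-minute rates) may be more appropriate.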
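To put the throttle in numbers, here is a back-of-the-envelope estimate of how long a single IP would need to work through a password list. It assumes the nodelay behavior described above (the first burst + 1 requests go through at once, then one is admitted every 1/rate seconds) and a client that never wastes requests on rejections. The function name and the 1,000,000-entry list size are illustrative assumptions.

```python
def seconds_to_exhaust(guesses: int, rate_per_sec: float, burst: int) -> float:
    """Estimate how long one client needs to submit `guesses` requests under limit_req with nodelay.

    The first burst + 1 requests are accepted immediately; each remaining request
    has to wait for the bucket to drain by one slot, i.e. 1 / rate_per_sec seconds.
    """
    remaining = max(guesses - (burst + 1), 0)
    return remaining / rate_per_sec

per_second = seconds_to_exhaust(1_000_000, rate_per_sec=5, burst=10)       # rate=5r/s
per_minute = seconds_to_exhaust(1_000_000, rate_per_sec=10 / 60, burst=10)  # rate=10r/m
print(f"5r/s:  {per_second / 86400:.1f} days")   # 5r/s:  2.3 days
print(f"10r/m: {per_minute / 86400:.1f} days")   # 10r/m: 69.4 days
```

Even at 5r/s a single IP gets through a million guesses in a few days, which is why the rate chosen for a login endpoint matters as much as having a limit at all.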

The problem rate limiting solves is the asymmetry of cost. For an attacker, the cost of one login attempt is negligible. For the defender, each failed attempt consumes server resources (CPU, memory, database lookups). Rate limiting caps how many guesses an attacker can make per unit of time, so working through a large password list takes far longer than the attack is worth, making brute force economically infeasible.

Internally, Nginx implements limit_req with the leaky bucket method. Each key (here, each IP address) has a bucket whose capacity is the burst. Each request adds to the bucket, and the rate dictates how fast the bucket "leaks", freeing room for new requests. If a request would overflow the bucket, it is rejected. Without nodelay, requests that land in the bucket are delayed until they drain out at the configured rate; with nodelay, they are forwarded immediately, but the bucket still fills and drains at the same rate, so overflowing requests are rejected just the same.
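The bucket behavior can be modeled in a few lines. The sketch below is a simplified Python model of the accounting limit_req performs, not Nginx's actual source: each accepted request adds one unit of excess, the excess drains at rate units per second, and a request that would push the excess above burst is rejected. The class name LeakyBucket and its fields are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LeakyBucket:
    """Simplified model of leaky-bucket accounting for one key (e.g., one IP).

    rate:  requests per second that drain out of the bucket
    burst: how many requests above the steady rate the bucket can hold
    """
    rate: float
    burst: int
    excess: float = 0.0           # requests currently held above the steady rate
    last: Optional[float] = None  # time of the last accepted request, in seconds

    def allow(self, now: float) -> bool:
        if self.last is None:
            # First request for this key: accept it with an empty bucket.
            self.last = now
            return True
        # Drain for the time elapsed since the last accepted request, then add this one.
        excess = max(self.excess - self.rate * (now - self.last) + 1.0, 0.0)
        if excess > self.burst:
            return False  # the bucket would overflow: reject and leave the state unchanged
        self.excess, self.last = excess, now
        # With nodelay the request is forwarded now; without it, the limiter would
        # instead hold it for roughly excess / rate seconds before passing it on.
        return True

bucket = LeakyBucket(rate=5, burst=10)
print([bucket.allow(0.0) for _ in range(12)])  # 11 accepted at once, the 12th rejected
print(bucket.allow(1.0))  # one second drains 5 units, so this request fits again
```

Setting burst=0 in this model admits exactly one request every 1 / rate seconds, which is the behavior of limit_req with no burst parameter.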

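The zone parameter is crucial: it names the shared memory segment where Nginx stores the state for each key (in this case, each IP address). Because the memory is shared, all worker processes of one Nginx instance see the same counts, so the limit is enforced consistently no matter which worker handles a request. The state is not shared between separate Nginx servers, though; behind a load balancer, each server enforces the limit independently.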

While IP-based rate limiting is a strong first line of defense, attackers can circumvent it by distributing their attacks across many IP addresses (a distributed brute-force attack). This is where account-level rate limiting becomes essential. Instead of tracking by IP, you track by the username or account ID being targeted.

Here’s a conceptual example of how you might implement account-level rate limiting, often done in your application code or an API gateway.

# Example using Flask and a simple in-memory store for demonstration
import threading
import time
from collections import defaultdict

from flask import Flask, jsonify, request

app = Flask(__name__)

# In-memory store for rate-limiting data: {username: {window_start: count}}
# In a real app, use Redis or Memcached so counts survive restarts and are
# shared across processes and servers
account_attempts = defaultdict(lambda: defaultdict(int))
attempts_lock = threading.Lock()  # requests may be handled on several threads
RATE_LIMIT_PER_ACCOUNT = 10  # Max attempts per account per window
TIME_WINDOW_SECONDS = 60     # Length of each fixed, clock-aligned window

@app.route('/login', methods=['POST'])
def login():
    # silent=True returns None instead of raising when the body is not valid JSON
    data = request.get_json(silent=True)
    if not isinstance(data, dict):
        data = {}
    username = data.get('username')
    password = data.get('password')

    # Validate input before counting, so malformed requests don't share a None key
    if not username or not password:
        return jsonify({"message": "Username and password are required."}), 400

    current_time = time.time()
    current_window_start = int(current_time / TIME_WINDOW_SECONDS) * TIME_WINDOW_SECONDS

    with attempts_lock:
        windows = account_attempts[username]

        # Clean up old entries outside the current window
        for ts in [ts for ts in windows if ts < current_window_start]:
            del windows[ts]

        # Check current rate limit
        attempts_in_window = sum(windows.values())
        if attempts_in_window >= RATE_LIMIT_PER_ACCOUNT:
            return jsonify({"message": "Too many login attempts for this account. Please try again later."}), 429  # Too Many Requests

        # Record the attempt
        windows[current_window_start] += 1

    # --- Actual login logic here ---
    # For demonstration only: a real service would verify a salted password hash
    if username == "admin" and password == "password123":
        return jsonify({"message": "Login successful!"}), 200
    return jsonify({"message": "Invalid credentials."}), 401

if __name__ == '__main__':
    # debug=True is for local development only; never enable it in production
    app.run(debug=True)

In this Python example, we’re tracking login attempts per username. The account_attempts dictionary maps each username to attempt counts keyed by the start time of a fixed, clock-aligned window of TIME_WINDOW_SECONDS. Before processing a login, the handler discards counts from earlier windows and checks whether the attempts recorded in the current window have already reached RATE_LIMIT_PER_ACCOUNT. If so, it returns 429 Too Many Requests.
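Because the Flask handler counts attempts in fixed, clock-aligned windows, an attacker can make 10 attempts just before a window boundary and 10 more just after it: 20 attempts within a couple of seconds. A sliding-window log avoids that by remembering each attempt's timestamp and counting only those from the last window seconds. The class below is a minimal in-memory sketch of that technique (the name SlidingWindowLimiter is hypothetical); a production version would keep the log in a shared store such as Redis.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional

class SlidingWindowLimiter:
    """Sliding-window log: at most `limit` attempts per key in any `window`-second span."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        # key (e.g., a username) -> timestamps of recent accepted attempts, oldest first
        self.log: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        attempts = self.log[key]
        # Drop attempts that are at least `window` seconds old
        while attempts and attempts[0] <= now - self.window:
            attempts.popleft()
        if len(attempts) >= self.limit:
            return False
        attempts.append(now)
        return True

limiter = SlidingWindowLimiter(limit=10, window=60)
print(all(limiter.allow("alice", now=59.0) for _ in range(10)))  # True: 10 attempts at t=59
print(limiter.allow("alice", now=60.0))   # False: a fixed window would have reset here
print(limiter.allow("alice", now=119.0))  # True: the t=59 attempts have aged out
```

The trade-off is memory: the log stores one timestamp per attempt instead of one counter per window, which is usually acceptable for a limit as small as 10 per minute.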

This approach directly punishes attackers trying to brute-force a specific account, even if they are using a botnet with thousands of IPs. Each IP hitting the login endpoint for the same target username will contribute to that username’s attempt count.

Account-level limits are less effective against credential stuffing, where attackers replay lists of leaked username/password pairs. Each account typically receives only one or two attempts, so a per-account counter rarely trips; that pattern is better caught by per-IP limits and by watching the failed-login rate across all accounts. This is one reason to combine per-account limits with other layers.

What most people miss is that rate limits are strongest when layered. For instance, you might have a site-wide per-IP limit, a per-account limit on login, and a separate, much stricter limit on a "forgot password" endpoint that allows only one request per account per hour, regardless of IP. Each layer covers a gap the others leave: per-IP limits slow down a single noisy client, while per-account limits hold even when requests are spread across many addresses.
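A minimal sketch of such layering, assuming simple fixed-window counters and illustrative limits: a request is checked against every applicable layer and admitted only if all of them have room. The policy names, the numbers, and the LayeredLimiter class are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical policy table: policy name -> (max requests, window in seconds).
# The numbers are illustrative, not recommendations.
POLICIES: Dict[str, Tuple[int, int]] = {
    "ip": (100, 60),               # site-wide: 100 requests per IP per minute
    "account": (10, 60),           # login: 10 attempts per account per minute
    "forgot_password": (1, 3600),  # 1 reset request per account per hour
}

class LayeredLimiter:
    """Checks one request against several fixed-window limits at once."""

    def __init__(self) -> None:
        # (policy, key, window_start) -> requests admitted in that window.
        # Old windows are never evicted here; a real store would expire them (e.g., via a TTL).
        self.counts: Dict[Tuple[str, str, int], int] = defaultdict(int)

    def _slot(self, policy: str, key: str, now: float) -> Tuple[str, str, int]:
        _, window = POLICIES[policy]
        return (policy, key, int(now // window) * window)

    def allow(self, checks: List[Tuple[str, str]], now: float) -> bool:
        """Admit the request only if every (policy, key) layer still has room."""
        slots = [self._slot(policy, key, now) for policy, key in checks]
        if any(self.counts[s] >= POLICIES[s[0]][0] for s in slots):
            return False  # some layer is exhausted; nothing is recorded
        for s in slots:
            self.counts[s] += 1
        return True

limiter = LayeredLimiter()

def request_reset(ip: str, now: float) -> bool:
    # A "forgot password" request is subject to both the per-IP and the per-account layer.
    return limiter.allow([("ip", ip), ("forgot_password", "alice")], now)

print(request_reset("198.51.100.1", 0))     # True: first reset request this hour
print(request_reset("198.51.100.2", 10))    # False: different IP, same account, same hour
print(request_reset("198.51.100.2", 3600))  # True: a new one-hour window has started
```

In this sketch a rejected request is not counted against any layer; counting rejections as well is a stricter alternative that keeps penalizing a client that retries aggressively.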

The next logical step after implementing robust rate limiting is to consider how to handle compromised accounts, such as requiring a password reset or multi-factor authentication.
