Leaky bucket rate limiting doesn’t smooth the traffic coming in; it caps the rate of outflow, which means bursts of incoming traffic are either delayed significantly or dropped.

Let’s see it in action. Imagine a bucket with a small hole at the bottom. Incoming requests are like water poured into the bucket. The hole represents the fixed rate at which requests can be processed.

# Simulate incoming requests (bursty)
requests_per_second = [10, 5, 20, 3, 15, 8, 25, 12]

# Leaky bucket parameters
bucket_capacity = 10  # Max requests the bucket can hold before dropping
leak_rate = 5         # Requests processed per second

# Simulate the leaky bucket
current_bucket_level = 0
processed_requests = []
dropped_requests = []

for incoming in requests_per_second:
    # Add incoming requests to the bucket, up to capacity
    new_level = min(current_bucket_level + incoming, bucket_capacity)
    dropped = max(0, current_bucket_level + incoming - bucket_capacity)
    current_bucket_level = new_level
    dropped_requests.append(dropped)

    # Process requests from the bucket at the leak rate
    processed_this_second = min(current_bucket_level, leak_rate)
    current_bucket_level -= processed_this_second
    processed_requests.append(processed_this_second)

print(f"Incoming requests: {requests_per_second}")
print(f"Processed requests: {processed_requests}")
print(f"Dropped requests: {dropped_requests}")

Output:

Incoming requests: [10, 5, 20, 3, 15, 8, 25, 12]
Processed requests: [5, 5, 5, 5, 5, 5, 5, 5]
Dropped requests: [0, 0, 15, 0, 8, 3, 20, 7]

Notice how even when 20 or 25 requests arrive, only 5 are processed per second. The bucket fills up, and excess requests are dropped. The output is a steady stream of 5 requests per second, but the cost shows up on the input side: of the 98 requests that arrived, 40 were processed, 53 were dropped, and 5 were still queued when the simulation ended.

The core problem leaky bucket solves is preventing a service from being overwhelmed by sudden spikes in traffic, ensuring a predictable and sustainable processing rate. It acts as a buffer: you pour traffic in as fast as it comes, but it only leaks out at a controlled, constant pace. If the pouring rate exceeds the leak rate for too long, the bucket overflows, and incoming requests are discarded.

Internally, it’s a simple queue with a maximum size (the bucket capacity) and a fixed processing rate (the leak rate). When a request arrives, it’s added to the queue. If the queue is full, the request is dropped. Periodically, requests are dequeued and processed at the defined rate.
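The queue description above can be sketched roughly as follows. This is a minimal illustration, not a production limiter: the LeakyBucket class and its offer and leak methods are hypothetical names, and time is modeled as discrete ticks rather than a real clock.

```python
from collections import deque


class LeakyBucket:
    """Bounded FIFO queue drained at a fixed rate (illustrative sketch)."""

    def __init__(self, capacity, leak_rate):
        self.capacity = capacity      # max requests the queue may hold
        self.leak_rate = leak_rate    # requests released per tick
        self.queue = deque()

    def offer(self, request):
        """Enqueue a request; return False if the bucket is full (dropped)."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(request)
        return True

    def leak(self):
        """Release up to leak_rate requests, oldest first, for one tick."""
        count = min(self.leak_rate, len(self.queue))
        return [self.queue.popleft() for _ in range(count)]


bucket = LeakyBucket(capacity=10, leak_rate=5)
accepted = [bucket.offer(f"req-{n}") for n in range(12)]
released = bucket.leak()
print(accepted.count(False))  # 2 requests were dropped
print(released)               # the five oldest requests, req-0 to req-4
```

In a real service, something would have to call leak on a timer or derive the number of releases from elapsed wall-clock time; that part is left out here.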

The key levers you control are bucket_capacity and leak_rate.

  • leak_rate: This is the most critical. It defines the maximum sustained throughput your service can handle. Set this to the rate your backend can comfortably process without errors or performance degradation. For example, if your API can reliably handle 100 requests per second, leak_rate = 100.
  • bucket_capacity: This determines how much burstiness your system can absorb before starting to drop requests. A larger capacity allows for higher incoming traffic spikes to be buffered, delaying drops. A smaller capacity means drops will happen sooner during bursts. If you expect occasional bursts of up to 500 requests in a short period, but your leak_rate is 100, a bucket_capacity of 500 or more might be appropriate.
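The trade-off between the two levers can be put into numbers. The helper below is a rough sketch under a simplifying assumption: the whole burst arrives at once into an empty bucket. The function name burst_outcome is made up for illustration.

```python
def burst_outcome(burst_size, bucket_capacity, leak_rate):
    """Return (accepted, dropped, drain_seconds) for one instantaneous burst.

    Assumes the bucket is empty when the burst arrives and that
    leak_rate is expressed in requests per second.
    """
    accepted = min(burst_size, bucket_capacity)
    dropped = burst_size - accepted
    drain_seconds = accepted / leak_rate  # wait of the last accepted request
    return accepted, dropped, drain_seconds


print(burst_outcome(500, 500, 100))  # (500, 0, 5.0)
print(burst_outcome(500, 200, 100))  # (200, 300, 2.0)
```

A bucket_capacity of 500 with leak_rate = 100 drops nothing from a 500-request burst, but the last request in that burst waits about 5 seconds. A larger bucket trades drops for latency, so capacity is worth sizing against the longest wait your clients will tolerate.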

When configuring a leaky bucket, the leak_rate is often expressed in requests per second or per minute. A common setting might be leak_rate = 600 requests per minute (10 requests per second). The bucket_capacity is then a plain count of requests. If you want a burst of 100 requests to be queued and worked off over time rather than dropped, you’d set bucket_capacity = 100.
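Because the unit varies between tools, it can help to normalize the rate before comparing settings. A small sketch, where the per_second helper and the unit names are assumptions for illustration rather than any particular library’s configuration format:

```python
SECONDS_PER_UNIT = {"second": 1, "minute": 60, "hour": 3600}


def per_second(rate, unit):
    """Convert a rate given per second, minute, or hour to requests per second."""
    return rate / SECONDS_PER_UNIT[unit]


print(per_second(600, "minute"))  # 10.0
print(per_second(100, "second"))  # 100.0
```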

The surprising part is how aggressively it can drop traffic. People often think of rate limiting as a gentle nudge, but leaky bucket, especially with a small capacity, is more like a strict bouncer who will throw you out if you’re too rowdy, even if you were just trying to get in for a moment. The "smoothness" is only on the output side.
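To see how much the capacity changes the drop count, the same per-second model used in the simulation can be replayed with different bucket sizes. This is a sketch under that model’s assumptions (arrivals are added first, then up to leak_rate requests are processed each second); total_dropped is a hypothetical helper name.

```python
def total_dropped(traffic, bucket_capacity, leak_rate):
    """Replay per-second arrivals and count requests lost to overflow."""
    level = 0
    dropped = 0
    for incoming in traffic:
        # Anything that does not fit in the bucket is discarded
        dropped += max(0, level + incoming - bucket_capacity)
        level = min(level + incoming, bucket_capacity)
        # Then the bucket leaks at the fixed rate
        level -= min(level, leak_rate)
    return dropped


traffic = [10, 5, 20, 3, 15, 8, 25, 12]  # 98 requests in total
for capacity in (5, 10, 50):
    print(capacity, total_dropped(traffic, capacity, leak_rate=5))
```

With this input, a capacity of 5 drops 60 of the 98 requests, a capacity of 10 drops 53, and a capacity of 50 drops 13. The larger bucket does not process anything faster; it ends the run with 45 requests still waiting in the queue.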

The next problem you’ll run into is deciding which requests to drop when the bucket is full.
