Per-API-Key Rate Limiting: Isolate and Protect Tenants (2026)

Per-API-key rate limiting is a common way to isolate tenants: each key gets its own request budget, so no single tenant can monopolize shared resources and degrade service for the others.

Let’s see this in action with a hypothetical API Gateway configuration. Imagine we have a simple API that allows users to fetch product information. Without rate limiting, a single user making an excessive number of requests could slow down or even crash the service for everyone.

# Example API Gateway Configuration Snippet
rateLimiting:
  rules:
    - selector:
        match:
          api: /products
      rate: 100 # requests allowed per period
      period: 1m # window length (one minute)
      key:
        from: header
        name: X-API-Key

This configuration defines a rule that applies to requests made to the /products API. The rate is set to 100 requests within a period of 1m (one minute). Crucially, the key is read from the header named X-API-Key, so each unique X-API-Key value gets its own independent limit of 100 requests per minute. If Tenant A uses key-abc and Tenant B uses key-xyz, each can make 100 requests per minute, and neither tenant’s usage counts against the other’s limit.

The problem this solves is resource contention and the "noisy neighbor" effect. In multi-tenant systems, where multiple customers share the same underlying infrastructure, one tenant’s excessive usage can degrade performance for all other tenants. Per-API-key rate limiting acts as a traffic cop, ensuring fair usage and preventing any single tenant from overwhelming the system.

Internally, the API Gateway (or a dedicated rate-limiting service) maintains a counter for each unique API key. When a request arrives, the gateway extracts the API key from the configured source (e.g., a header) and checks that key’s current count against the configured limit. If the limit hasn’t been reached, the request is allowed and the counter is incremented. If the limit has been reached, the request is rejected, typically with a 429 Too Many Requests HTTP status code. Each counter resets when its period elapses, so a tenant that was throttled can send requests again once the next window begins.
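The counter logic described above can be sketched in a few lines of Python. This is a minimal, in-memory illustration of a fixed-window counter, not the implementation of any particular gateway; the class name FixedWindowLimiter and its check method are hypothetical, and the rate and period parameters simply mirror the fields in the configuration snippet.

```python
import time
from typing import Dict, Optional, Tuple


class FixedWindowLimiter:
    """Hypothetical per-key fixed-window counter (illustration only)."""

    def __init__(self, rate: int = 100, period: float = 60.0) -> None:
        self.rate = rate      # requests allowed per window (mirrors `rate`)
        self.period = period  # window length in seconds (mirrors `period: 1m`)
        # Maps API key -> (window index, requests counted in that window)
        self._counters: Dict[str, Tuple[int, int]] = {}

    def check(self, api_key: str, now: Optional[float] = None) -> int:
        """Return the HTTP status to send: 200 if allowed, 429 if over the limit."""
        now = time.monotonic() if now is None else now
        window = int(now // self.period)
        seen_window, count = self._counters.get(api_key, (window, 0))
        if seen_window != window:
            count = 0  # a new window has started, so this key's counter resets
        if count >= self.rate:
            self._counters[api_key] = (window, count)
            return 429  # Too Many Requests
        self._counters[api_key] = (window, count + 1)
        return 200


# A tiny limit makes the behavior easy to see.
limiter = FixedWindowLimiter(rate=2, period=60.0)
print([limiter.check("key-abc", now=0.0) for _ in range(3)])  # [200, 200, 429]
print(limiter.check("key-xyz", now=0.0))   # 200: a different key has its own counter
print(limiter.check("key-abc", now=60.0))  # 200: the next window has begun
```

The sketch keeps counters in a single process and never evicts old keys. A real deployment would more likely hold the counters in a shared store so that every gateway instance sees the same counts, which is an assumption about typical setups and not something the configuration above specifies.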

The levers you control are the rate, the period, and the key extraction mechanism. You can define different rules for different APIs or different sets of API keys. For instance, you might set a higher rate limit for premium tenants or for specific, less resource-intensive endpoints. You could also base the key on a query parameter or a JWT claim, depending on how your authentication and authorization are structured.
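To make the key extraction options concrete, here is a rough Python sketch of the three sources mentioned above: a header, a query parameter, and a JWT claim. The function names, the api_key query parameter, and the tenant_id claim are all assumptions chosen for illustration. The JWT helper only decodes the payload and assumes the token’s signature has already been verified by your authentication layer.

```python
import base64
import json
from typing import Mapping, Optional
from urllib.parse import parse_qs, urlsplit


def key_from_header(headers: Mapping[str, str], name: str = "X-API-Key") -> Optional[str]:
    # HTTP header names are case-insensitive, so compare in lowercase.
    lowered = {k.lower(): v for k, v in headers.items()}
    return lowered.get(name.lower())


def key_from_query(url: str, param: str = "api_key") -> Optional[str]:
    # `api_key` is a hypothetical parameter name.
    values = parse_qs(urlsplit(url).query).get(param)
    return values[0] if values else None


def key_from_jwt_claim(token: str, claim: str = "tenant_id") -> Optional[str]:
    # Assumes the signature was ALREADY verified elsewhere; this only reads the payload.
    # `tenant_id` is a hypothetical claim name.
    try:
        payload_b64 = token.split(".")[1]
        padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore stripped padding
        payload = json.loads(base64.urlsafe_b64decode(padded))
    except (IndexError, ValueError):
        return None  # not a well-formed JWT
    if not isinstance(payload, dict) or claim not in payload:
        return None
    return str(payload[claim])
```

Whichever source you choose, a request that yields no key needs an explicit policy, such as rejecting it outright or counting it against a shared anonymous limit, so that unidentified traffic cannot bypass the per-key counters.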

What makes per-API-key rate limiting valuable is its granularity: you manage each tenant’s traffic individually, not just the total. It does more than protect your backend from overload; it keeps service quality consistent for every customer regardless of what any single tenant does. It also gives you a natural way to implement tiered service levels, where higher-paying customers get higher rate limits associated with their keys.
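Tiered service levels can be as simple as a lookup from key to tier to limit. The Python sketch below assumes two made-up tiers and a static mapping; the tier names, the premium limit of 1000, and the limit_for and is_allowed helpers are illustrative only, and in practice the mapping would probably come from your billing or account system.

```python
# Requests per minute for each tier. The numbers are illustrative, not recommendations.
TIER_LIMITS = {"free": 100, "premium": 1000}

# Which tier each API key belongs to (hypothetical static mapping).
KEY_TIERS = {"key-abc": "free", "key-xyz": "premium"}

DEFAULT_TIER = "free"  # unknown keys fall back to the most restrictive tier


def limit_for(api_key: str) -> int:
    """Return the per-minute request limit that applies to this key."""
    tier = KEY_TIERS.get(api_key, DEFAULT_TIER)
    return TIER_LIMITS[tier]


def is_allowed(api_key: str, used_this_window: int) -> bool:
    """True if the key still has budget left in the current window."""
    return used_this_window < limit_for(api_key)


print(limit_for("key-abc"))        # 100
print(limit_for("key-xyz"))        # 1000
print(is_allowed("key-abc", 100))  # False: the free tier is exhausted
print(is_allowed("key-xyz", 100))  # True: the premium tier has headroom
```

Keeping the tier lookup separate from the counting logic means a customer can change tiers without any change to how requests are counted.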

The next concept you’ll likely encounter is implementing dynamic rate limiting based on request payload or response codes.