Rate Limiting in Azure API Management: Policy Config (2026)

Azure API Management’s rate limiting is a crucial mechanism for protecting your backend services from being overwhelmed by excessive requests. It allows you to define policies that restrict the number of calls a consumer can make within a specified time period.

Let’s see rate limiting in action. Imagine you have a getProducts operation in your API, and you want to limit each client IP address to 100 requests per minute. Here’s how you’d configure that in the inbound section of the operation’s policy:

<policies>
    <inbound>
        <base />
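        <!-- counter-key is only accepted by rate-limit-by-key; plain rate-limit counts per subscription -->
        <rate-limit-by-key calls="100" renewal-period="60" counter-key="@(context.Request.IpAddress)" />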
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>

In this example:

calls="100": This sets the maximum number of allowed calls.
renewal-period="60": This defines the time window in seconds for the limit (60 seconds = 1 minute).
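counter-key="@(context.Request.IpAddress)": This determines who is being limited. Here, we’re using the caller’s IP address as the key, so each unique IP address gets its own quota of 100 calls per minute. Note that counter-key belongs to the rate-limit-by-key policy; the plain rate-limit policy does not accept it and always counts per subscription.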

You can use other context values for counter-key to change the scope of the limit. For instance, to give each subscription key its own counter (assuming the API requires a subscription, so that context.Subscription is populated), you would use:

<rate-limit-by-key calls="100" renewal-period="60" counter-key="@(context.Subscription.Key)" />

This is useful for distinguishing between different consumers of your API. If callers send a JWT, you can base the limit on a claim within the token. Note that context.User describes the API Management user account and has no Claims collection, so read the claim from the token itself:

<rate-limit-by-key calls="50" renewal-period="300" counter-key="@(context.Request.Headers.GetValueOrDefault("Authorization", "").AsJwt()?.Subject)" />

Here, Subject returns the token’s sub claim, the standard subject identifier. This limits each unique user (identified by their sub claim) to 50 requests every 5 minutes.
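One gap in keying on a token claim is what happens when a request carries no valid token. The fragment below is a sketch of one way to handle that, assuming the token arrives in the Authorization header: it parses the header with AsJwt(), which yields null for a missing or malformed token, and falls back to the caller’s IP address. The fallback choice is an assumption made for illustration, not a recommendation from the original text.

```xml
<!-- Sketch: key on the JWT subject, falling back to the client IP when no valid token is present.
     Assumes the token is sent in the Authorization header. -->
<rate-limit-by-key calls="50"
                   renewal-period="300"
                   counter-key="@(context.Request.Headers.GetValueOrDefault("Authorization", "").AsJwt()?.Subject ?? context.Request.IpAddress)" />
```

Without a fallback of some kind, every request that lacks a usable token would resolve to the same empty key.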

The rate-limit-by-key policy works by maintaining a counter for each unique counter-key value. When a request comes in, API Management checks the counter for that request’s key. If the caller has already made the allowed number of calls within the renewal-period, the request is rejected with a 429 Too Many Requests status code and a Retry-After header that suggests how many seconds to wait. Once enough earlier calls have aged out of the window, new requests are allowed again.
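As a sketch of how the rejection can be made more informative for clients, rate-limit-by-key accepts optional attributes that name the response headers carrying its state. The header names X-RateLimit-Remaining and X-RateLimit-Limit below are arbitrary choices for illustration, not defaults, and the availability of individual attributes can vary by gateway type, so check the policy reference for your tier.

```xml
<!-- Sketch: per-IP limit with the counter state exposed in response headers.
     X-RateLimit-Remaining and X-RateLimit-Limit are illustrative names, not defaults. -->
<rate-limit-by-key calls="100"
                   renewal-period="60"
                   counter-key="@(context.Request.IpAddress)"
                   retry-after-header-name="Retry-After"
                   remaining-calls-header-name="X-RateLimit-Remaining"
                   total-calls-header-name="X-RateLimit-Limit" />
```

With this in place, a client can read how many calls it has left in the current window and, after a 429, how many seconds it is advised to wait before retrying.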

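The plain rate-limit policy takes no counter-key at all. It keeps one counter per subscription, so it only makes sense for APIs that require a subscription key:

<rate-limit calls="1000" renewal-period="60" />

This limits each subscription to 1000 calls per minute; it is not a single limit shared by all callers. To share one limit across every caller, the usual approach is rate-limit-by-key with a constant counter-key. A shared limit is less common for fine-grained control, but it can be useful as a coarse ceiling on the total traffic that reaches your backend under extreme load.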
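If the goal is one counter shared by every caller regardless of subscription, a constant counter-key on rate-limit-by-key is one way to express it. In this sketch, the key value all-callers is an arbitrary placeholder:

```xml
<!-- Sketch: one shared counter for every caller.
     "all-callers" is a placeholder; any constant string yields a single counter. -->
<rate-limit-by-key calls="1000" renewal-period="60" counter-key="all-callers" />
```

Because the key never varies, every request increments the same counter, and the 1001st call inside the window is rejected no matter who sent it.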

A more advanced scenario is keying the limit on a value you extract from the request yourself, such as a custom header. This also uses rate-limit-by-key. The policy has no key or cache-name attribute and does not read from a named cache: the key expression goes in counter-key, and the gateway maintains the counters internally.

Here’s an example that keys on a custom header:

<policies>
    <inbound>
        <base />
        <rate-limit-by-key calls="50" renewal-period="300" counter-key="@(context.Request.Headers.GetValueOrDefault("X-API-Key", ""))" />
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>

In this configuration:

counter-key="@(context.Request.Headers.GetValueOrDefault("X-API-Key", ""))": We are using a custom header X-API-Key to identify the consumer. This is a flexible approach that doesn’t rely solely on subscription keys. Requests that omit the header all resolve to the empty-string key and therefore share one counter.
calls="50" and renewal-period="300": Each distinct header value is allowed 50 calls every 5 minutes.

Because the counters live in the gateway rather than in a store you control, keep their limits in mind. Throttling is distributed across gateway nodes, so enforcement is approximate and the number of requests actually admitted can differ somewhat from the configured value. This matters most in multi-region deployments and at very high traffic volumes, where counters may be tracked per regional gateway rather than globally, so a caller spread across regions can be admitted more than the configured number of calls in total.

One aspect that is often overlooked is how renewal-period interacts with calls. The renewal-period is best understood as a sliding window rather than a fixed interval that resets on the minute. If you set calls="10" and renewal-period="60", you do not get a fresh batch of 10 requests at each 60-second mark. Instead, over any 60-second interval, you cannot exceed 10 requests: each incoming request is checked against the calls made in the 60 seconds before it. For example, a caller who sends 10 requests in the first 5 seconds is rejected until the earliest of those calls is more than 60 seconds old. Treat the numbers as approximate, since enforcement is distributed across gateway nodes and the exact algorithm can differ between service tiers.

When you apply multiple rate limiting policies, they are evaluated sequentially. If any one of them triggers a 429, the request is immediately rejected, and subsequent rate limiting policies are not evaluated. This means the order of your policies matters, and it’s generally best to place more granular or stricter limits earlier in the inbound policy flow.
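A layered setup of this kind might look like the following sketch, which places a short burst limit per client IP ahead of a sustained limit per subscription. The numbers are illustrative only, and the second policy assumes the API requires a subscription so that context.Subscription is populated.

```xml
<inbound>
    <base />
    <!-- Evaluated first: short burst limit per client IP (illustrative numbers) -->
    <rate-limit-by-key calls="20" renewal-period="10" counter-key="@(context.Request.IpAddress)" />
    <!-- Evaluated second, only if the burst limit passed: sustained limit per subscription key -->
    <rate-limit-by-key calls="100" renewal-period="60" counter-key="@(context.Subscription.Key)" />
</inbound>
```

A request that trips the burst limit is rejected with a 429 before the per-subscription policy runs, so it does not count against that second counter.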

The next challenge you’ll likely encounter is handling the 429 Too Many Requests response gracefully, perhaps by implementing retry logic with exponential backoff in your client applications.