ASP.NET Core’s built-in rate limiting middleware doesn’t actually prevent requests from arriving. It decides, after the server has already accepted a request, whether to let it proceed or to reject it.

Here’s a quick demo showing it in action. Imagine we have a simple API endpoint that we want to limit to 5 requests per minute per IP address.

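using System;
using System.Threading.RateLimiting;      // PartitionedRateLimiter, RateLimitPartition, FixedWindowRateLimiterOptions
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;          // HttpContext, StatusCodes
using Microsoft.AspNetCore.RateLimiting;  // RateLimiterOptions
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    // The middleware rejects with 503 Service Unavailable unless told otherwise.
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    // One fixed-window limiter per client IP address.
    options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(context =>
        RateLimitPartition.GetFixedWindowLimiter(
            partitionKey: context.Connection.RemoteIpAddress?.ToString() ?? "unknown",
            factory: _ => new FixedWindowRateLimiterOptions
            {
                PermitLimit = 5,
                Window = TimeSpan.FromMinutes(1),
                QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
                QueueLimit = 0 // No queuing: reject as soon as the window's permits are used up
            }));
});

var app = builder.Build();

// Register before the endpoints it protects. If you call UseRouting explicitly and use
// endpoint-specific policies (RequireRateLimiting), UseRateLimiter must come after UseRouting.
app.UseRateLimiter();

app.MapGet("/", () => "Hello World!");

app.Run();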

If you hit this endpoint 6 times within the same one-minute window from the same IP, the first 5 requests get a 200 OK with a Content-Type: text/plain and the body "Hello World!". The 6th request immediately gets a 429 Too Many Requests response with an empty body. The 429 depends on setting RejectionStatusCode; the middleware’s default rejection status is 503 Service Unavailable. Either way, the server accepted the request, ran it through the pipeline up to the rate limiter, and only then rejected it.
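The same fixed-window behavior can be observed without a web server by driving System.Threading.RateLimiting, the library the middleware builds on, directly. This is a minimal sketch, assuming a console project that can reference that package; the loop stands in for six requests from one IP inside a single window. A standalone limiter only reports whether a lease was acquired; turning a failed lease into a status code is the middleware’s job.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.RateLimiting;

// Same settings as the middleware example: 5 permits per 1-minute window, no queue.
using var limiter = new FixedWindowRateLimiter(new FixedWindowRateLimiterOptions
{
    PermitLimit = 5,
    Window = TimeSpan.FromMinutes(1),
    QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
    QueueLimit = 0
});

var results = new List<bool>();
for (int request = 1; request <= 6; request++)
{
    // AttemptAcquire never waits: the lease is either acquired or not.
    // Disposing a fixed-window lease does not return the permit; only the next window does.
    using RateLimitLease lease = limiter.AttemptAcquire();
    results.Add(lease.IsAcquired);
    Console.WriteLine($"Request {request}: {(lease.IsAcquired ? "allowed" : "rejected")}");
}
```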

The core problem rate limiting solves is preventing a single client (or a group of clients) from overwhelming your application’s resources, whether that’s CPU, memory, network bandwidth, or even downstream dependencies. Without it, a malicious or buggy client could bring your service to its knees. The ASP.NET Core middleware provides a declarative way to enforce these limits directly within the HTTP pipeline.

Internally, the AddRateLimiter extension method registers the rate limiting services and configures RateLimiterOptions, which holds the limiters you define. UseRateLimiter then adds the middleware that intercepts incoming requests and asks the configured limiter for a permit before passing each request along. The GlobalLimiter here is a PartitionedRateLimiter, which applies a separate limiter to each "partition" of requests. In our example, we partition by the client’s IP address (context.Connection.RemoteIpAddress?.ToString()), and each partition gets its own FixedWindowRateLimiter. For a given IP address, that limiter counts requests in fixed one-minute windows: once the PermitLimit (5 in our case) is reached, further requests are rejected until the window ends and the count resets. The windows are fixed intervals, not a rolling minute measured from each request. QueueLimit = 0 ensures that requests exceeding the limit are rejected immediately rather than queued for later processing.
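To see that each partition really has its own window, the sketch below partitions on a plain string instead of HttpContext, so it runs outside ASP.NET Core. It assumes the System.Threading.RateLimiting package; the two IP strings are illustrative documentation addresses, standing in for the key the middleware example derives from RemoteIpAddress.

```csharp
using System;
using System.Threading.RateLimiting;

// Partition on a string (an IP address); the middleware example builds the same key
// from context.Connection.RemoteIpAddress.
using PartitionedRateLimiter<string> limiter = PartitionedRateLimiter.Create<string, string>(ip =>
    RateLimitPartition.GetFixedWindowLimiter(
        partitionKey: ip,
        factory: _ => new FixedWindowRateLimiterOptions
        {
            PermitLimit = 5,
            Window = TimeSpan.FromMinutes(1),
            QueueLimit = 0
        }));

// Client A sends six requests and exhausts its own window...
int allowedForA = 0;
for (int i = 0; i < 6; i++)
{
    using RateLimitLease lease = limiter.AttemptAcquire("203.0.113.10");
    if (lease.IsAcquired) allowedForA++;
}

// ...but client B is in a different partition with an untouched window.
using RateLimitLease leaseForB = limiter.AttemptAcquire("198.51.100.7");

Console.WriteLine($"A allowed: {allowedForA}/6, B allowed: {leaseForB.IsAcquired}");
```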

RateLimitPartition.GetFixedWindowLimiter returns a partition description that tells the partitioned limiter how to build a FixedWindowRateLimiter for each unique partitionKey; the limiter for a key is created the first time that key is seen. The partitionKey is crucial because it determines how requests are grouped. Common keys include IP addresses, user IDs (if authenticated), API keys, or even specific endpoints. The factory lambda receives the partitionKey and returns the FixedWindowRateLimiterOptions for that partition, so the options can vary per key (our example ignores the key by naming the parameter _).
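Because the factory receives the partition key, different partitions can get different options. The sketch assumes a hypothetical key scheme in which premium clients are keyed as "premium:&lt;id&gt;" and everyone else by IP; the prefix, the limits of 5 and 20, and the CountAllowed helper are illustrative, not part of the original example.

```csharp
using System;
using System.Threading.RateLimiting;

// Hypothetical tiering: keys starting with "premium:" get a higher PermitLimit.
using PartitionedRateLimiter<string> limiter = PartitionedRateLimiter.Create<string, string>(key =>
    RateLimitPartition.GetFixedWindowLimiter(
        partitionKey: key,
        factory: partitionKey => new FixedWindowRateLimiterOptions
        {
            PermitLimit = partitionKey.StartsWith("premium:", StringComparison.Ordinal) ? 20 : 5,
            Window = TimeSpan.FromMinutes(1),
            QueueLimit = 0
        }));

// Hypothetical helper: how many of `attempts` requests a key gets through in one window.
int CountAllowed(string key, int attempts)
{
    int allowed = 0;
    for (int i = 0; i < attempts; i++)
    {
        using RateLimitLease lease = limiter.AttemptAcquire(key);
        if (lease.IsAcquired) allowed++;
    }
    return allowed;
}

int anonymous = CountAllowed("203.0.113.10", 25);
int premium = CountAllowed("premium:customer-42", 25);
Console.WriteLine($"anonymous: {anonymous}/25, premium: {premium}/25");
```

The factory runs once per new key, so any per-key lookup it does (a database tier check, for instance) is paid only when a partition is first created.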

PermitLimit is the maximum number of requests allowed within the Window, and Window is the duration over which PermitLimit is enforced. QueueProcessingOrder only matters when QueueLimit is greater than zero: OldestFirst serves queued requests in arrival order and, when the queue is full, rejects the newcomer, while NewestFirst serves the most recent requests first and evicts the oldest queued request to make room. Setting QueueLimit to 0 is a common strategy for simple rejection, as it avoids holding onto requests that the client is likely to time out on anyway.
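For contrast, here is a sketch of what a non-zero QueueLimit changes, again using System.Threading.RateLimiting directly and assuming that package is available. With QueueLimit = 1 (an illustrative value), the 6th request waits for the next window instead of failing, and the 7th finds the queue full.

```csharp
using System;
using System.Threading.RateLimiting;
using System.Threading.Tasks;

// Same window as before, but allow one request to wait instead of failing outright.
using var limiter = new FixedWindowRateLimiter(new FixedWindowRateLimiterOptions
{
    PermitLimit = 5,
    Window = TimeSpan.FromMinutes(1),
    QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
    QueueLimit = 1
});

// Requests 1-5 consume the window's permits.
for (int i = 0; i < 5; i++)
{
    using RateLimitLease lease = limiter.AttemptAcquire();
}

// Request 6 fits in the queue: the returned task stays pending until the window resets.
ValueTask<RateLimitLease> queued = limiter.AcquireAsync();

// Request 7 finds the queue full; with OldestFirst the newcomer is the one rejected.
using RateLimitLease overflow = await limiter.AcquireAsync();

Console.WriteLine($"request 6 pending: {!queued.IsCompleted}, request 7 acquired: {overflow.IsAcquired}");
```

In the middleware, a queued request keeps its connection open while it waits, which is the cost the QueueLimit = 0 choice avoids.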

The main levers you control are the partition key and the limiter type each RateLimitPartition creates. Besides fixed windows, you can choose SlidingWindowRateLimiter (smoother limits, at the cost of tracking per-segment counts for every partition), TokenBucketRateLimiter (allows short bursts of traffic), or ConcurrencyLimiter (caps in-flight requests rather than requests per time period). You can also combine limiters, for example a global limit plus stricter limits for anonymous clients or looser ones for premium tiers, by chaining PartitionedRateLimiter instances with PartitionedRateLimiter.CreateChained (available in .NET 8 and later). The partitionKey remains your primary tool for segmenting traffic.
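Chaining can be sketched with PartitionedRateLimiter.CreateChained, assuming .NET 8 or later. A request succeeds only if every limiter in the chain grants it a lease. The limits (2 per client, 3 for the whole service per minute), the "all-clients" key, and the PerMinute and Allowed helpers are all hypothetical.

```csharp
using System;
using System.Threading.RateLimiting;

// Hypothetical limits: 2 requests per minute per client, 3 per minute for the whole service.
FixedWindowRateLimiterOptions PerMinute(int permits) => new()
{
    PermitLimit = permits,
    Window = TimeSpan.FromMinutes(1),
    QueueLimit = 0
};

using PartitionedRateLimiter<string> perClient = PartitionedRateLimiter.Create<string, string>(client =>
    RateLimitPartition.GetFixedWindowLimiter(client, key => PerMinute(2)));

// Every client maps to the same partition key, so this behaves as one shared limit.
using PartitionedRateLimiter<string> serviceWide = PartitionedRateLimiter.Create<string, string>(client =>
    RateLimitPartition.GetFixedWindowLimiter("all-clients", key => PerMinute(3)));

// Limiters are tried in order. Per-client comes first so a client over its own limit
// is rejected before it reaches the shared limiter.
using PartitionedRateLimiter<string> chained = PartitionedRateLimiter.CreateChained(perClient, serviceWide);

bool Allowed(string client)
{
    using RateLimitLease lease = chained.AttemptAcquire(client);
    return lease.IsAcquired;
}

bool[] clientA = { Allowed("client-a"), Allowed("client-a"), Allowed("client-a") };
bool[] clientB = { Allowed("client-b"), Allowed("client-b") };

// Under these assumptions: A gets 2 of 3 (its own limit), B gets 1 of 2 (shared limit spent).
Console.WriteLine($"client-a: {string.Join(", ", clientA)}");
Console.WriteLine($"client-b: {string.Join(", ", clientB)}");
```

Order matters with fixed windows: a permit consumed by an earlier limiter is not handed back when a later one rejects the request, so putting the narrower limiter first keeps rejected requests from draining the shared budget.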

A common point of confusion is that the rejection response (429 Too Many Requests in our example) doesn’t automatically say why the request was limited or when the limit will reset. To give clients better feedback, set RateLimiterOptions.OnRejected and add headers such as Retry-After yourself, using the RetryAfter metadata that some limiters attach to a rejected lease, or use a library that emits custom X-RateLimit-* headers.
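One way to build a Retry-After value is to read MetadataName.RetryAfter from the rejected lease. The sketch below keeps that logic in a hypothetical helper, RetryAfterHeaderValue, and exercises it against a standalone FixedWindowRateLimiter so it runs without a web server; it assumes the System.Threading.RateLimiting package.

```csharp
using System;
using System.Globalization;
using System.Threading.RateLimiting;

// Hypothetical helper: turns a rejected lease into a Retry-After header value
// (whole seconds), or null when the limiter attached no RetryAfter metadata.
string? RetryAfterHeaderValue(RateLimitLease lease)
{
    if (!lease.IsAcquired && lease.TryGetMetadata(MetadataName.RetryAfter, out TimeSpan retryAfter))
    {
        // Round up so clients don't retry slightly too early.
        return ((int)Math.Ceiling(retryAfter.TotalSeconds)).ToString(CultureInfo.InvariantCulture);
    }
    return null;
}

using var limiter = new FixedWindowRateLimiter(new FixedWindowRateLimiterOptions
{
    PermitLimit = 1,
    Window = TimeSpan.FromMinutes(1),
    QueueLimit = 0
});

using RateLimitLease first = limiter.AttemptAcquire();
using RateLimitLease second = limiter.AttemptAcquire();
string? headerValue = RetryAfterHeaderValue(second);

Console.WriteLine($"first acquired: {first.IsAcquired}, Retry-After: {headerValue ?? "(none)"}");
```

In the middleware, the rejected lease is available as context.Lease inside options.OnRejected = (context, cancellationToken) => { ... }. There you could pass it to a helper like this, assign a non-null result to context.HttpContext.Response.Headers.RetryAfter, and return ValueTask.CompletedTask. Not every limiter supplies RetryAfter (ConcurrencyLimiter, for example, has no time-based reset), which is why the helper can return null.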

The next thing you’ll likely encounter is managing rate limits across multiple instances of your application in a distributed environment. The built-in limiters keep their counters in each process’s memory, so every instance enforces its limits independently.
