The InFlightReq middleware in Traefik is your gatekeeper, preventing your backend services from being swamped by too many concurrent requests.

Let’s see it in action. Imagine you have a single instance of a web application running behind Traefik. By default, it can handle a decent number of requests, but if you suddenly get a surge, it might choke.

http:
  routers:
    my-app-router:
      rule: "Host(`myapp.example.com`)"
      service: "my-app-service"
      middlewares:
        - "rate-limit-inflight"

  services:
    my-app-service:
      loadBalancer:
        servers:
          - url: "http://192.168.1.100:8080"

  middlewares:
    rate-limit-inflight:
      inFlightReq:
        amount: 5

In this setup, Traefik will only allow a maximum of 5 concurrent requests to reach my-app-service at any given time. If a sixth request arrives while the first five are still being processed, Traefik rejects it immediately with a 429 Too Many Requests response rather than forwarding it to the backend. This protects your backend from overload, helps prevent cascading failures, and keeps the service responsive for the requests that are admitted, even under heavy load.

The core problem InFlightReq solves is resource exhaustion on your backend services. Without it, a sudden spike in traffic can overwhelm your application’s capacity to process requests, leading to increased latency, dropped connections, and eventual service outages. InFlightReq acts as a load shedder at the edge (it is not the same thing as Traefik’s separate CircuitBreaker middleware), absorbing the initial shock and degrading gracefully by rejecting excess requests rather than letting the entire system crumble.

Internally, InFlightReq maintains a counter for each request source, where the source is defined by the sourceCriterion option and defaults to the request host. When a request arrives, Traefik checks the counter for its source. If the counter is below the configured limit (the amount option), the request is forwarded and the counter is incremented. When the request completes, the counter is decremented. If the counter is already at the limit, Traefik immediately returns a 429 to the client without ever bothering the backend.

The amount option is your primary lever. It directly controls the maximum number of simultaneous requests allowed. Setting this too low can lead to legitimate users being denied service, while setting it too high defeats the purpose of the middleware. You’ll want to tune this based on your backend’s observed capacity and typical traffic patterns.

If you want to be more granular, you can define several InFlightReq middlewares with different limits and attach each one to a different router. This allows you to protect critical parts of your application more aggressively than less sensitive ones. For instance, a resource-intensive API endpoint might have a lower amount than a static content delivery route.
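
As a rough sketch of that idea, the fragment below defines two limiters and attaches them to two routers. It uses the inFlightReq and amount keys from Traefik’s middleware reference. The router names, path prefixes, and limits are made up for illustration, and my-app-service is assumed to be defined as in the earlier example.

```yaml
http:
  routers:
    # Hypothetical router for an expensive API endpoint
    reports-router:
      rule: "Host(`myapp.example.com`) && PathPrefix(`/api/reports`)"
      service: "my-app-service"
      middlewares:
        - "inflight-strict"
    # Hypothetical router for cheap static content
    static-router:
      rule: "Host(`myapp.example.com`) && PathPrefix(`/static`)"
      service: "my-app-service"
      middlewares:
        - "inflight-relaxed"

  middlewares:
    # Tight limit for the resource-intensive route
    inflight-strict:
      inFlightReq:
        amount: 2
    # Looser limit for the lightweight route
    inflight-relaxed:
      inFlightReq:
        amount: 50
```

Each middleware keeps its own counters, so a burst on the reports route should not eat into the budget for static content.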

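The InFlightReq middleware doesn’t inspect the content of requests, and by default it doesn’t distinguish between clients either: it groups requests by request host and simply counts how many are in progress. This makes it very efficient, but it also means a single noisy client can use up the whole budget and cause 429s for everyone else. The sourceCriterion option can group requests by client IP or by a request header instead. The trade-off is that with per-client grouping, a large number of distributed clients each making a single request stay under their individual limits, so the total load on the backend is no longer capped.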
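
Traefik’s middleware reference describes a sourceCriterion option on inFlightReq that controls how in-flight requests are grouped. The sketch below counts requests per client IP rather than per host. The middleware name and the depth value are assumptions for illustration: depth: 1 assumes Traefik sits behind exactly one trusted proxy that sets X-Forwarded-For.

```yaml
http:
  middlewares:
    # Hypothetical name; attach it to a router like any other middleware
    per-client-inflight:
      inFlightReq:
        # Each source may have at most 5 requests in progress
        amount: 5
        sourceCriterion:
          # Group by client IP, taken from X-Forwarded-For.
          # depth: 1 selects the first address counting from the right.
          ipStrategy:
            depth: 1
```

Only one grouping strategy can be set at a time; the alternatives to ipStrategy are requestHeaderName and requestHost.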

When a request is rejected by InFlightReq, the HTTP response code is 429 Too Many Requests. This is the standard way to tell a client that it, or the group it is counted with, has too many requests outstanding and should back off. The response body is a short default message from Traefik. InFlightReq itself only exposes the amount and sourceCriterion options, so it cannot change the status code or body; to serve a custom response, pair it with Traefik’s Errors middleware.
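
One way to replace the default rejection body is Traefik’s Errors middleware, which swaps the response for chosen status codes with a page fetched from another service. The sketch below assumes the limiter’s rejection surfaces as a 429. The overload-page middleware, the error-pages-service, and its URL are hypothetical names introduced for illustration.

```yaml
http:
  routers:
    my-app-router:
      rule: "Host(`myapp.example.com`)"
      service: "my-app-service"
      middlewares:
        # Listed first so it wraps the limiter and can see its response
        - "overload-page"
        - "rate-limit-inflight"

  middlewares:
    overload-page:
      errors:
        status:
          - "429"
        # Hypothetical service that serves static error pages
        service: "error-pages-service"
        # {status} is replaced with the response code, here /429.html
        query: "/{status}.html"

  services:
    error-pages-service:
      loadBalancer:
        servers:
          - url: "http://192.168.1.101:8080"
```

The ordering matters here: if the Errors middleware came after the limiter in the list, it would presumably never see the rejected response.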

The next logical step after mastering concurrency limiting is to limit request rates over time with Traefik’s RateLimit middleware, which can be keyed by client IP to prevent individual clients from hogging resources.
