Rate limiting is crucial for microservices to prevent overload and ensure fair usage, but the common approach of implementing it at the API Gateway is fundamentally flawed.
Here’s a typical scenario:
# Example Nginx configuration for rate limiting
http {
    limit_req_zone $binary_remote_addr zone=mylimit:10m rate=5r/s;

    server {
        location / {
            limit_req zone=mylimit burst=20 nodelay;
            proxy_pass http://backend_service;
        }
    }
}
In this setup, a single API Gateway instance enforces one limit (5 requests per second per client IP, with a burst of 20) on every incoming request, whatever its destination. When traffic spikes, the gateway can itself become a bottleneck and drop requests before they reach the individual microservices. The limit therefore protects the gateway rather than the backend services, and the gateway becomes a single point of failure.
The core problem is that the API Gateway, acting as a centralized point, doesn’t have the granular context of which microservice is being hit and its specific capacity. It treats all requests equally, regardless of their destination. This leads to situations where a burst of traffic targeting a low-capacity service can overwhelm the gateway, impacting requests for high-capacity services as well.
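To make the shared-budget problem concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not a reproduction of Nginx's algorithm: the FixedWindowLimiter class, the service names "reports" and "users", and all the numbers are hypothetical. It models a single window in which a burst aimed at one service uses up a budget that is shared across destinations before requests for another service arrive.

```python
from collections import Counter


class FixedWindowLimiter:
    """Toy limiter (hypothetical): admits at most `limit` requests per window."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def allow(self) -> bool:
        if self.used < self.limit:
            self.used += 1
            return True
        return False


def admitted_per_service(requests, limiter_for):
    """Count admitted requests per destination within a single window.

    `limiter_for` maps a service name to the limiter that guards it.
    """
    admitted = Counter()
    for service in requests:
        if limiter_for(service).allow():
            admitted[service] += 1
    return admitted


# One window of traffic: 50 requests for "reports" arrive first,
# followed by 10 requests for "users". Names and volumes are illustrative.
traffic = ["reports"] * 50 + ["users"] * 10

# Centralized: every destination draws from one shared budget of 20.
shared = FixedWindowLimiter(20)
shared_result = admitted_per_service(traffic, lambda _service: shared)

# Distributed: each service has its own budget sized to its capacity.
per_service = {"reports": FixedWindowLimiter(5), "users": FixedWindowLimiter(15)}
split_result = admitted_per_service(traffic, lambda service: per_service[service])

print("shared budget:", dict(shared_result))
print("per-service budgets:", dict(split_result))
```

With the shared budget, the burst for "reports" takes all 20 slots and no "users" request is admitted. With separate budgets, "reports" is capped at its own limit and all 10 "users" requests get through.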
The Sidecar Pattern: A Better Approach
A more robust solution is to implement rate limiting closer to the services themselves, using a sidecar proxy pattern. In this model, each microservice has its own dedicated proxy (the sidecar) that handles concerns like rate limiting, logging, and service discovery.
Consider a service user-service with its sidecar proxy, often implemented using Envoy or Nginx. The sidecar sits alongside the user-service container within the same pod or deployment.
Here’s how the traffic flow changes:
- Client Request: A client sends a request to the user-service.
- Sidecar Intercepts: The request first hits the user-service’s sidecar proxy.
- Rate Limiting: The sidecar checks its configured rate limits for user-service.
  - If the limit is exceeded, the sidecar rejects the request immediately (e.g., with a 429 Too Many Requests HTTP status).
  - If the limit is not exceeded, the sidecar forwards the request to the actual user-service container.
- Service Processing: The user-service processes the request.
- Response: The user-service sends the response back to the sidecar, which then returns it to the client.
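The flow above can be sketched in a few lines of Python. This is an illustration only, not how Envoy or Nginx implement it: TokenBucket, sidecar_handle, and the user_service stand-in are hypothetical names, and a token bucket is just one common way to implement the "check the limit" step.

```python
import time
from typing import Callable, Tuple


class TokenBucket:
    """Simple token bucket: refills `rate` tokens per second up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last_refill = clock()

    def try_acquire(self) -> bool:
        # Add the tokens earned since the last call, capped at capacity.
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def sidecar_handle(bucket: TokenBucket, request: str,
                   service: Callable[[str], str]) -> Tuple[int, str]:
    """Intercept the request, check the limit, then reject or forward."""
    if not bucket.try_acquire():
        return 429, "Too Many Requests"
    return 200, service(request)


def user_service(request: str) -> str:
    """Stand-in for the real user-service container (hypothetical)."""
    return f"handled {request}"


# Send a few back-to-back requests through a deliberately small bucket.
bucket = TokenBucket(rate=2, capacity=2)
for i in range(4):
    print(sidecar_handle(bucket, f"req-{i}", user_service))
```

The clock is injectable so the limiter can be tested deterministically. In a real sidecar the rejection happens in the proxy process, so a rejected request never consumes resources in the service container.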
This distributes the rate-limiting logic, making it scalable and resilient. Each service manages its own rate limits, preventing a single point of failure.
Configuration Example (Envoy Proxy)
Let’s look at a simplified Envoy configuration for a sidecar proxy managing rate limits for a product-service.
# envoy.yaml (simplified for demonstration)
static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address:
          address: 0.0.0.0
          port_value: 10000 # Sidecar listens on this port
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: local_service
                      domains: ["*"]
                      routes:
                        - match:
                            prefix: "/"
                          route:
                            cluster: product_service_cluster
                http_filters:
                  - name: envoy.filters.http.local_ratelimit
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
                      stat_prefix: rl
                      # Token bucket refilled with 10 tokens every second,
                      # i.e. roughly 10 requests per second.
                      token_bucket:
                        max_tokens: 10
                        tokens_per_fill: 10
                        fill_interval: 1s
                      # The filter applies to 0% of requests unless it is
                      # explicitly enabled and enforced.
                      filter_enabled:
                        runtime_key: local_rate_limit_enabled
                        default_value:
                          numerator: 100
                          denominator: HUNDRED
                      filter_enforced:
                        runtime_key: local_rate_limit_enforced
                        default_value:
                          numerator: 100
                          denominator: HUNDRED
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
    - name: product_service_cluster
      connect_timeout: 0.25s
      type: LOGICAL_DNS
      # The actual product-service runs on a different port within the same pod
      # or accessible via localhost. Here we assume localhost.
      dns_lookup_family: V4_ONLY
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: product_service_cluster
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: 127.0.0.1 # Target the actual service
                      port_value: 8080 # Port the product-service listens on
In this Envoy configuration:
- The listener_0 on 0.0.0.0:10000 is where incoming requests to the product-service sidecar arrive.
- The envoy.filters.http.local_ratelimit filter is configured to allow 10 requests per second.
- When a request passes the rate limit, the envoy.filters.http.router forwards it to the product_service_cluster, which is configured to point to 127.0.0.1:8080, where the actual product-service runs locally.
This setup means the rate limiting is happening right next to the product-service, isolated from the traffic of other services. If product-service gets overwhelmed, it only affects requests destined for product-service.
The Crucial Insight: Distributed Enforcement
The fundamental shift here is from centralized, shared enforcement at the gateway to distributed, per-service enforcement via sidecars. The API Gateway might still be useful for cross-cutting concerns like authentication, SSL termination, or coarse-grained API routing, but for granular rate limiting, pushing it to the edge of each service is the only scalable and robust pattern. This ensures that rate limits are applied based on the service’s capacity and load, not the gateway’s.
The true power of the sidecar pattern for rate limiting lies in its ability to scale independently with your services. As you add more instances of a microservice, you also add more instances of its rate-limiting sidecar, distributing the load and maintaining consistent performance. This avoids the cascading failures that plague centralized gateway-based rate limiting.
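One consequence of this scaling behaviour is worth working through: if each sidecar enforces its limit independently, the total rate admitted for a service is at most the per-instance limit multiplied by the number of replicas. The sketch below shows that arithmetic in Python. The function names are hypothetical, and it assumes traffic is spread evenly across replicas.

```python
def aggregate_limit(per_instance_rps: int, replicas: int) -> int:
    """Upper bound on admitted requests per second across all replicas,
    assuming each sidecar enforces its limit independently."""
    return per_instance_rps * replicas


def per_instance_limit(target_total_rps: int, replicas: int) -> int:
    """Per-sidecar limit that keeps the aggregate at or below a target.

    Rounds down so the total never exceeds the target.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return target_total_rps // replicas


# Three replicas, each capped at 10 requests per second.
print(aggregate_limit(10, 3))

# Per-sidecar setting needed to hold a 100 requests-per-second total
# across three replicas.
print(per_instance_limit(100, 3))
```

If the service needs a fixed global ceiling rather than one that grows with the replica count, the per-instance value has to be recomputed whenever the deployment scales.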
A common pitfall is pointing the sidecar’s upstream cluster at the wrong address or port. If the sidecar forwards to the wrong localhost port, or the service itself isn’t listening on the expected port, requests that pass the rate limit will still fail (typically with a 503 from the proxy), even though the rate limiter itself appears to be working correctly.
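A quick way to rule this out is to check that something is accepting TCP connections on the address the sidecar forwards to. The following is a minimal sketch using Python's standard socket module. The helper name upstream_is_listening is hypothetical, and 127.0.0.1:8080 is taken from the Envoy example above.

```python
import socket


def upstream_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Check the upstream address used in the Envoy cluster definition.
if upstream_is_listening("127.0.0.1", 8080):
    print("upstream reachable on 127.0.0.1:8080")
else:
    print("nothing is accepting connections on 127.0.0.1:8080")
```

This confirms only that a process is listening on the port, not that it is the intended service or that it is healthy. It has to be run from inside the same pod or network namespace as the sidecar for the localhost address to mean the same thing.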
The next challenge you’ll likely encounter is how to dynamically update these rate limits without redeploying your sidecar proxies.