Rate Limiting in Kubernetes: Envoy and Istio Policies (2026)

Envoy’s global rate limiting isn’t about Envoy counting requests; it’s about Envoy asking another service whether to drop them.

Let’s watch it in action. Imagine a simple HTTP service running in Kubernetes, exposed via Istio. Istio, by default, uses Envoy as its sidecar proxy.

# Service definition (simplified)
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
---
# Istio VirtualService to route traffic to the service
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-app-vs
spec:
  hosts:
  - "my-app.example.com"
  http:
  - route:
    - destination:
        host: my-app
        port:
          number: 80

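Now, we want to limit requests reaching my-app to 10 requests per second in total, across all of its pods. Istio has no dedicated RateLimit resource, so the policy is declared in two places. An EnvoyFilter inserts Envoy’s global rate limit filter into my-app’s sidecars and tells them what to send. An entry in the rate limit service’s own configuration holds the number 10. The EnvoyFilter below is applied in my-app’s namespace and assumes a ratelimit Service serving gRPC on port 8081 in istio-system:

# Istio EnvoyFilter: wire my-app's sidecar to the rate limit service
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata: {name: my-app-ratelimit}
spec:
  workloadSelector: {labels: {app: my-app}}
  configPatches:
  # 1) Insert Envoy's global rate limit filter ahead of the router filter.
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_INBOUND
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
            subFilter: {name: envoy.filters.http.router}
    patch:
      operation: INSERT_BEFORE
      value:
        name: envoy.filters.http.ratelimit
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
          domain: my-app-ratelimit  # must match the rate limit service's config
          rate_limit_service:
            transport_api_version: V3
            grpc_service:
              envoy_grpc: {cluster_name: "outbound|8081||ratelimit.istio-system.svc.cluster.local"}
  # 2) Send the descriptor ("generic_key", "my-app") with every inbound request.
  - applyTo: VIRTUAL_HOST
    match: {context: SIDECAR_INBOUND}
    patch:
      operation: MERGE
      value:
        rate_limits: [{actions: [{generic_key: {descriptor_value: my-app}}]}]

When a request hits the Envoy sidecar for my-app, the rate limit filter evaluates the configured actions and builds a descriptor, here the single entry ("generic_key", "my-app"). Envoy doesn’t enforce the limit itself. Instead, it makes a gRPC ShouldRateLimit call to a separate rate limit service, commonly the envoyproxy/ratelimit service running as its own ratelimit Deployment (in this example, in the istio-system namespace). That service maintains the actual counters and decides whether the request should be allowed or denied.

If the rate limit service replies OVER_LIMIT, Envoy answers the client itself with an HTTP 429 Too Many Requests status (plus an x-envoy-ratelimited response header) and never forwards the request to my-app.

The mental model here is a two-stage process:

Envoy (the sidecar proxy): This is the enforcement point. It intercepts all incoming traffic for the pod it’s attached to. When a request arrives, its rate limit filter turns the request into descriptors according to the configured actions. If any are produced, Envoy doesn’t decide alone.
Rate limit service: This is the "brains" of the operation. Envoy sends the descriptors to this external service over gRPC. The service maintains counters shared across every Envoy instance (so the limit holds no matter how many pods back your service) and makes the actual decision: "Is this request allowed or denied under the configured limits?"

Because Istio has no first-class rate limit resource, the EnvoyFilter is how you declare these policies. workloadSelector picks which sidecars take part, the rate_limits actions decide which descriptor each request produces, and rate_limit_service.grpc_service names the cluster Envoy calls. That cluster is critical. It must point at the rate limit service in istio-system, and with the filter’s default failure_mode_deny: false, Envoy lets traffic through unlimited whenever it cannot reach that service. The limit itself (10 per second) is not in the EnvoyFilter at all; it lives in the rate limit service’s configuration.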
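To make the division of labor concrete, here is a toy Python model of one request’s path. It is a sketch, not Envoy’s or envoyproxy/ratelimit’s code. The RateLimitService class, its should_rate_limit method (loosely named after the real ShouldRateLimit gRPC call), the in-memory dict standing in for Redis, the domain name my-app-ratelimit and the descriptor ("generic_key", "my-app") are all illustrative assumptions. It also uses a simple fixed one-second window.

```python
from dataclasses import dataclass, field

# A descriptor is an ordered list of (key, value) entries, e.g. (("generic_key", "my-app"),).
Descriptor = tuple[tuple[str, str], ...]


@dataclass
class RateLimitService:
    """Toy global rate limit service (hypothetical): fixed one-second windows."""

    domain: str
    limits: dict[Descriptor, int]  # descriptor -> allowed requests per second
    # Stand-in for Redis: (domain, descriptor, window start) -> request count.
    counters: dict[tuple[str, Descriptor, int], int] = field(default_factory=dict)

    def should_rate_limit(self, domain: str, descriptor: Descriptor, now: float) -> str:
        limit = self.limits.get(descriptor)
        if domain != self.domain or limit is None:
            return "OK"  # no matching rule in this toy: allow the request
        key = (domain, descriptor, int(now))  # every request in the same second shares a counter
        self.counters[key] = self.counters.get(key, 0) + 1
        return "OVER_LIMIT" if self.counters[key] > limit else "OK"


def sidecar_handle(service: RateLimitService, now: float) -> int:
    """What the sidecar does per request: build the descriptor, ask, then enforce."""
    descriptor: Descriptor = (("generic_key", "my-app"),)  # from a generic_key action
    verdict = service.should_rate_limit("my-app-ratelimit", descriptor, now)
    # 429 is answered by Envoy itself; 200 stands in for "forwarded to my-app".
    return 429 if verdict == "OVER_LIMIT" else 200


rls = RateLimitService("my-app-ratelimit", {(("generic_key", "my-app"),): 10})
statuses = [sidecar_handle(rls, now=1000.5) for _ in range(12)]
print(statuses.count(200), statuses.count(429))  # prints: 10 2
```

The 11th and 12th requests in the same second get 429 and never reach my-app. A call at now=1001.0 lands in a fresh window and is allowed again.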

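The rate limit service holds the limits themselves and keeps its counters in Redis. With the envoyproxy/ratelimit service, limits are defined in YAML files loaded from $RUNTIME_ROOT/$RUNTIME_SUBDIRECTORY/config/, typically mounted from a ConfigMap (for example /data/ratelimit/config/config.yaml):

# Example rate limit service config (simplified)
domain: my-app-ratelimit
descriptors:
  - key: generic_key
    value: my-app
    rate_limit:
      unit: second
      requests_per_unit: 10

This allows 10 requests per second for requests in the my-app-ratelimit domain that carry the descriptor ("generic_key", "my-app"). The domain must match the domain set in Envoy’s rate limit filter. The service also builds it into every Redis key, so limits defined under different domains don’t clash. Redis itself is configured through the service’s environment rather than this file. For example, REDIS_SOCKET_TYPE=tcp and REDIS_URL=redis-service.istio-system.svc.cluster.local:6379 tell it to use redis-service in the istio-system namespace as its counter storage.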
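The claim that the domain keeps limits apart is easiest to see in the counter key. The function below approximates how envoyproxy/ratelimit names its fixed-window counters: the domain, then each descriptor entry, then the start of the current window, joined with underscores. Treat the exact format, and the cache_key helper itself, as illustrative assumptions rather than a stable interface.

```python
def cache_key(domain: str, entries: list[tuple[str, str]], unit_seconds: int, now: int) -> str:
    """Approximate a fixed-window counter key: domain, descriptor entries, window start."""
    parts = [domain]
    for key, value in entries:
        parts += [key, value]
    # Round the timestamp down to the start of its window so that every request in
    # the same second (or minute, ...) increments the same counter.
    window_start = (now // unit_seconds) * unit_seconds
    parts.append(str(window_start))
    return "_".join(parts)


entries = [("generic_key", "my-app")]
print(cache_key("my-app-ratelimit", entries, 1, 1700000000))
# prints: my-app-ratelimit_generic_key_my-app_1700000000
print(cache_key("my-app-ratelimit", entries, 60, 1700000042))
# prints: my-app-ratelimit_generic_key_my-app_1700000040
```

Changing only the domain changes the key. That is why two configurations can both limit ("generic_key", "my-app") without sharing a counter.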

The power comes from the fact that Envoy is stateless regarding rate limit counters. It delegates this stateful responsibility to the dedicated Rate Limiting service. This allows for consistent rate limiting across all instances of a service, even if your Kubernetes deployment scales up or down.
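A quick way to see why centralizing the counters matters is to compare two setups for a limit of 10 per window. In the first, all replicas share one counter (the global model described here). In the second, a hypothetical one, each replica counts on its own. The admitted function and its inputs are illustrative only. It processes each replica’s requests in turn, which does not change the totals.

```python
def admitted(requests_per_replica: list[int], limit: int, shared: bool) -> int:
    """Count requests admitted in one window.

    shared=True: every replica's sidecar consults one counter (global rate limiting).
    shared=False: each replica keeps its own counter (hypothetical per-proxy limiting).
    """
    shared_count = 0
    total_admitted = 0
    for n in requests_per_replica:
        local_count = 0  # reset per replica; only used when shared=False
        for _ in range(n):
            if shared:
                shared_count += 1
                count = shared_count
            else:
                local_count += 1
                count = local_count
            if count <= limit:
                total_admitted += 1
    return total_admitted


# Two replicas, 8 requests each, limit 10 per window.
print(admitted([8, 8], limit=10, shared=True))   # prints: 10
print(admitted([8, 8], limit=10, shared=False))  # prints: 16
```

With per-replica counters, the effective limit grows with the replica count. With the shared counter, it stays at 10 however the Deployment scales.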

When you define a rate limit, you’re not just setting a number; you’re configuring a distributed system. Envoy acts as the agent, and the Rate Limiting service acts as the central authority, using Redis as its shared memory.

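The most surprising thing is that, in this global setup, Envoy doesn’t store the rate limit counters itself. It enforces the verdict but delegates the decision to an external, stateful service. (Envoy also ships a local rate limit filter, envoy.filters.http.local_ratelimit, which keeps a token bucket inside each proxy. Those limits apply per replica rather than to the service as a whole.)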

When you apply these resources, istiod translates them into Envoy configuration and pushes it to the selected sidecars over xDS, Envoy’s streaming gRPC configuration API. Sidecars therefore receive updates as they happen rather than polling for them. That configuration includes the rate limit filter, the address of the rate limit service, and the actions that describe each request. The limits themselves remain in the rate limit service.

The next hurdle you’ll likely face is implementing more sophisticated rate limiting strategies, like per-user or per-API-key limiting. That means extending Envoy’s rate limit filter configuration so each request is tagged with the user or key, typically taken from a request header. It also means adding matching entries to the rate limit service’s configuration.