Rate Limiting gRPC Services: Interceptors and Envoy (2026)

gRPC services can be rate-limited at the edge using Envoy, but the real power comes from implementing custom rate-limiting logic within your gRPC application itself via interceptors.

Let’s see rate limiting in action. Imagine a simple gRPC service that increments a counter.

syntax = "proto3";

package ratelimit;

service CounterService {
  rpc Increment (IncrementRequest) returns (IncrementResponse);
}

message IncrementRequest {
  string key = 1;
}

message IncrementResponse {
  int64 count = 1;
}

Here’s a Go implementation of the server with a basic in-memory rate limiter using a token bucket.

package main

import (
	"context"
	"log"
	"net"
	"sync"

	"golang.org/x/time/rate"
	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
	pb "your_module_path/ratelimit" // Replace with your module path
)

type server struct {
	pb.UnimplementedCounterServiceServer
	mu       sync.Mutex
	counters map[string]int64
	limiters map[string]*rate.Limiter // Token bucket for each key
}

func (s *server) Increment(ctx context.Context, req *pb.IncrementRequest) (*pb.IncrementResponse, error) {
	key := req.GetKey()

	// Look up (or lazily create) the limiter for this key. The mutex is held
	// only for the map access, so a waiting caller does not block other keys.
	s.mu.Lock()
	limiter, ok := s.limiters[key]
	if !ok {
		// Allow 1 request per second, burst of 2
		limiter = rate.NewLimiter(rate.Limit(1), 2)
		s.limiters[key] = limiter
	}
	s.mu.Unlock()

	// Wait blocks until a token is available. It returns an error if ctx is
	// cancelled or its deadline would expire before a token arrives.
	if err := limiter.Wait(ctx); err != nil {
		return nil, status.Errorf(codes.ResourceExhausted, "rate limit exceeded for key: %s", key)
	}

	s.mu.Lock()
	s.counters[key]++
	count := s.counters[key]
	s.mu.Unlock()

	log.Printf("Incremented counter for key: %s, new count: %d", key, count)
	return &pb.IncrementResponse{Count: count}, nil
}

func main() {
	lis, err := net.Listen("tcp", ":50051")
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}
	s := grpc.NewServer()
	pb.RegisterCounterServiceServer(s, &server{
		counters: make(map[string]int64),
		limiters: make(map[string]*rate.Limiter),
	})
	log.Println("Server listening on :50051")
	if err := s.Serve(lis); err != nil {
		log.Fatalf("failed to serve: %v", err)
	}
}

This server keeps one rate.Limiter for each unique key it receives. When Increment is called, it first waits for a token from that key's limiter. If a token becomes available before the request's context is cancelled or its deadline expires, the counter is incremented; otherwise, a ResourceExhausted error is returned.

The real sophistication comes from using gRPC interceptors. Interceptors allow you to execute custom logic before or after a gRPC call, without modifying the service implementation itself.

Here’s how you’d add a server-side interceptor to this Go server:

// Define these at package level, next to the server type. The error handling
// below needs the google.golang.org/grpc/codes and
// google.golang.org/grpc/status imports.

// The limiter map must outlive a single call, so it is shared package state
// guarded by a mutex. In practice you would hold it in a struct.
var (
	limiterMu  sync.Mutex
	limiterMap = make(map[string]*rate.Limiter)
)

// RateLimiterInterceptor is a gRPC unary server interceptor for rate limiting.
func RateLimiterInterceptor(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
	// You'd typically extract identifying information from the request.
	// For this example, let's assume the request has a 'Key' field.
	var key string
	switch r := req.(type) {
	case *pb.IncrementRequest:
		key = r.GetKey()
	default:
		// Handle other message types if necessary, or pass through
		return handler(ctx, req)
	}

	// In a real-world scenario, you'd likely fetch the rate limit configuration
	// from a central store or based on the key/user/IP.
	rateLimitPerSecond := rate.Limit(1) // 1 request per second
	burstSize := 2                      // Burst of 2 requests

	// Look up (or lazily create) the shared limiter for this key.
	limiterMu.Lock()
	limiter, ok := limiterMap[key]
	if !ok {
		limiter = rate.NewLimiter(rateLimitPerSecond, burstSize)
		limiterMap[key] = limiter
	}
	limiterMu.Unlock()

	// Wait blocks until a token is available or ctx ends.
	if err := limiter.Wait(ctx); err != nil {
		return nil, status.Errorf(codes.ResourceExhausted, "rate limit exceeded for key: %s", key)
	}

	// Proceed with the gRPC call
	return handler(ctx, req)
}

// In main(), create the server with the interceptor installed:
s := grpc.NewServer(grpc.UnaryInterceptor(RateLimiterInterceptor))
// ... rest of main ...

The interceptor pattern decouples the rate-limiting logic from your core service business logic. Once the interceptor is installed, the limiter inside Increment is redundant and can be removed so that requests are not limited twice. You can apply the same interceptor to multiple services or RPC methods. For more complex scenarios, you might have a dedicated rate-limiting service that your interceptor calls.
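One way to reuse a single interceptor across methods is to derive the limiter key from the RPC's full method name (available as info.FullMethod in a unary interceptor) plus any request key, and to keep the limiters in a small thread-safe registry. The sketch below uses only the standard library; limitKey, registry and fakeRequest are hypothetical names, and the *int limiter type is a placeholder for something like *rate.Limiter.

```go
package main

import (
	"fmt"
	"sync"
)

// keyed matches any request message that exposes GetKey(), as the generated
// IncrementRequest does. Other message types fall back to a per-method key.
type keyed interface{ GetKey() string }

// limitKey builds the limiter key from the full method name and, when the
// request carries one, its key.
func limitKey(fullMethod string, req any) string {
	if k, ok := req.(keyed); ok && k.GetKey() != "" {
		return fullMethod + "#" + k.GetKey()
	}
	return fullMethod
}

// registry lazily creates one limiter of type L per key and is safe for
// concurrent use.
type registry[L any] struct {
	mu       sync.Mutex
	limiters map[string]L
	newFn    func() L // constructs the limiter for a key seen for the first time
}

func newRegistry[L any](newFn func() L) *registry[L] {
	return &registry[L]{limiters: make(map[string]L), newFn: newFn}
}

// get returns the limiter for key, creating it on first use.
func (r *registry[L]) get(key string) L {
	r.mu.Lock()
	defer r.mu.Unlock()
	l, ok := r.limiters[key]
	if !ok {
		l = r.newFn()
		r.limiters[key] = l
	}
	return l
}

// fakeRequest stands in for a generated protobuf message in this sketch.
type fakeRequest struct{ key string }

func (f *fakeRequest) GetKey() string { return f.key }

func main() {
	// *int is a placeholder limiter type used to keep the sketch dependency-free.
	reg := newRegistry(func() *int { return new(int) })

	k := limitKey("/ratelimit.CounterService/Increment", &fakeRequest{key: "alice"})
	fmt.Println(k)
	fmt.Println(reg.get(k) == reg.get(k)) // the same limiter is returned each time
}
```

Holding the registry in a struct that the interceptor closes over avoids package-level state and makes it straightforward to give different services different limits.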

Envoy acts as an API gateway or sidecar proxy. It can perform rate limiting at the network edge before requests even reach your gRPC application. Envoy’s rate limiting is configured through HTTP filters, either locally within each proxy or against a separate rate limit service (e.g., the reference implementation at github.com/envoyproxy/ratelimit).

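Here’s a snippet of an Envoy configuration that applies a local rate limit to gRPC traffic:

# Envoy configuration snippet (HTTP connection manager filter chain)
http_filters:
- name: envoy.filters.http.local_ratelimit
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
    stat_prefix: local_rate_limit
    token_bucket:
      max_tokens: 100      # Bucket capacity (max burst size)
      tokens_per_fill: 100 # Tokens added every fill_interval (defaults to 1)
      fill_interval: 1s    # Refill period
    # Both fractions default to 0%, so the filter does nothing unless they are set.
    filter_enabled:
      runtime_key: local_rate_limit_enabled
      default_value: { numerator: 100, denominator: HUNDRED }
    filter_enforced:
      runtime_key: local_rate_limit_enforced
      default_value: { numerator: 100, denominator: HUNDRED }
    # This is a basic limit shared by all requests on this Envoy instance.
    # For per-route or per-header limits, more complex configurations exist.
    # For limits shared across instances, use the envoy.filters.http.ratelimit
    # filter, which integrates with an external RL service.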

When using Envoy’s external rate-limiting service, Envoy sends a request to the RL service, which determines if the request should be allowed. The RL service might use shared rate-limiting counters (e.g., Redis) to enforce limits across multiple Envoy instances. This is crucial for distributed systems.

The most surprising thing about gRPC rate limiting is that you often don’t need to implement it directly in your service code if you’re using an API gateway like Envoy. Envoy can handle it at the edge, saving your application resources.

The Envoy local_ratelimit filter is straightforward for simple limits that each proxy enforces independently. However, for dynamic, per-client, or per-request-attribute rate limiting, you’ll integrate Envoy with an external rate limit service, such as the reference implementation maintained by the Envoy project. This service typically uses a shared store like Redis to maintain rate limit counters across all Envoy proxies. Envoy proxies query this service, which returns a decision (allow/deny), and Envoy enforces it.

When implementing rate limiting within your gRPC application using interceptors, the choice of rate limiting algorithm is important. While token buckets (like golang.org/x/time/rate) are common, consider algorithms like leaky bucket for smoother output or fixed-window counters for simplicity, depending on your exact requirements and how you want to handle bursts. The limiter.Wait(ctx) call in the Go example blocks the handling goroutine until a token is available or the context ends. For highly concurrent servers, you might prefer the non-blocking limiter.Allow(), which rejects a request immediately when no token is available, so that waiting requests do not pile up.

The next step after implementing robust rate limiting is to monitor its effectiveness and adjust the limits based on observed traffic patterns and application performance.