The most surprising thing about latency, throughput, and error rate is how often they’re treated as independent variables when they’re deeply interconnected.
Let’s see this in action. Imagine a simple web service handling user requests.
requests_per_second = 100
average_latency_ms = 50
error_rate_percent = 0.1
This looks healthy. But what happens when load increases?
requests_per_second = 200
average_latency_ms = 75 # Latency creeps up
error_rate_percent = 0.5 # Error rate starts to climb
The system is starting to struggle. More requests mean more work competing for the same resources, so each request takes longer to process (higher latency). As the service slows down, some requests time out or fail due to resource exhaustion, leading to more errors.
If we keep pushing and offer far more load than the system can absorb:
requests_per_second = 500
average_latency_ms = 500 # Latency explodes
error_rate_percent = 15.0 # Error rate skyrockets
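Here, the system is completely saturated. Latency becomes unacceptably high, and a significant portion of requests are failing. The 500 RPS figure looks impressive, but it is misleading: with 15% of requests failing, only about 425 RPS actually succeed, and each of those takes ten times longer than it did at 100 RPS. That level of load is not sustainable.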
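One way to compare the three snapshots above is Little's Law, which says that in steady state the average number of requests in flight equals the arrival rate multiplied by the average time each request spends in the system. The sketch below applies it to the numbers from the text. The function names are hypothetical, and it assumes the figures are steady-state averages, which a saturated system may not satisfy exactly.

```python
def in_flight_requests(requests_per_second: float, average_latency_ms: float) -> float:
    """Little's Law: average concurrency = arrival rate x average time in system."""
    return requests_per_second * (average_latency_ms / 1000.0)


def successful_rps(requests_per_second: float, error_rate_percent: float) -> float:
    """Requests per second that actually succeed (sometimes called goodput)."""
    return requests_per_second * (1.0 - error_rate_percent / 100.0)


# The three scenarios from the text: (RPS, average latency in ms, error rate in %).
scenarios = [(100, 50, 0.1), (200, 75, 0.5), (500, 500, 15.0)]

for rps, latency_ms, error_pct in scenarios:
    print(
        f"{rps} RPS: ~{in_flight_requests(rps, latency_ms):.0f} requests in flight, "
        f"~{successful_rps(rps, error_pct):.1f} successful RPS"
    )
```

Under those assumptions, the service goes from holding about 5 requests at a time, to 15, to 250. A fivefold increase in load turns into a fiftyfold increase in work held open at once, which is where resource exhaustion comes from.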
The Mental Model
Think of these metrics as the heartbeat of your system.
- Latency: This is the time it takes for a single request to complete. It’s the duration from when a request enters the system to when a response is sent back. Low latency means fast responses. High latency means slow responses. It’s often measured as an average, but percentiles (like p95 or p99) are crucial because averages can hide outliers that severely impact user experience.
- Throughput: This is the volume of work the system can handle over a period. For a web service, it’s typically measured in requests per second (RPS) or transactions per second (TPS). High throughput means the system can process many requests concurrently or in rapid succession.
- Error Rate: This is the percentage of requests that fail. A 0% error rate is ideal, though a small baseline (like the 0.1% in the healthy example above) is common in practice. A rate that rises above that baseline indicates something is broken, whether it’s a bug, a resource issue, or a downstream dependency failure.
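To make the three definitions concrete, here is a minimal sketch that derives them from a list of per-request records. The `RequestRecord` type, the `summarize` helper, and the sample data are all hypothetical; the percentile uses the simple nearest-rank method, which is only one of several common definitions.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestRecord:
    latency_ms: float
    ok: bool


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def summarize(records: list[RequestRecord], window_seconds: float) -> dict[str, float]:
    """Compute throughput, error rate, and latency statistics for one time window."""
    latencies = [r.latency_ms for r in records]
    failures = sum(1 for r in records if not r.ok)
    return {
        "throughput_rps": len(records) / window_seconds,
        "error_rate_percent": 100.0 * failures / len(records),
        "mean_latency_ms": sum(latencies) / len(latencies),
        "p95_latency_ms": percentile(latencies, 95),
        "p99_latency_ms": percentile(latencies, 99),
    }


# Hypothetical sample: 97 fast requests and 3 slow ones, one of which failed.
records = [RequestRecord(20.0, True) for _ in range(97)]
records += [
    RequestRecord(2000.0, True),
    RequestRecord(2000.0, True),
    RequestRecord(2000.0, False),
]

summary = summarize(records, window_seconds=10.0)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

In this made-up sample the mean is 79.4 ms, a value no request actually experienced, while p95 is 20 ms and p99 is 2000 ms. Only the p99 exposes the two-second tail that a small share of users would hit.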
These metrics form a feedback loop. If you increase throughput without scaling capacity, latency will increase. If latency increases beyond a certain threshold, error rates will also increase (due to timeouts, resource exhaustion, etc.). Conversely, reducing latency (by optimizing code, adding resources) can often increase achievable throughput. High error rates are a symptom of underlying problems that also likely impact latency and throughput.
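The way latency rises slowly at first and then sharply near saturation can be illustrated with the textbook M/M/1 queue, where the mean time in the system is 1 / (service rate − arrival rate). This is a deliberately simplified model (one server, random arrivals, exponentially distributed service times), and the capacity of 220 RPS is an invented number, so treat the output as a shape rather than a prediction for any real service.

```python
def mm1_mean_latency_ms(arrival_rps: float, service_rps: float) -> float:
    """Mean time in system for an M/M/1 queue, in milliseconds.

    Returns infinity when arrivals meet or exceed capacity, because the
    queue then grows without bound in this model.
    """
    if arrival_rps >= service_rps:
        return float("inf")
    return 1000.0 / (service_rps - arrival_rps)


SERVICE_RPS = 220.0  # hypothetical capacity, chosen only for illustration

for arrival_rps in (100, 150, 200, 215, 220):
    latency = mm1_mean_latency_ms(arrival_rps, SERVICE_RPS)
    utilization = arrival_rps / SERVICE_RPS
    print(f"{arrival_rps} RPS ({utilization:.0%} utilized): {latency:.1f} ms")
```

In this model, doubling load from 100 to 200 RPS multiplies mean latency by six, and the last few percent of utilization cost far more than the first fifty. Once latency crosses a client timeout, those slow requests start showing up as errors.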
Levers You Control
- Resource Allocation: More CPU, memory, or network bandwidth can often increase throughput and reduce latency, up to a point.
- Code Optimization: Efficient algorithms and data structures directly reduce the work needed per request, lowering latency and allowing for higher throughput.
- Concurrency Settings: How many requests a system can handle simultaneously (e.g., thread pool sizes, connection limits) directly impacts throughput and can affect latency under load.
- Caching: Storing frequently accessed data closer to the application reduces latency and offloads work from backend systems, indirectly boosting throughput.
- Load Balancing: Distributing requests across multiple instances prevents any single instance from becoming a bottleneck, improving overall throughput and resilience.
- Database Tuning: Slow database queries are a common cause of high latency. Optimizing queries, indexing, and connection pooling is critical.
- Network Configuration: Network hops, bandwidth limitations, and firewalls can all add latency.
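As a rough illustration of one of these levers, the sketch below estimates what a cache does to average latency and backend load. The function names and all numbers (a 2 ms cache read, an 80 ms backend call, a 90% hit ratio) are assumptions for the example, and the model ignores details such as the cost of the failed cache lookup on a miss.

```python
def effective_latency_ms(hit_ratio: float, cache_ms: float, backend_ms: float) -> float:
    """Average latency when a fraction `hit_ratio` of requests is served from cache."""
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * backend_ms


def backend_rps(total_rps: float, hit_ratio: float) -> float:
    """Requests per second that still reach the backend after cache hits are removed."""
    return total_rps * (1.0 - hit_ratio)


# Hypothetical figures, for illustration only.
CACHE_MS = 2.0
BACKEND_MS = 80.0

for hit_ratio in (0.0, 0.5, 0.9):
    latency = effective_latency_ms(hit_ratio, CACHE_MS, BACKEND_MS)
    load = backend_rps(200.0, hit_ratio)
    print(f"hit ratio {hit_ratio:.0%}: ~{latency:.1f} ms average, ~{load:.0f} RPS to backend")
```

With these made-up numbers, a 90% hit ratio cuts average latency from 80 ms to about 9.8 ms and reduces backend traffic from 200 RPS to about 20 RPS, which is the "indirect" throughput gain: the backend now has headroom it did not have before.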
When analyzing these, always consider the context. A latency of 100ms might be acceptable for a batch job but terrible for a real-time trading application. Throughput of 10 RPS might be fine for a niche internal tool but disastrous for a popular e-commerce site. And any error rate that climbs above your normal baseline warrants immediate investigation.
The most common pitfall is optimizing one metric in isolation. For example, aggressively tuning a system to maximize throughput might lead to unacceptable latency spikes, causing user complaints and increased error rates. The goal is to find the sweet spot where all three metrics are within acceptable bounds for your specific application’s needs.
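One simple guard against optimizing a single metric in isolation is to evaluate all three against targets at the same time. The sketch below does that; the `Targets` class, the `violations` helper, and the threshold values are hypothetical and would need to be chosen for your own application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Targets:
    """Hypothetical service-level targets; real values depend on the application."""
    min_throughput_rps: float
    max_latency_ms: float
    max_error_rate_percent: float


def violations(throughput_rps: float, latency_ms: float,
               error_rate_percent: float, targets: Targets) -> list[str]:
    """Return the name of every metric that falls outside its target."""
    problems = []
    if throughput_rps < targets.min_throughput_rps:
        problems.append("throughput")
    if latency_ms > targets.max_latency_ms:
        problems.append("latency")
    if error_rate_percent > targets.max_error_rate_percent:
        problems.append("error_rate")
    return problems


targets = Targets(min_throughput_rps=150, max_latency_ms=100, max_error_rate_percent=1.0)

print(violations(200, 75, 0.5, targets))    # all three within bounds
print(violations(500, 500, 15.0, targets))  # throughput is high, but the other two are not
```

Checked this way, the 500 RPS scenario from earlier fails on two of three metrics even though its throughput number is the highest of the three.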
The next concept you’ll grapple with is how to implement distributed tracing to understand the sources of latency across multiple services.