The fastest way to kill your system’s throughput isn’t a bug. It’s a feature that’s being used wrong.
Let’s say you’re running a web service, and you’re seeing requests pile up, latency skyrocket, and eventually, things just start returning 503s. You’ve got a throughput problem. You probably think it’s about scaling up servers or optimizing some complex algorithm. More often than not, it’s something much simpler, and far more insidious.
Consider this scenario:
GET /api/v1/users/12345 HTTP/1.1
Host: example.com
Accept: application/json
This is a simple GET request. Seems harmless, right? But suppose your backend service, designed for quick lookups, is suddenly hit with thousands of these per second, and each GET /api/v1/users/12345 triggers a cascade of operations: a cache check, a database query, a call to an external identity provider, and a final serialization. If any one of those steps is slow, or if you run them serially when they could run in parallel, you’ve built a bottleneck.
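To make the serial-versus-parallel point concrete, here is a minimal sketch using asyncio. The step names and the 50 ms delays are invented for illustration, and it assumes the three steps really are independent of each other (in a real handler, a cache hit would normally skip the database query).

```python
import asyncio
import time

async def fake_step(name: str, seconds: float) -> str:
    """Stand-in for an I/O-bound step such as a cache check or identity lookup."""
    await asyncio.sleep(seconds)
    return name

async def handle_serial() -> list[str]:
    # Each step waits for the previous one: total time is the sum of the delays.
    return [
        await fake_step("cache", 0.05),
        await fake_step("database", 0.05),
        await fake_step("identity", 0.05),
    ]

async def handle_concurrent() -> list[str]:
    # Independent steps run together: total time is roughly the slowest delay.
    return list(await asyncio.gather(
        fake_step("cache", 0.05),
        fake_step("database", 0.05),
        fake_step("identity", 0.05),
    ))

def timed(handler) -> tuple[list[str], float]:
    """Run a handler to completion and return its result and elapsed seconds."""
    start = time.perf_counter()
    result = asyncio.run(handler())
    return result, time.perf_counter() - start
```

Calling timed(handle_serial) and timed(handle_concurrent) returns the same results, but the concurrent version should finish in about a third of the time.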
The real killer here is often over-fetching or under-fetching data.
Imagine a frontend that needs a list of user IDs and their names to display a dropdown.
Under-fetching: The frontend makes a request like GET /api/v1/users. The backend, designed for efficiency, returns only the user IDs. Now the frontend has to make another request for each user to get their name: GET /api/v1/users/{id}/name. This is a classic N+1 problem in disguise. For 100 users, that’s 101 requests instead of one.
Over-fetching: The frontend needs only the user’s name and email. But the GET /api/v1/users/{id} endpoint, designed for administrative views, returns everything: ID, name, email, address, creation date, last login, activity logs, purchase history, etc. The frontend receives many times more data than it will ever use, wasting bandwidth and burning CPU cycles on both server and client to serialize and parse it.
The system works, but it’s like trying to drink from a firehose when you only need a sip.
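Both failure modes are easy to quantify. The sketch below uses a made-up in-memory user store (the field names and sizes are assumptions, not a real API) to count the requests in the N+1 pattern and to compare a full payload against one trimmed to the fields the dropdown actually needs.

```python
import json

# Hypothetical in-memory user store standing in for the backend.
USERS = {
    i: {
        "id": i,
        "name": f"user{i}",
        "email": f"user{i}@example.com",
        "address": "1 Example Street",
        "activity_log": ["login"] * 50,
    }
    for i in range(1, 101)
}

def requests_for_dropdown_n_plus_1(user_ids: list[int]) -> int:
    # Under-fetching: one call for the ID list, then one call per user for the name.
    return 1 + len(user_ids)

def project(user: dict, fields: tuple[str, ...]) -> dict:
    # Return only the fields the caller asked for.
    return {field: user[field] for field in fields}

# Over-fetching: everything the admin view would return for every user.
full_payload = json.dumps(list(USERS.values()))
# What the dropdown needs: IDs and names only.
slim_payload = json.dumps([project(u, ("id", "name")) for u in USERS.values()])
```

Comparing len(full_payload) with len(slim_payload) shows how much of the response is dead weight for this particular view.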
The solution often lies in GraphQL or API Gateway aggregation. GraphQL lets the client specify exactly what data it needs in a single request.
query {
  users(first: 100) {
    id
    name
  }
}
This single query asks for the first 100 users and only their id and name. The backend can then fetch and return just that data, provided its resolvers batch the underlying lookups instead of querying once per user.
Another common culprit is unbounded concurrency. You might have a service that can handle 100 concurrent requests beautifully. But if you expose it without any rate limiting or connection pooling, and a sudden traffic spike hits, it might try to handle 10,000 concurrent requests. The underlying resources (CPU, memory, network sockets, database connections) get exhausted, leading to timeouts and errors, not graceful degradation.
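One way to bound concurrency is a semaphore in front of the handler. This is a sketch, not a production limiter: MAX_IN_FLIGHT, handle, and serve are hypothetical names, and the sleep stands in for real work. Note that a semaphore makes excess requests wait. If you would rather shed load, reject the request once the limit is reached instead.

```python
import asyncio

MAX_IN_FLIGHT = 100  # assumed capacity of the service, taken from the example above

async def handle(request_id: int, limiter: asyncio.Semaphore, stats: dict) -> int:
    # Requests beyond the limit wait here instead of piling onto the backend.
    async with limiter:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.001)  # stand-in for real work
        stats["in_flight"] -= 1
    return request_id

async def serve(total_requests: int) -> dict:
    limiter = asyncio.Semaphore(MAX_IN_FLIGHT)
    stats = {"in_flight": 0, "peak": 0}
    await asyncio.gather(*(handle(i, limiter, stats) for i in range(total_requests)))
    return stats
```

Running asyncio.run(serve(1000)) pushes 1,000 requests through, yet the recorded peak never exceeds MAX_IN_FLIGHT.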
Consider a simple worker process that polls a queue for tasks. If the polling interval is too aggressive (e.g., every 10ms), or if you scale up to 1000 such workers without throttling, you’ll hammer the queue’s API.
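A common way to throttle pollers is exponential backoff with jitter: poll quickly while work is arriving, and back off while the queue stays empty. The function below is a sketch. The 10 ms base matches the example above, while the 5-second cap is an arbitrary assumption you would tune.

```python
import random

def next_poll_delay(empty_polls: int, base: float = 0.01, cap: float = 5.0) -> float:
    """Seconds to wait before the next poll, given consecutive empty polls so far."""
    # Exponential growth from `base`, capped at `cap` seconds.
    ceiling = min(cap, base * (2 ** empty_polls))
    # Full jitter: pick a random delay up to the ceiling so workers spread out
    # instead of all hitting the queue at the same instant.
    return random.uniform(0, ceiling)
```

A worker would reset empty_polls to 0 whenever a poll returns a task and increment it whenever a poll comes back empty.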
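Database connections are often the first resource to run out. In PostgreSQL, you can check the server-side limit and how many connections are currently open:
-- Example: checking the connection limit and current usage in PostgreSQL
SHOW max_connections;
SELECT count(*) FROM pg_stat_activity;  -- approximate: also counts background processes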
If max_connections is set to 100 and you have 20 application instances each trying to maintain 10 connections, you need 200 connections and are over the limit before any requests even come in. The fix might be as simple as raising max_connections in postgresql.conf to 300 and restarting the PostgreSQL service, keeping in mind that every connection costs server memory. More commonly, you configure connection pooling on the application side. Libraries like HikariCP for Java (maximumPoolSize) or pgxpool for Go (MaxConns) let you cap each instance’s pool (e.g., at 4) so that the total across all instances stays under the server limit. The pool acts as a gatekeeper: once all of its connections are in use, further requests wait for a free one instead of opening new connections, which prevents exhaustion.
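The sizing arithmetic is worth writing down, because the per-instance cap has to shrink as the number of instances grows. The helper below is a hypothetical sketch, not part of any pooling library, and the default of 10 reserved connections for admin and maintenance sessions is an assumption.

```python
def max_pool_size_per_instance(max_connections: int, instances: int, reserved: int = 10) -> int:
    """Largest per-instance pool size that keeps the whole fleet under the server limit.

    Returns 0 when even one connection per instance would not fit.
    """
    if instances <= 0:
        raise ValueError("instances must be positive")
    # Keep some headroom for admin and maintenance sessions.
    usable = max(0, max_connections - reserved)
    return usable // instances
```

For example, max_pool_size_per_instance(100, 20) gives 4, and raising the server limit to 300 lifts that to 14. A result of 0 means the fleet cannot fit under the limit at any pool size.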
Then there’s synchronous I/O in critical paths. If your web server, during a request, makes a call to an external service and waits for the response, blocking the entire thread, throughput plummets. A single slow external API call can hold up dozens of threads, making your service unresponsive.
Imagine a user registration flow where the server must:
- Validate email (external service call)
- Send welcome email (external service call)
- Create user in database
- Log registration event (external service call)
If each of these external calls takes 5 seconds, the user waits at least 15 seconds. If your server has only 100 threads, 100 concurrent registrations saturate the thread pool with nothing but these synchronous waits, which caps you at roughly 6 to 7 registrations per second.
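The thread arithmetic can be checked on the back of an envelope. This sketch assumes each of the three external calls blocks for 5 seconds and treats the database insert as negligible. The function name is made up for illustration.

```python
def max_registrations_per_second(threads: int, blocking_seconds: list[float]) -> float:
    """Upper bound on throughput when each request holds one thread for the whole flow."""
    seconds_per_request = sum(blocking_seconds)
    if seconds_per_request <= 0:
        raise ValueError("total blocking time must be positive")
    # With every thread busy, each one completes 1 / seconds_per_request requests per second.
    return threads / seconds_per_request

# Validate email, send welcome email, log registration event: 5 seconds each.
ceiling = max_registrations_per_second(100, [5.0, 5.0, 5.0])
```

Under these assumptions the ceiling works out to fewer than 7 registrations per second, no matter how fast the CPU is.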
The fix is asynchronous I/O and message queues. Instead of waiting for the welcome email, the registration service publishes an event like UserRegistered to a message queue (e.g., Kafka, RabbitMQ). A separate worker service subscribes to this event and handles sending the email. This frees up the registration service’s threads immediately.
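import requests

# Synchronous (BAD): the request thread blocks until the email service answers
response_email = requests.post("http://email.service/send", json={"to": email, "subject": "Welcome!"}, timeout=5)

# Asynchronous (GOOD): publish the event and return immediately
# (message_queue stands in for your Kafka or RabbitMQ client wrapper)
message_queue.publish("user_registered", {"user_id": user_id, "email": email})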
The message queue acts as a buffer, decoupling the services and allowing them to operate at their own pace.
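To see the decoupling without standing up a broker, here is an in-process sketch that uses Python’s queue.Queue as a stand-in for Kafka or RabbitMQ. The names register_user, email_worker, and sent_emails are hypothetical, and appending to a list replaces actually sending mail.

```python
import queue
import threading

events = queue.Queue()  # in-process stand-in for a real message broker
sent_emails = []        # records what the worker "sent"

def register_user(user_id: int, email: str) -> dict:
    # ... create the user in the database here ...
    events.put({"type": "user_registered", "user_id": user_id, "email": email})
    # Return right away: the welcome email is no longer on the request path.
    return {"user_id": user_id, "status": "created"}

def email_worker() -> None:
    while True:
        event = events.get()
        if event is None:  # sentinel value: shut the worker down
            break
        sent_emails.append(event["email"])  # stand-in for sending the email

worker = threading.Thread(target=email_worker, daemon=True)
worker.start()
result = register_user(42, "new.user@example.com")
events.put(None)        # ask the worker to stop once the queue is drained
worker.join(timeout=5)
```

The registration call returns as soon as the event is enqueued, while the worker drains the queue at its own pace. A real broker adds durability and retries, which this sketch does not attempt.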
Finally, consider excessive logging. While logging is crucial for debugging, logging every single detail of every request at DEBUG or INFO level in a high-traffic environment can saturate your disk I/O or network bandwidth for log shipping. If your application spends more time writing logs than processing requests, throughput dies.
The fix is dynamic log level adjustment and sampling. Configure your logging framework (e.g., Logback, Winston) to default to WARN or ERROR in production. Use configuration management tools (like Spring Cloud Config or Consul) to dynamically change log levels to DEBUG for specific components or instances only when troubleshooting. For high-volume events, implement sampling: log only 1 in 1000 requests, for example.
<!-- Logback configuration example -->
<configuration scan="true" scanPeriod="30 seconds">
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="WARN">
    <appender-ref ref="STDOUT" />
  </root>
  <!-- Set to DEBUG only while troubleshooting; scan="true" picks up edits to this file -->
  <logger name="com.example.MyService" level="DEBUG"/>
</configuration>
This allows you to dial logging up and down without redeploying.
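Sampling can be sketched in a few lines. The example below uses Python’s standard logging module rather than Logback, purely to illustrate the idea. SampleFilter is a hypothetical class, and its counter is not synchronized, so treat the rate as approximate under heavy multithreading.

```python
import logging

class SampleFilter(logging.Filter):
    """Pass every record at WARNING or above, but only 1 in `rate` below that."""

    def __init__(self, rate: int) -> None:
        super().__init__()
        if rate < 1:
            raise ValueError("rate must be at least 1")
        self.rate = rate
        self.seen = 0

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.WARNING:
            return True  # never drop warnings or errors
        self.seen += 1
        # Keep the 1st, (rate + 1)th, (2 * rate + 1)th, ... low-severity record.
        return (self.seen - 1) % self.rate == 0
```

Attaching it with handler.addFilter(SampleFilter(1000)) keeps roughly one in a thousand INFO or DEBUG records while letting every warning and error through.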
The next performance bottleneck you’ll likely encounter after fixing these is related to cache invalidation strategies.