Stress testing doesn’t just find out what breaks; it reveals the hidden assumptions your system makes about resource availability.

Let’s say we have a simple web service that looks up user profiles by ID.

from flask import Flask, jsonify
import time
import random

app = Flask(__name__)

# Simulate a database with user profiles
user_db = {
    str(i): {"name": f"User {i}", "email": f"user{i}@example.com"}
    for i in range(1000)
}

@app.route('/user/<user_id>', methods=['GET'])
def get_user(user_id):
    # Simulate database lookup latency
    lookup_time = random.uniform(0.01, 0.1)
    time.sleep(lookup_time)

    if user_id in user_db:
        return jsonify(user_db[user_id])
    else:
        return jsonify({"error": "User not found"}), 404

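if __name__ == '__main__':
    # Flask's built-in development server (threaded by default): fine for a demo,
    # but capacity numbers for production should come from a production WSGI server
    app.run(debug=False, port=5000)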

We can use a tool like wrk to bombard this service with requests and see how it behaves.

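wrk -t4 -c100 -d30s --latency http://localhost:5000/user/123

Here, -t4 runs 4 threads, -c100 keeps 100 HTTP connections open in total (shared across those threads, so 25 per thread), -d30s runs the test for 30 seconds, and --latency prints a latency percentile breakdown alongside the summary statistics.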
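
To make those parameters concrete, here is a minimal sketch of the same closed-loop pattern in Python using only the standard library. The names `http_get_status` and `run_closed_loop` are hypothetical, and unlike wrk, `urllib` opens a fresh connection for every request instead of keeping connections open, so this only approximates what `-c` and `-d` do.

```python
import http.client
import threading
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Tuple

# One sample per completed request: (latency in seconds, HTTP status or None on a socket error)
Sample = Tuple[float, Optional[int]]


def http_get_status(url: str, timeout: float = 2.0) -> Optional[int]:
    """Send one GET request and return its HTTP status, or None if it failed outright."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # drain the body so the timing covers the full response
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx replies still count as responses
    except (OSError, http.client.HTTPException):
        return None  # connection refused, reset, or timed out


def run_closed_loop(
    send: Callable[[], Optional[int]], connections: int, duration_s: float
) -> List[Sample]:
    """Keep `connections` requests in flight until `duration_s` elapses (cf. wrk's -c and -d)."""
    deadline = time.monotonic() + duration_s
    samples: List[Sample] = []
    lock = threading.Lock()

    def worker() -> None:
        # Each worker stands in for one connection: send, wait for the reply, repeat.
        local: List[Sample] = []
        while time.monotonic() < deadline:
            start = time.perf_counter()
            status = send()
            local.append((time.perf_counter() - start, status))
        with lock:
            samples.extend(local)

    with ThreadPoolExecutor(max_workers=connections) as pool:
        futures = [pool.submit(worker) for _ in range(connections)]
        for future in futures:
            future.result()  # re-raise anything unexpected from a worker
    return samples
```

Pointed at the demo service, `run_closed_loop(lambda: http_get_status("http://localhost:5000/user/123"), connections=100, duration_s=30)` roughly mirrors the wrk run above, though Python threads will likely generate far less load than wrk can, so treat it as a teaching aid rather than a replacement.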

The output from wrk shows requests per second, latency statistics, and, importantly, errors: wrk reports socket errors (connect, read, write, and timeout) and counts any responses that are not 2xx or 3xx. If our service starts dropping connections, timing out, or returning 5xx errors under load, that’s our first indication of a breaking point.
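
wrk computes these numbers for you, but it helps to see how raw measurements turn into them. A minimal sketch, assuming each request was recorded as a `(latency_seconds, status)` pair with `None` marking a dropped request (a format chosen here for illustration, not wrk's own), using the nearest-rank percentile method; `percentile` and `summarize` are hypothetical helpers:

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# (latency in seconds, HTTP status or None for a socket-level error)
Sample = Tuple[float, Optional[int]]


def percentile(sorted_values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile (p between 0 and 100) of a sorted, non-empty sequence."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(samples: List[Sample], duration_s: float) -> Dict[str, float]:
    """Reduce raw samples to wrk-style headline numbers."""
    if not samples:
        raise ValueError("no samples collected")
    latencies = sorted(latency for latency, _ in samples)
    # Count dropped requests (no status) and server errors (5xx) as failures.
    failures = sum(1 for _, status in samples if status is None or status >= 500)
    return {
        "requests_per_sec": len(samples) / duration_s,
        "p50_ms": percentile(latencies, 50) * 1000,
        "p90_ms": percentile(latencies, 90) * 1000,
        "p99_ms": percentile(latencies, 99) * 1000,
        "error_rate": failures / len(samples),
    }
```

Percentiles matter more than the mean here: a handful of very slow requests barely moves the average but shows up immediately in p99.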

Stress testing identifies bottlenecks proactively, before they affect real users, by revealing the capacity limits of your application and its underlying infrastructure. Under the hood, it simulates a high volume of traffic that pushes components such as CPU, memory, network, and disk I/O toward their limits. The goal is to observe how the system degrades under duress.
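
A back-of-the-envelope capacity estimate shows why latency degrades once you pass that limit. The sketch below applies Little's law (requests in flight = throughput × latency) to an idealized server with a fixed number of workers. The worker count is a hypothetical parameter (Flask's development server, used above, starts a new thread per request rather than using a fixed pool), and the model ignores CPU contention, the GIL, and network overhead.

```python
def max_throughput(workers: int, mean_service_s: float) -> float:
    """Upper bound on completed requests per second when at most `workers` run at once."""
    return workers / mean_service_s


def closed_loop_latency(connections: int, workers: int, mean_service_s: float) -> float:
    """Idealized mean latency when `connections` clients each keep one request outstanding.

    Below saturation a request only pays its own service time. Above it, Little's law
    forces latency = connections / throughput, so the surplus requests wait in a queue.
    """
    return max(mean_service_s, connections / max_throughput(workers, mean_service_s))


# The demo handler sleeps uniformly between 10 ms and 100 ms, so its mean service time is 55 ms.
MEAN_SERVICE_S = (0.01 + 0.1) / 2
```

With a hypothetical pool of 8 workers, the bound works out to about 145 requests per second, and wrk's 100 connections would see roughly 100 / 145 ≈ 0.69 s per request even though each handler sleeps only 55 ms on average: the extra time is spent queueing, not working.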

You control the levers by adjusting the load parameters: the number of threads, the number of concurrent connections, the duration of the test, and the specific endpoints you target. You might also vary request payloads or simulate slow downstream dependencies.
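
One common approach is to step those levers upward rather than starting at full load, so you can see which level first causes trouble. A minimal sketch of two levers, with hypothetical names: `step_ramp` builds a schedule that doubles concurrency each stage, and `pick_path` mixes valid user IDs with IDs outside the demo's 0 to 999 range so the 404 branch gets exercised too.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LoadStage:
    connections: int   # concurrent connections for this stage (wrk's -c)
    duration_s: float  # how long to hold this level (wrk's -d)


def step_ramp(start: int, stop: int, factor: int = 2, duration_s: float = 30.0) -> List[LoadStage]:
    """Multiply concurrency by `factor` each stage, stopping before it exceeds `stop`."""
    if start < 1 or factor < 2:
        raise ValueError("start must be >= 1 and factor >= 2")
    stages: List[LoadStage] = []
    connections = start
    while connections <= stop:
        stages.append(LoadStage(connections, duration_s))
        connections *= factor
    return stages


def pick_path(rng: random.Random, known_ids: int = 1000, miss_ratio: float = 0.1) -> str:
    """Return a request path, occasionally for a user ID that does not exist."""
    if rng.random() < miss_ratio:
        # IDs at or above known_ids are not in the demo's user_db, so they return 404
        return f"/user/{known_ids + rng.randrange(known_ids)}"
    return f"/user/{rng.randrange(known_ids)}"
```

Each stage of `step_ramp(10, 160)` can be run as its own wrk invocation with the matching `-c` value. wrk targets a single URL unless you script it with Lua (`-s`), so varying the path per request needs a wrk script or a custom client.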

The surprising thing is that the breaking point often isn’t a hard crash but a gradual, almost imperceptible degradation in performance. A service that normally responds in 50ms might slowly creep up to 500ms, then 2 seconds, and eventually start returning timeouts or internal server errors. This slow bleed is harder to detect than an outright failure and can silently erode user experience long before anyone notices a "problem." It’s the system’s way of whispering its discomfort before it screams.
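
Because that creep rarely trips a single alarm, it helps to compare each load stage against the lightest one instead of only checking for errors. A minimal sketch, assuming you have recorded the p99 latency for each stage of a ramp; the function name, the 3× growth threshold, and the SLO value are illustrative choices, not established standards.

```python
from typing import Optional, Sequence


def first_degraded_stage(
    p99_ms: Sequence[float], slo_ms: float, max_growth: float = 3.0
) -> Optional[int]:
    """Index of the first load stage whose p99 latency breaches `slo_ms` or has grown to
    more than `max_growth` times the first (lightest) stage's p99; None if no stage does."""
    if not p99_ms:
        return None
    baseline = p99_ms[0]
    for stage, value in enumerate(p99_ms):
        if value > slo_ms or value > baseline * max_growth:
            return stage
    return None
```

With hypothetical p99 values following the progression above, `first_degraded_stage([50, 55, 80, 500, 2000], slo_ms=1000)` flags stage 3, the 500 ms stage, well before anything times out.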

The next concept to explore is soak testing, which is about sustained load over longer periods to find memory leaks or resource exhaustion issues.
