Load Testing Secrets for Production Scale

Load testing doesn’t just confirm your system can handle load; it reveals the hidden performance bottlenecks that only appear under stress.

Let’s watch a simple API endpoint, /users/{id}, under increasing load using k6.

First, the script:

import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  stages: [
    { duration: '1m', target: 50 }, // ramp up to 50 users over 1 minute
    { duration: '2m', target: 50 }, // stay at 50 users for 2 minutes
    { duration: '1m', target: 100 }, // ramp up to 100 users over 1 minute
    { duration: '2m', target: 100 }, // stay at 100 users for 2 minutes
    { duration: '1m', target: 0 },  // ramp down to 0 users over 1 minute
  ],
  thresholds: {
    // k6 expects each threshold as an array of expressions
    http_req_failed: ['rate<0.01'], // http errors should be less than 1%
    http_req_duration: ['p(95)<500'], // 95% of requests should be below 500ms
  },
};

export default function () {
  const userId = Math.floor(Math.random() * 1000) + 1;
  http.get(`http://localhost:8080/users/${userId}`);
  sleep(1); // wait 1 second between requests
}

Now, let’s run it against a local Go application that fetches user data from a PostgreSQL database.

k6 run --out json=results.json loadtest.js

The output will show metrics like http_req_duration, http_req_failed, vus (virtual users), and iterations. We’re looking for trends:

http_req_duration (p95): Does this stay flat, or does it climb as vus increase? A climb means your system is struggling to keep up.
http_req_failed: Does this spike? Even a small percentage of failures under load is critical.
vus: The number of concurrent users actively making requests.
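iterations: The total number of times the default function has run. Each iteration in this script makes exactly one request, so here it matches the request count (http_reqs).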

Observing the results, we might see the p95 duration creep up from 150ms at 50 users to 700ms at 100 users, and maybe a few http_req_failed errors appear. This tells us something is breaking down under pressure.
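
To see where in the run the p95 starts to climb, one option is to group the raw samples in results.json into time windows. The sketch below assumes each line of that file is a JSON object with type "Point", a metric name, and data.time and data.value fields. The helper names percentile and p95ByWindow are made up for this example, and the nearest-rank percentile used here may differ slightly from the figure k6 prints in its own summary.

```javascript
// Nearest-rank percentile over an array of numbers.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}

// Groups http_req_duration samples into fixed windows and reports p95 per window.
// `lines` is an array of strings, one JSON object per line (assumed k6 JSON output).
function p95ByWindow(lines, windowSeconds) {
  const samples = [];
  for (const line of lines) {
    if (!line.trim()) continue; // skip blank lines
    const entry = JSON.parse(line);
    if (entry.type !== 'Point' || entry.metric !== 'http_req_duration') continue;
    // Trim sub-millisecond digits so Date.parse sees at most three.
    const time = entry.data.time.replace(/(\.\d{3})\d+/, '$1');
    samples.push({ t: Date.parse(time), value: entry.data.value });
  }
  if (samples.length === 0) return [];

  // Use the earliest sample as time zero for the windows.
  const start = samples.reduce((min, s) => Math.min(min, s.t), Infinity);
  const buckets = new Map();
  for (const s of samples) {
    const index = Math.floor((s.t - start) / (windowSeconds * 1000));
    if (!buckets.has(index)) buckets.set(index, []);
    buckets.get(index).push(s.value);
  }

  return [...buckets.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([index, values]) => ({
      windowStartSeconds: index * windowSeconds,
      samples: values.length,
      p95: percentile(values, 95),
    }));
}
```

In Node, the input could come from reading results.json as text and splitting it on newlines. A p95 that is flat across the 50-user windows and rises across the 100-user windows points at the ramp as the trigger.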

The goal of load testing isn’t just to hit a target number of requests per second. It’s to understand why performance degrades. Your application might be perfectly fine with 10 users, but at 100, a subtle inefficiency becomes a catastrophic bottleneck.
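
For a sense of how requests per second relates to the user counts in the script, here is a rough estimate. It assumes each virtual user alternates between one request and the one-second sleep, so the request rate is roughly the number of users divided by the sleep plus the response time. The function name estimateRps is hypothetical, and plugging in the p95 figures as if they were average latency gives a pessimistic number.

```javascript
// Rough closed-loop estimate: each user completes one request per
// (think time + response time) seconds, so rate = users / cycle time.
function estimateRps(virtualUsers, thinkSeconds, latencySeconds) {
  return virtualUsers / (thinkSeconds + latencySeconds);
}

// 50 users, 1s sleep, 150ms latency: about 43 requests per second.
console.log(estimateRps(50, 1, 0.15).toFixed(1));

// 100 users, 1s sleep, 700ms latency: about 59 requests per second.
console.log(estimateRps(100, 1, 0.7).toFixed(1));
```

Under these assumptions, doubling the users raises throughput by far less than double, because each user spends more of its cycle waiting on the slower response.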

Internally, k6 simulates concurrent users as virtual users (VUs). Each VU runs the default function in your script in a loop for as long as it is active. The stages define how the number of VUs changes over time. The thresholds are pass/fail assertions: if any is not met at the end of the run, k6 marks the test as failed and exits with a non-zero status, flagging a performance issue. The real power comes from correlating these metrics with your application’s internal metrics (CPU, memory, database connections, queue lengths) to pinpoint the exact cause of degradation.
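
When lining up application metrics with the test, it helps to know how many users were planned at a given moment. The sketch below computes that from a stages array, assuming each stage ramps linearly from the previous target to its own. The helper names parseDurationSeconds and plannedVUs are invented here, only single-unit durations such as '30s' or '2m' are handled, and the starting user count is a parameter rather than anything read from k6.

```javascript
// Converts simple duration strings like '30s', '1m', or '2h' to seconds.
function parseDurationSeconds(text) {
  const match = /^(\d+)(s|m|h)$/.exec(text);
  if (!match) throw new Error(`unsupported duration: ${text}`);
  const unitSeconds = { s: 1, m: 60, h: 3600 }[match[2]];
  return Number(match[1]) * unitSeconds;
}

// Planned user count at `elapsedSeconds` into the test, assuming each
// stage ramps linearly from the previous target to its own target.
function plannedVUs(stages, elapsedSeconds, startVUs = 0) {
  let from = startVUs;
  let stageStart = 0;
  for (const stage of stages) {
    const length = parseDurationSeconds(stage.duration);
    const stageEnd = stageStart + length;
    if (elapsedSeconds < stageEnd) {
      const progress = (elapsedSeconds - stageStart) / length;
      return Math.round(from + (stage.target - from) * progress);
    }
    from = stage.target;
    stageStart = stageEnd;
  }
  return from; // past the final stage
}

// The stages from the script in this article.
const stages = [
  { duration: '1m', target: 50 },
  { duration: '2m', target: 50 },
  { duration: '1m', target: 100 },
  { duration: '2m', target: 100 },
  { duration: '1m', target: 0 },
];

console.log(plannedVUs(stages, 210)); // halfway through the ramp from 50 to 100
```

A CPU spike or connection-pool warning logged three and a half minutes into the run can then be tied to the ramp toward 100 users rather than to the steady 50-user phase.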

A common misconception is that if your latency is low and error rate is zero at low load, you’re good. But the critical insight is that many performance problems are non-linear. A database query that takes 5ms for one user might take 500ms for 100 users if the database is maxed out on connections or CPU. The load test uncovers these emergent behaviors.
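
The 5ms to 500ms jump can be reproduced with a deliberately simple model. Assume every request arrives at the same instant, each one holds a database connection for a fixed service time, and requests beyond the pool size wait their turn. The function worstCaseLatencyMs is hypothetical, and real contention (CPU saturation, lock waits, uneven arrivals) behaves less neatly than this.

```javascript
// Toy model: requests are served in batches of `poolSize`, and each batch
// takes `serviceMs`. The last request waits for every batch ahead of it.
function worstCaseLatencyMs(concurrentRequests, poolSize, serviceMs) {
  const batches = Math.ceil(concurrentRequests / poolSize);
  return batches * serviceMs;
}

console.log(worstCaseLatencyMs(1, 1, 5));    // one user, one connection
console.log(worstCaseLatencyMs(100, 1, 5));  // 100 users queued on one connection
console.log(worstCaseLatencyMs(100, 10, 5)); // same load, pool of 10
```

In this model the query itself never gets slower. The extra latency is all queueing, which is why it stays invisible at low load.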

The next step after identifying and fixing a performance bottleneck is to understand its upstream or downstream dependencies.