Scalability Testing: Beyond Load

The most surprising thing about scalability testing is that the "limit" you find is rarely the system’s true limit; it’s usually a bottleneck in your testing tools or methodology.

Let’s see this in action. Imagine a simple web service that counts user visits.

package main

import (
	"fmt"
	"net/http"
	"sync"
)

var (
	visitCount int
	mu         sync.Mutex
)

func visitHandler(w http.ResponseWriter, r *http.Request) {
	mu.Lock()
	visitCount++
	count := visitCount // copy under the lock so the read below is not a data race
	mu.Unlock()
	fmt.Fprintf(w, "Visit count: %d\n", count)
}

func main() {
	http.HandleFunc("/", visitHandler)
	fmt.Println("Starting server on :8080")
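	if err := http.ListenAndServe(":8080", nil); err != nil {
		fmt.Println("Server error:", err) // ListenAndServe only returns on failure
	}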
}

This service is incredibly simple. Each request increments a global counter, protected by a mutex.

Now, how do we test its "scale"? We need to simulate many users hitting it concurrently. A common tool for this is k6. Here’s a basic k6 script:

import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 1000, // virtual users
  duration: '30s', // duration of the test
};

export default function () {
  http.get('http://localhost:8080');
  sleep(1); // pause between requests
}

When we run this, k6 spins up 1000 virtual users for 30 seconds. Each one sends a request, waits for the response, and then sleeps for one second, so the offered load tops out at roughly 1000 requests per second against our Go server. We’re looking for the point where the server starts dropping requests, latency spikes, or error rates climb.

The system we’re testing – the Go web service – has a few key components that affect its scale:

CPU: The visitHandler is computationally light, but if it were doing more complex work (database queries, heavy processing), CPU would be a primary bottleneck.
Memory: For this simple service, memory isn’t an issue. But if the application held large datasets in memory or had memory leaks, this would be a limit.
Network I/O: The server needs to accept incoming connections and send responses. High request rates can saturate the network interface.
Concurrency Primitives: The sync.Mutex in our Go code is crucial. If many goroutines contend for the same lock, it becomes a serialization point, severely limiting throughput.
Underlying OS Limits: File descriptors, process limits, and TCP connection limits can all cap performance.
The Testing Tool Itself: This is the surprising part. k6 has its own limits on how many VUs it can spin up and how fast it can send requests, depending on the machine running k6.

To understand the mental model, think of your system as a series of pipes. Some pipes are wide (fast CPU), some are narrow (slow disk I/O), and some have valves (locks, thread pools). Scalability testing is about finding the narrowest pipe or the most restrictive valve when you try to push a lot of fluid through.

The goal is to identify the bottleneck. Is it the application code (e.g., inefficient algorithms, excessive locking)? Is it the infrastructure (e.g., under-provisioned CPU, slow network)? Or is it your testing setup?

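Let’s say you run the k6 test and see failed requests or timeouts in its output. Your first instinct might be to say, "The Go app can’t handle 1000 VUs." But before you jump to that conclusion, consider the machine running k6. If it’s a weak laptop, k6 itself might be the bottleneck. It might not be able to generate 1000 concurrent requests fast enough, leading to dropped connections or timeouts before the Go server even sees the requests. A useful distinction: genuine 5xx responses are produced by the server (or a proxy in front of it), while connection errors and client-side timeouts can just as easily originate in the load generator or the network.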

To test the Go server’s limits, you’d ideally run k6 on a machine with significantly more resources than the server you’re testing. You’d gradually increase the vus and duration in k6 and monitor both k6’s output (error rates, latencies) and the server’s metrics (CPU, memory, network, request rates, goroutine count).
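To make the server-side half of that monitoring concrete, here is a minimal sketch of a metrics endpoint built on Go's runtime package. The Snapshot type, the takeSnapshot and metricsHandler functions, and the /debug/metrics path are all illustrative names, not part of the service above; a real deployment might prefer expvar, pprof, or a metrics library instead.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"runtime"
)

// Snapshot holds a few runtime numbers worth watching during a load test.
// The type and its field names are illustrative assumptions.
type Snapshot struct {
	Goroutines int    `json:"goroutines"`
	HeapAlloc  uint64 `json:"heap_alloc_bytes"`
	NumGC      uint32 `json:"num_gc"`
}

// takeSnapshot reads the current goroutine count and heap statistics.
func takeSnapshot() Snapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m) // briefly stops the world, so poll sparingly
	return Snapshot{
		Goroutines: runtime.NumGoroutine(),
		HeapAlloc:  m.HeapAlloc,
		NumGC:      m.NumGC,
	}
}

// metricsHandler serves the snapshot as JSON.
func metricsHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	// A failed write here means the client went away; nothing useful to do.
	_ = json.NewEncoder(w).Encode(takeSnapshot())
}

func main() {
	// In the server under test this would be registered next to the "/" route:
	//   http.HandleFunc("/debug/metrics", metricsHandler)
	// Here we print a single snapshot so the sketch runs on its own.
	fmt.Printf("%+v\n", takeSnapshot())
}
```

Polling an endpoint like this every few seconds during the run, alongside OS-level CPU and network counters, gives you a server-side timeline to line up against k6's latency and error output.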

The one thing most people don’t realize is how much the network between your load generator and your target system can masquerade as an application bottleneck. If your load generator is in AWS us-east-1 and your target is in us-west-2, the latency and bandwidth limitations of inter-region networking can cap your throughput long before your application hits its CPU or memory limits. You might see k6 reporting high latencies and http.get errors, but the Go server’s CPU might be at 10%. In this case, the bottleneck isn’t the Go code; it’s the network path.

Once you’ve exhausted the limits of your testing tool and the network, you’ll start to see the true application bottlenecks. For our simple Go app, if we ran k6 from a powerful machine on the same network segment, we’d eventually hit a point where the sync.Mutex becomes the bottleneck. At that point, visitCount would increment much more slowly than requests arrive, leading to increasing latencies and potential timeouts for the k6 clients. The next step would be to explore distributed counters or alternative concurrency models.
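As one example of an alternative concurrency model, the counter can be kept in an atomic integer instead of behind a mutex. This is a minimal sketch, not a drop-in prescription: VisitCounter is a hypothetical type, and atomic.Int64 assumes Go 1.19 or later. An atomic counter still funnels every update through one memory location, so under extreme contention a sharded counter may be the next thing to try.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// VisitCounter is a hypothetical replacement for the mutex-guarded visitCount.
// atomic.Int64 requires Go 1.19 or later.
type VisitCounter struct {
	n atomic.Int64
}

// Inc adds one visit and returns the new total in a single atomic step,
// so a handler never reads the counter outside a critical section.
func (c *VisitCounter) Inc() int64 {
	return c.n.Add(1)
}

// Load returns the current total.
func (c *VisitCounter) Load() int64 {
	return c.n.Load()
}

func main() {
	var counter VisitCounter
	var wg sync.WaitGroup

	// Simulate 1000 concurrent requests, each recording one visit.
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			counter.Inc()
		}()
	}
	wg.Wait()

	fmt.Println("Visit count:", counter.Load()) // prints "Visit count: 1000"
}
```

In the handler, the value returned by Inc would be written straight into the response, which also avoids reading the counter separately from the increment.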