Storage IOPS, Latency, Throughput: The Real Deal

Latency, IOPS, and throughput describe how quickly your system can talk to its storage, how many requests it can complete per second, and how much data it can move in a given time.

Let’s see this in action. Imagine a web server. A user clicks a link, and the server needs to fetch an image file. This involves a read operation from disk.

# Simulate a read request: read the file and discard its contents
# (the path is illustrative; point it at a real file on your system)
cat /images/logo.png > /dev/null

The time it takes for the disk to find and return logo.png is the latency. If many users click at once and the disk can only complete a limited number of requests per second, that ceiling is its IOPS limit (I/O operations per second). Throughput is the total amount of data transferred per unit of time, usually measured in MB/s or GB/s; it is roughly IOPS multiplied by the size of each I/O.
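
To make the relationship between IOPS, I/O size, and throughput concrete, here is a minimal sketch of the arithmetic. The function name throughput_mib_s and the workload numbers are made up for illustration; they are not measurements from any real device.

```shell
#!/bin/sh
# throughput_mib_s IOPS IO_SIZE_KIB
# Prints the throughput in MiB/s implied by a steady IOPS rate
# at one fixed I/O size: IOPS x KiB per I/O / 1024.
throughput_mib_s() {
    awk -v iops="$1" -v kib="$2" 'BEGIN { printf "%.1f\n", iops * kib / 1024 }'
}

# Hypothetical workloads, not measurements:
throughput_mib_s 20000 4      # many small I/Os
throughput_mib_s 200 1024     # few large I/Os
```

With these inputs the first call prints 78.1 and the second prints 200.0: the workload with a hundred times the IOPS still moves less data, because each of its I/Os is so small.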

These metrics exist to help you find and relieve bottlenecks. If your application is slow, it’s often because it’s waiting for storage. High latency means each individual request takes too long. Low IOPS means the device can’t complete enough requests per second. Poor throughput means you can’t move data fast enough.
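
Latency and IOPS are linked by the number of requests kept in flight at once. By Little’s law, the average number of requests in flight equals the completion rate multiplied by the average time per request, so a fixed latency puts a ceiling on IOPS. The sketch below works through that arithmetic; the function name iops_ceiling and the numbers are hypothetical, and a real device stops scaling once it saturates, with latency typically rising as more requests are queued.

```shell
#!/bin/sh
# iops_ceiling IN_FLIGHT LATENCY_US
# Little's law: in_flight = rate x latency, so rate = in_flight / latency.
# Latency is given in microseconds; the result is requests per second.
iops_ceiling() {
    awk -v n="$1" -v us="$2" 'BEGIN { printf "%.0f\n", n * 1000000 / us }'
}

# Hypothetical figures, not measurements:
iops_ceiling 1 5000     # one request at a time, 5 ms each
iops_ceiling 32 5000    # 32 requests in flight, same latency
iops_ceiling 1 100      # one request at a time, 0.1 ms each
```

Under this simplified model the three calls print 200, 6400, and 10000. Read it as an upper bound that shows why both lower latency and more concurrency raise IOPS, not as a prediction for a particular drive.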

Internally, these operations involve a complex dance between the operating system, the storage driver, the controller, and the physical media (HDD, SSD, NVMe). When you request a file, the OS tells the storage driver. The driver translates this into commands for the controller, which then directs the physical storage device. Writes travel down the same stack, with the data flowing toward the device instead of away from it, and are often more complex because of journaling, caching, and (on flash) wear leveling.

You control this through several levers:

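Queue Depth: This is the number of I/O operations that can be outstanding against a device at once. Deeper queues can improve performance by letting the kernel and the device reorder and parallelize requests, which matters most for SSDs and NVMe drives with a lot of internal parallelism. On Linux, a related block-layer setting is exposed in sysfs at /sys/block/sda/queue/nr_requests, which limits how many requests the block layer will queue for the device. Raising it (for example, from 128 to 256) sometimes helps under heavy concurrent load, but measure before and after, because deeper queues can also raise per-request latency.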
Block Size: The size of the data chunks read or written at a time. Larger blocks can increase throughput for sequential reads and writes, but they waste bandwidth and add latency when the workload is small, random I/O. The filesystem block size is set when the filesystem is created (e.g., mkfs.ext4 -b 4096); the size of each I/O request is decided separately by the application and its configuration.
RAID Configuration: The choice of RAID level (RAID 0, 1, 5, 6, 10) significantly impacts performance, whether the array is hardware or software. RAID 0 offers the highest throughput but no redundancy. RAID 5 and 6 add redundancy through parity, at the cost of extra work on every small write. RAID 10 offers a good balance of performance and redundancy by striping across mirrored pairs.
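Filesystem Tuning: Filesystems have their own parameters that affect I/O. For example, ext4’s data=ordered (the default) and data=writeback mount options trade crash consistency for performance: writeback mode journals metadata without ordering data writes, so files written shortly before a crash can contain stale data. XFS has the allocsize mount option, which sets the speculative preallocation size used when files are extended, and the reflink feature, chosen at mkfs time, which lets files share extents copy-on-write.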
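Storage Hardware: The underlying hardware is paramount. An NVMe SSD will drastically outperform a spinning HDD for both latency and IOPS. Manufacturers usually quote an IOPS figure (e.g., "100,000 IOPS"), but it is typically measured at a specific block size and queue depth, commonly 4 KiB random I/O at a high queue depth, so your workload may see less.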
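
Of the levers above, the block-layer queue settings are the easiest to inspect directly. The following is a minimal sketch, assuming a Linux system with sysfs mounted; the helper name show_queue_settings is made up for this example, and the device name sda is an assumption you should replace with your own (lsblk lists the candidates).

```shell
#!/bin/sh
# show_queue_settings QUEUE_DIR
# Prints selected block-layer queue attributes from a sysfs queue
# directory such as /sys/block/sda/queue. Attributes that are missing
# or unreadable are reported instead of causing an error.
show_queue_settings() {
    queue_dir="$1"
    for attr in nr_requests scheduler rotational read_ahead_kb; do
        if [ -r "$queue_dir/$attr" ]; then
            printf '%s=%s\n' "$attr" "$(cat "$queue_dir/$attr")"
        else
            printf '%s=(not available)\n' "$attr"
        fi
    done
}

# The device name is an assumption; substitute your own.
show_queue_settings /sys/block/sda/queue
```

Changing a value means writing to the same file as root, and the change does not survive a reboot. Some drivers reject values they don’t support, so treat any change as an experiment and compare measurements before and after.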

The one thing most people don’t fully grasp is how much the application’s access pattern dictates which storage metric matters most. A database performing many small, random reads and writes will be bottlenecked by IOPS and latency. A video editing workstation doing large, sequential file transfers will be limited by raw throughput (MB/s). Optimizing for one might not help, or could even hurt, the other. For instance, increasing block size to boost sequential throughput might cripple random IOPS performance.
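
One way to see why the access pattern decides which metric matters is a deliberately simplified model in which a device is limited by whichever ceiling it hits first, IOPS or bandwidth. The sketch below assumes a hypothetical device with a 100,000 IOPS ceiling and a 500 MiB/s bandwidth ceiling; the function name effective_mib_s and both figures are illustrative, and real devices behave less cleanly than this.

```shell
#!/bin/sh
# effective_mib_s IOPS_CAP BW_CAP_MIB IO_SIZE_KIB
# Simplified model: achievable throughput is the smaller of
# (IOPS ceiling x I/O size) and the bandwidth ceiling.
effective_mib_s() {
    awk -v iops="$1" -v bw="$2" -v kib="$3" 'BEGIN {
        by_iops = iops * kib / 1024
        if (by_iops < bw) printf "%.1f MiB/s (IOPS-bound)\n", by_iops
        else              printf "%.1f MiB/s (bandwidth-bound)\n", bw
    }'
}

# Hypothetical device: 100,000 IOPS ceiling, 500 MiB/s bandwidth ceiling.
for kib in 4 64 1024; do
    printf '%5s KiB: ' "$kib"
    effective_mib_s 100000 500 "$kib"
done
```

In this model the 4 KiB case is IOPS-bound at 390.6 MiB/s, while the 64 KiB and 1024 KiB cases are both bandwidth-bound at 500.0 MiB/s. A database issuing small I/Os sits on the first line, and a large sequential transfer sits on the last.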

The next frontier you’ll likely encounter is understanding how network storage (like NFS or iSCSI) introduces its own latency and throughput considerations beyond local storage.