perf top is not just a sampling profiler; it’s a real-time, interactive performance analysis tool that can pinpoint CPU hotspots with surgical precision, even on live, production systems.

Let’s see it in action. Imagine you have a web server that’s become sluggish. You SSH into the server and run:

sudo perf top

Immediately, you’re presented with a dynamic list of functions consuming the most CPU time, sorted by percentage. You might see something like this (illustrative output; the Command column is shown when sorting with --sort comm,dso,symbol):

Overhead  Command    Shared Object      Symbol
--------  ---------  -----------------  ------------------------------------
35.20%    nginx      nginx              [.] http_request_handler
15.80%    nginx      libc-2.31.so       [.] _IO_file_xsputn
10.50%    php-fpm    php-fpm            [.] zend_execute_scripts
 8.10%    nginx      [kernel]           [k] entry_SYSCALL_64
 5.50%    php-fpm    php-fpm            [.] gc_collect_cycles
 3.20%    nginx      nginx              [.] process_client_request

This output is telling you, in real-time, that http_request_handler in nginx is currently the biggest CPU hog, followed by a libc function likely related to writing output, and then a PHP execution function. The kernel syscall entry is also significant.
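To see how much each process contributes overall, the Overhead column can be summed per command. The following is a minimal sketch that assumes the listing has been saved as text in the column layout shown above; the function name sum_overhead_by_command is hypothetical, and the numbers are the illustrative ones from this article, not a real capture.

```shell
# Sum the Overhead column per command for a saved snapshot of perf top output.
# Assumes column 1 is a percentage and column 2 is the command name.
sum_overhead_by_command() {
  awk '
    # Only rows whose first field looks like a percentage are data rows;
    # this skips the header and the dashed separator.
    $1 ~ /^[0-9.]+%$/ {
      pct = $1
      sub(/%/, "", pct)
      total[$2] += pct
    }
    END {
      for (cmd in total) printf "%s %.2f%%\n", cmd, total[cmd]
    }
  ' | sort
}

# Illustrative snapshot copied from the listing above.
sum_overhead_by_command <<'EOF'
Overhead  Command    Shared Object      Symbol
--------  ---------  -----------------  ------------------------------------
35.20%    nginx      nginx              [.] http_request_handler
15.80%    nginx      libc-2.31.so       [.] _IO_file_xsputn
10.50%    php-fpm    php-fpm            [.] zend_execute_scripts
 8.10%    nginx      [kernel]           [k] entry_SYSCALL_64
 5.50%    php-fpm    php-fpm            [.] gc_collect_cycles
 3.20%    nginx      nginx              [.] process_client_request
EOF
```

On this snapshot the nginx rows add up to 62.30% and the php-fpm rows to 16.00%, which suggests looking at nginx first.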

The core problem perf top solves is the "black box" of CPU utilization. When a system is slow, it’s easy to see that it’s slow, but incredibly hard to see why. Tools like top or htop show overall CPU usage per process, but they don’t reveal which specific code paths within those processes are consuming the CPU. perf top bridges this gap by sampling on hardware performance counters and attributing CPU cycles to specific functions and, when debug symbols are available, to individual instructions and source lines.

Internally, perf top leverages the Linux kernel’s perf_events subsystem, which gives user-space programs like perf access to the performance monitoring units (PMUs) on modern CPUs. PMUs count events such as CPU cycles, instructions retired, cache misses, and branch mispredictions. By default perf top samples on the CPU cycles event: each time the counter overflows, the kernel records the instruction pointer (program counter) of whatever thread is running on that CPU. Aggregated over time, these samples show statistically which code locations execute most often and therefore consume the most CPU time. The -p <pid> option restricts sampling to a specific process, and -e cycles (usually the default) selects the event.
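The aggregation step can be illustrated without perf at all. The sketch below is a toy model, not how perf is implemented: it assumes each input line is one instruction-pointer sample already resolved to a symbol name, and the symbol names and the function aggregate_samples are made up for the example.

```shell
# Turn a stream of sampled symbol names (one line per sample) into an
# overhead table, most frequent symbol first.
aggregate_samples() {
  sort | uniq -c | sort -k1,1nr -k2,2 | awk '
    { count[NR] = $1; name[NR] = $2; total += $1 }
    END {
      for (i = 1; i <= NR; i++)
        printf "%6.2f%%  %s\n", 100 * count[i] / total, name[i]
    }
  '
}

# Ten hypothetical samples: six land in parse_request, two each in
# write_log and gc_sweep.
printf '%s\n' parse_request parse_request write_log parse_request \
  gc_sweep write_log parse_request parse_request gc_sweep parse_request \
  | aggregate_samples
```

With only ten samples the percentages are coarse. A real profile becomes trustworthy for the same reason this table would: the more samples are collected, the closer the proportions get to the actual share of CPU time.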

The power lies in its interactivity and detail. The exact key bindings differ between the default TUI and the --stdio interface, and between perf versions, so press h first to list the keys your build supports. In the TUI, pressing a on a symbol (or Enter, then choosing the annotate entry) opens its annotation: the function’s disassembly with per-instruction sample percentages, interleaved with source lines if debug symbols are available. The refresh interval can be set on the command line with -d <seconds>, and -K or -U hides kernel or user symbols respectively. This is invaluable for identifying bottlenecks within your own applications. One caveat: annotating zend_execute_scripts shows the hot instructions inside the PHP interpreter’s C code, not which PHP script or function is responsible; attributing the time to script-level code needs a PHP-level profiler.

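A common configuration for web servers or application servers might involve running perf top with the -p flag targeting the main application process IDs. For example, if your nginx worker processes are PIDs 1234 and 1235, you’d run sudo perf top -p 1234,1235. To sample on something other than CPU cycles, pass a different event with -e: sudo perf top -e context-switches shows the code paths that trigger context switches, and sudo perf top -e page-faults shows where page faults occur. Both are software events counted by the kernel rather than hardware counters, and neither is specific to network I/O.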
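Looking up PIDs by hand gets tedious when workers restart. As a rough sketch, the PID list for -p can be built from a process name with pgrep; the helper names join_pids and perf_top_cmd_for are hypothetical, "nginx" is only an example name, and the second function prints the command instead of running it so it can be checked before use.

```shell
# Join newline-separated PIDs into the comma-separated list that -p accepts.
join_pids() {
  paste -sd, -
}

# Print (rather than run) a perf top command covering every process whose
# name matches exactly. Note that this matches the master process as well
# as the workers.
perf_top_cmd_for() {
  pids="$(pgrep -x "$1" | join_pids)"
  if [ -z "$pids" ]; then
    echo "no running process named $1" >&2
    return 1
  fi
  echo "sudo perf top -p $pids"
}

# Example: perf_top_cmd_for nginx
```

Printing the command first is a deliberate choice: on a production host it is worth checking which PIDs were matched before attaching a profiler to them.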

One aspect that often surprises people is how perf top can reveal performance issues not directly tied to the application’s "business logic." For example, seeing a high percentage attributed to __GI___libc_write or poll might indicate inefficient I/O patterns, excessive logging, or contention on file descriptors, rather than a slow algorithm. These are often easier to fix by changing how data is buffered or how many connections are managed, rather than rewriting complex code.

The next step after identifying a hot function with perf top is often to use perf record to capture a more detailed trace of events for later analysis with perf report.
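A typical follow-up might look like the sketch below, which builds a perf record command for one process. The PID 1234, the 30-second duration, and the 99 Hz sampling frequency are example values, and perf_record_cmd is a hypothetical helper that only prints the command line.

```shell
# Build a perf record command line: sample at a given frequency (-F), with
# call graphs (-g), for one PID, for as long as the trailing sleep runs.
perf_record_cmd() {
  pid="$1"
  seconds="${2:-30}"
  freq="${3:-99}"
  echo "sudo perf record -F $freq -g -p $pid -o perf.data -- sleep $seconds"
}

perf_record_cmd 1234 30
# Afterwards, inspect the capture with:
#   sudo perf report -i perf.data --stdio
```

Unlike perf top, the recording is saved to perf.data, so it can be examined later or copied off the production host for analysis.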
