perf report is the interactive viewer for the profile data that perf record writes.

Let’s see it in action. Imagine we’ve profiled a Python script that’s supposed to be crunching numbers, and it’s running slow. We’ve run perf record -g -- python my_script.py and now we have perf.data in our directory.
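To make the scenario concrete, here is a minimal sketch of what such a script might look like. Everything in it is hypothetical: the names my_function and calculate_sum are borrowed from the listings in this article, and the workload is invented purely so there is something CPU-bound to sample.

```python
# my_script.py: hypothetical stand-in for the slow number-crunching script.
import math


def calculate_sum(data):
    """Deliberately naive: a pure-Python loop doing float math per element."""
    total = 0.0
    for x in data:
        total += math.sqrt(x) * math.sin(x)
    return total


def my_function(n=200_000):
    """Build the input and hand the heavy lifting to calculate_sum."""
    data = list(range(n))
    result = calculate_sum(data)
    return result


if __name__ == "__main__":
    print(my_function())
```

A tight pure-Python loop like this spends most of its time inside the interpreter, which is the kind of workload the rest of this walkthrough assumes.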

To start exploring, we just run:

perf report

This drops us into an interactive TUI. The default view shows us a sorted list of functions, with the percentage of CPU time spent in each.

# Overhead  Command     Shared Object     Symbol
# ........  .........   ................  ................
#  75.10%   python      python            my_function
#  15.50%   python      libc-2.31.so      __GI___libc_pwrite
#   5.00%   python      python            <unknown>
#   3.00%   python      python            PyObject_CallMethod
#   1.40%   python      python            _PyEval_EvalFrameDefault

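This immediately tells us that my_function in our Python script is consuming the vast majority of the CPU time. (Python-level names such as my_function appear only when the interpreter exposes them to perf, for example with the -X perf option in CPython 3.12 and later; otherwise the listing shows the interpreter's own C functions.) The columns are: Overhead, the percentage of total samples; Command, the process name; Shared Object, the library or executable the symbol belongs to; and Symbol, the function name. When call graphs were recorded with -g, recent perf versions split Overhead into Children and Self columns by default; the listing above is simplified to a single column.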
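If we want to work with these columns programmatically, a small parser is enough. The sketch below assumes the simplified four-column layout used in this article (rows prefixed with #, fields separated by whitespace); real perf report --stdio output has extra markers and may have different columns, so treat parse_rows and Row as hypothetical helpers rather than a general parser.

```python
from typing import NamedTuple


class Row(NamedTuple):
    overhead: float      # percentage of total samples
    command: str         # process name
    shared_object: str   # library or executable containing the symbol
    symbol: str          # function name


# A few rows copied from the simplified listing in this article.
SAMPLE = """\
# Overhead  Command     Shared Object     Symbol
# ........  .........   ................  ................
#  75.10%   python      python            my_function
#  15.50%   python      libc-2.31.so      __GI___libc_pwrite
#   5.00%   python      python            <unknown>
"""


def parse_rows(text):
    """Parse the simplified listing into Row tuples, skipping header lines."""
    rows = []
    for line in text.splitlines():
        fields = line.lstrip("# ").split(None, 3)
        # Data rows have four fields and start with a percentage.
        if len(fields) != 4 or not fields[0].endswith("%"):
            continue
        try:
            overhead = float(fields[0].rstrip("%"))
        except ValueError:
            continue
        rows.append(Row(overhead, fields[1], fields[2], fields[3].strip()))
    return rows


if __name__ == "__main__":
    for row in parse_rows(SAMPLE):
        print(f"{row.overhead:6.2f}%  {row.shared_object:<14} {row.symbol}")
```

Splitting on at most three whitespace runs keeps anything after the third column together as the symbol, which matters once symbol names contain spaces.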

The real power comes from navigation.

  • Enter: When you press Enter on a line, perf report expands it to show the call chain. If we press Enter on my_function, we might see something like this:

    # Overhead  Command     Shared Object     Symbol
    # ........  .........   ................  ................
    #  75.10%   python      python            my_function
    #   |--70.00%   python      python            _some_internal_func
    #   |--20.00%   python      python            another_helper
    #   `--10.00%   python      python            PyObject_CallMethod
    

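    How to read this depends on the call-graph order perf report is using (caller or callee, selectable with --call-graph): the expanded entries are either the functions that my_function called or the functions that called it. For a symbol in a system library, the callee ordering is the one that shows which user-space code led into it.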

  • a (annotate): This is where things get really granular. Pressing a on a symbol, like my_function, opens the annotate view, which shows the disassembled instructions alongside the percentage of samples that hit each one. If perf has access to debug symbols and source code, it also interleaves the corresponding source lines. That applies to compiled code such as C; the Python lines in the mock-up below are only there to tie the assembly back to our example script and would not appear in real annotate output.

    my_script.py:15:
    # ... previous lines ...
    15:     result = calculate_sum(data)
              0.10%  my_function
    my_script.py:16:
    16:     return result
              0.05%  my_function
    # ... next lines ...
    
    # Assembler:
    ...
    0x55555555e0f0 <my_function+0x20>:  mov    %rax,%rsi
    0x55555555e0f3 <my_function+0x23>:  callq  0x55555555e100 <calculate_sum>  <-- 5.50% overhead here
    0x55555555e0f8 <my_function+0x28>:  mov    %rax,%rbx
    ...
    

    This annotation is crucial. It pinpoints which specific lines or which assembly instructions within a function are consuming the most time. You can see that the callq to calculate_sum is responsible for 5.50% of the samples within my_function.

  • Sorting (--sort): The sort order is chosen on the command line rather than with a key in the TUI (there, s switches to another perf data file in the current directory). For example, perf report --sort symbol groups samples by symbol only, while perf report --sort comm,dso,symbol is the default grouping behind the Command, Shared Object, and Symbol columns. Within each grouping, entries are ordered by overhead.

  • / (filter): This is incredibly useful for narrowing down the view. Pressing / and typing my_function keeps only the entries whose symbol name matches. To filter by command or shared object, use the command-line options instead: perf report --comm python shows only samples from the python command, and --dsos and --symbols do the same for shared objects and symbols.

  • d and t (zoom): d zooms into the shared object (DSO) of the selected entry, and t zooms into its thread; Esc zooms back out. Note that d does not disassemble anything; the disassembly is part of the annotate view opened with a.

  • h or ? (help): Lists the key bindings of the perf version you are running, which is worth checking because they differ between releases.
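The same kind of narrowing can also be requested non-interactively, which is handy in scripts. The sketch below only assembles a perf report command line from options that perf documents (--stdio, -i, --sort, --comm, --symbols); the helper names build_report_cmd and run_report are hypothetical, and it assumes perf is on PATH and a perf.data file exists when you actually run it.

```python
import shutil
import subprocess


def build_report_cmd(data_file="perf.data",
                     sort_keys=("comm", "dso", "symbol"),
                     comm=None, symbol=None):
    """Assemble a non-interactive perf report invocation as an argv list."""
    cmd = ["perf", "report", "--stdio", "-i", data_file,
           "--sort", ",".join(sort_keys)]
    if comm:
        cmd += ["--comm", comm]        # keep only samples from this command
    if symbol:
        cmd += ["--symbols", symbol]   # keep only samples in this symbol
    return cmd


def run_report(cmd):
    """Run the command and return its text output (requires perf installed)."""
    if shutil.which(cmd[0]) is None:
        raise RuntimeError("perf not found on PATH")
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return done.stdout


if __name__ == "__main__":
    # Print the command instead of running it, so this works without perf.
    print(" ".join(build_report_cmd(sort_keys=("symbol",), comm="python")))
```

Building the argv list separately from running it keeps the filtering logic easy to inspect and test on machines where perf is not available.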

The -g flag used with perf record is essential here, as it enables call graph (or call chain) recording. Without it, perf report would show you where time is spent, but not why – it wouldn’t show you the context of how those functions were called.
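A toy model helps show the difference between the "where" and the "why". The sketch below is not how perf is implemented; it simply treats each sample as a made-up call stack and shows that the innermost frame alone gives per-function overhead, while answering "who called this?" needs the rest of the stack, which is what -g records.

```python
from collections import Counter

# Each sample is a call stack, outermost caller first. The last frame is
# where the instruction pointer was when the sample fired. Invented data.
SAMPLES = [
    ("main", "my_function", "calculate_sum"),
    ("main", "my_function", "calculate_sum"),
    ("main", "my_function"),
    ("main", "write_results"),
]


def self_overhead(samples):
    """Percent of samples per innermost frame: available without call chains."""
    counts = Counter(stack[-1] for stack in samples)
    return {fn: 100.0 * n / len(samples) for fn, n in counts.items()}


def callers_of(samples, target):
    """Count the immediate callers of target: this needs the full stacks."""
    callers = Counter()
    for stack in samples:
        for caller, callee in zip(stack, stack[1:]):
            if callee == target:
                callers[caller] += 1
    return callers


if __name__ == "__main__":
    print(self_overhead(SAMPLES))
    print(callers_of(SAMPLES, "calculate_sum"))
```

If every sample held only its last frame, self_overhead would still work, but callers_of would have nothing to look at.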

The "Symbol" column can sometimes show a placeholder such as <unknown> or [unknown], or a raw hexadecimal address. This means that perf couldn’t resolve the sampled address to a symbol name. For system libraries, this might be expected if you don’t have the matching symbol or debug packages installed. For your own code, it usually means the binary was stripped. Building with debug information (-g for GCC/Clang) additionally lets the annotate view show source lines.

When you see significant overhead in a Python function like my_function, and drilling down via Enter or a then shows calls into C functions like PyObject_CallMethod or _PyEval_EvalFrameDefault, it’s a strong indicator that the bottleneck isn’t just your Python logic, but how the interpreter is executing that logic. This might point to excessive function calls, complex data structures, or inefficient use of Python’s internal mechanisms.

The most surprising thing about perf is how deeply it can cut into your code’s execution without needing invasive instrumentation. perf record operates at a very low level, periodically sampling the instruction pointer (and, with -g, the call stack), and perf report reconstructs the execution picture from those samples with remarkable fidelity. This means you can often profile performance-critical code in production environments with low overhead, and gain insights that traditional logging or print statements would never reveal.
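To see why sampled instruction pointers are enough to produce per-instruction percentages like those in the annotate view, consider this small sketch. The offsets are invented for illustration and the function name per_instruction_percent is hypothetical; it only demonstrates the counting, not anything perf-specific.

```python
from collections import Counter

# Hypothetical instruction-pointer offsets (relative to the start of one
# function) captured at eight sampling interrupts.
SAMPLE_OFFSETS = [0x20, 0x23, 0x23, 0x23, 0x28, 0x23, 0x28, 0x20]


def per_instruction_percent(offsets):
    """Share of samples that landed on each instruction offset."""
    counts = Counter(offsets)
    total = len(offsets)
    return {off: 100.0 * n / total for off, n in sorted(counts.items())}


if __name__ == "__main__":
    for off, pct in per_instruction_percent(SAMPLE_OFFSETS).items():
        print(f"+{off:#x}: {pct:5.1f}%")
```

No code inside the profiled function has to change for this to work, which is why the approach counts as non-invasive.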

The next thing you’ll often run into is understanding what to do when perf report points to a system library or a C function you don’t own, and how to correlate that back to your high-level code.
