perf report is the interactive viewer for perf data.
Let’s see it in action. Imagine we’ve profiled a Python script that’s supposed to be crunching numbers, and it’s running slow. We’ve run perf record -g -- python my_script.py and now we have perf.data in our directory.
To start exploring, we just run:
perf report
This drops us into an interactive TUI. The default view shows us a sorted list of functions, with the percentage of CPU time spent in each.
# Overhead  Command  Shared Object  Symbol
# ........  .......  .............  ........................
#   75.10%  python   python         my_function
#   15.50%  python   libc-2.31.so   __GI___libc_pwrite
#    5.00%  python   python         <unknown>
#    3.00%  python   python         PyObject_CallMethod
#    1.40%  python   python         _PyEval_EvalFrameDefault
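This immediately tells us that my_function accounts for the vast majority of the samples, and therefore of the CPU time. The columns are largely self-explanatory: Overhead is the percentage of total samples, Command is the process name, Shared Object is the library or executable the symbol belongs to, and Symbol is the function name. Note that Python-level functions such as my_function appear by name only when CPython’s perf support is enabled (python -X perf, available since Python 3.12); without it, the samples are attributed to interpreter functions such as _PyEval_EvalFrameDefault.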
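To make the column layout concrete, here is a minimal sketch that parses rows in the four-column form shown above into records. The names ReportRow and parse_report are hypothetical, the sample rows are the illustrative ones from this article, and the parser assumes a single Overhead column, no call-chain lines, and command names without spaces. Real perf report --stdio output usually places a marker such as [.] or [k] before the symbol, which the sketch strips.

```python
from dataclasses import dataclass

# Illustrative rows copied from the example above (not real measurements).
SAMPLE_REPORT = """\
# Overhead Command Shared Object Symbol
# ........ ......... ................ ................
# 75.10% python python my_function
# 15.50% python libc-2.31.so __GI___libc_pwrite
# 5.00% python python <unknown>
"""


@dataclass
class ReportRow:
    overhead: float      # percentage of total samples
    command: str         # process name
    shared_object: str   # executable or library containing the symbol
    symbol: str          # function name, or an unresolved placeholder


def parse_report(text: str) -> list[ReportRow]:
    """Parse four-column report rows; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        line = line.lstrip("#").strip()
        fields = line.split(None, 3)
        # A data row has four fields and starts with a percentage.
        if len(fields) < 4 or not fields[0].endswith("%"):
            continue
        overhead, command, shared_object, symbol = fields
        # Drop a privilege marker such as "[.]" (user) or "[k]" (kernel) if present.
        if symbol.startswith("[") and symbol[2:3] == "]":
            symbol = symbol[3:].strip()
        rows.append(ReportRow(float(overhead.rstrip("%")), command, shared_object, symbol))
    return rows


if __name__ == "__main__":
    for row in parse_report(SAMPLE_REPORT):
        print(f"{row.overhead:6.2f}%  {row.shared_object:<14} {row.symbol}")
```

Once the rows are records, it is easy to total the overhead per shared object or to pick out unresolved symbols in a script.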
The real power comes from navigation.
- Enter and +: Pressing Enter on a line opens up that entry. Depending on the perf version, this either expands the call chain directly or brings up a menu of actions for the entry; + and e also toggle call-chain expansion. If we expand my_function, we might see something like this:

      # Overhead  Command  Shared Object  Symbol
      # ........  .......  .............  ......................
      #   75.10%  python   python         my_function
      #   |--70.00%  python  python  _some_internal_func
      #   |--20.00%  python  python  another_helper
      #   `--10.00%  python  python  PyObject_CallMethod

  This shows us what functions called my_function or, if we’re looking at a system library, what user-space code was executing just before it.

- a (annotate): This is where things get really granular. Pressing a on a symbol, like my_function, opens an annotated view that pairs each assembly instruction with the percentage of samples that hit it. If perf has access to debug symbols and source code, it also shows the corresponding source lines. The listing below is simplified for illustration:

      my_script.py:15:
      # ... previous lines ...
      15: result = calculate_sum(data)
          0.10% my_function
      my_script.py:16:
      16: return result
          0.05% my_function
      # ... next lines ...
      # Assembler:
      ...
      0x55555555e0f0 <my_function+0x20>: mov %rax,%rsi
      0x55555555e0f3 <my_function+0x23>: callq 0x55555555e100 <calculate_sum>   <-- 5.50% overhead here
      0x55555555e0f8 <my_function+0x28>: mov %rax,%rbx
      ...

  This annotation is crucial. It pinpoints which specific lines or assembly instructions within a function consume the most time. Here, the callq to calculate_sum is responsible for 5.50% of the samples within my_function.

- Sorting (--sort): The sort keys decide how samples are grouped, and they are chosen on the command line rather than inside the TUI (in recent perf versions the s key switches to another data file instead). For example, perf report --sort symbol groups samples by symbol alone, while perf report --sort comm,dso,symbol gives the three-key view shown above. Within the chosen keys, entries are ordered by overhead.

- Filtering: This is incredibly useful for narrowing down the view. You can filter by command name, shared object, or symbol name. On the command line, perf report --comms python shows only samples from the python command, --dsos restricts the report to the named shared objects, and --symbols to the named symbols.

- d (zoom into shared object): Pressing d restricts the view to the shared object of the selected entry, t does the same for its thread, and Esc zooms back out. If you’re not seeing source code, or if you want to dive into the assembly directly, use a: the annotate view shows the disassembly even when no source is available.

- / (filter by symbol name): Type part of a symbol name to show only the matching entries. Useful for finding specific functions in a long report.

Key bindings differ between perf versions; press h inside the TUI to see the list for yours.
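The same views can also be produced non-interactively, which is handy for scripts and for sharing results. As a minimal sketch, the helper below (build_report_cmd is a hypothetical name, not part of perf) assembles a perf report command line from the documented --stdio, -i, --sort, --comms, --dsos and --symbols options. It only runs perf if the tool and a perf.data file are actually present.

```python
import os
import shutil
import subprocess


def build_report_cmd(data_file="perf.data", sort_keys=None, comm=None, dso=None, symbol=None):
    """Assemble an argv list for a non-interactive `perf report` run."""
    cmd = ["perf", "report", "--stdio", "-i", data_file]
    if sort_keys:
        cmd += ["--sort", ",".join(sort_keys)]  # grouping keys, e.g. comm,dso,symbol
    if comm:
        cmd += ["--comms", comm]                # keep only samples from this command
    if dso:
        cmd += ["--dsos", dso]                  # keep only samples in this shared object
    if symbol:
        cmd += ["--symbols", symbol]            # keep only samples in this symbol
    return cmd


if __name__ == "__main__":
    cmd = build_report_cmd(sort_keys=["comm", "dso", "symbol"], comm="python")
    print(" ".join(cmd))
    # Only invoke perf when it is installed and a profile exists in this directory.
    if shutil.which("perf") and os.path.exists("perf.data"):
        subprocess.run(cmd, check=False)
```

Building the argument list separately from running it keeps the sketch testable on machines where perf is not installed.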
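The -g flag used with perf record is essential here, as it enables call graph (call chain) recording. Without it, perf report would show you where time is spent but not why: it would have no record of how those functions were called. By default, -g unwinds the stack using frame pointers, so binaries built without them can produce truncated chains; perf record --call-graph dwarf is an alternative in that case.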
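A toy model may help show what the call chains add. The sketch below treats each sample as a call stack and computes two figures: the share of samples whose innermost frame is a given function (all you can know without call chains), and the share of samples in which the function appears anywhere on the stack. The stacks and the function names write_results, self_overhead and inclusive_overhead are made up for illustration, and this is a simplified model rather than perf’s actual implementation.

```python
from collections import Counter

# Each sample is a call stack, outermost caller first, sampled function last.
# These stacks are invented for illustration.
SAMPLES = [
    ("main", "my_function", "calculate_sum"),
    ("main", "my_function", "calculate_sum"),
    ("main", "my_function"),
    ("main", "write_results"),
]


def self_overhead(samples):
    """Percentage of samples whose innermost frame is each function."""
    counts = Counter(stack[-1] for stack in samples)
    return {fn: 100.0 * n / len(samples) for fn, n in counts.items()}


def inclusive_overhead(samples):
    """Percentage of samples in which each function appears anywhere on the stack."""
    counts = Counter()
    for stack in samples:
        counts.update(set(stack))  # count a function once per sample, even if it recurses
    return {fn: 100.0 * n / len(samples) for fn, n in counts.items()}


if __name__ == "__main__":
    print("self:     ", self_overhead(SAMPLES))
    print("inclusive:", inclusive_overhead(SAMPLES))
```

In this model my_function is the innermost frame in only a quarter of the samples, yet it is on the stack in three quarters of them. That gap is the kind of context that is lost when call chains are not recorded.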
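The "Symbol" column can sometimes show an unresolved entry such as <unknown> or <not found> (many perf versions print [unknown] or a bare hexadecimal address instead). This typically means that perf couldn’t map the sampled address to a symbol. For system libraries, this might be expected if you don’t have symbol or debug packages installed. For your own code, it usually means the binary was stripped; compiling with debug flags (-g for GCC/Clang) additionally gives the annotate view the line information it needs to show source.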
When a Python function like my_function shows significant overhead, and drilling down via Enter or a reveals calls into C functions like PyObject_CallMethod or _PyEval_EvalFrameDefault, it’s a strong indicator that the bottleneck isn’t just your Python logic, but how the interpreter executes that logic. This might point to excessive function calls, complex data structures, or inefficient use of Python’s internal mechanisms.
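As a small illustration of the "excessive function calls" case, compare two ways of summing a list. The article does not show the body of calculate_sum, so both implementations below, and the add helper, are hypothetical. The first makes one Python-level call per element, which is the kind of pattern that shows up as time in the interpreter’s call machinery; the second hands the loop to the built-in sum().

```python
import timeit


def add(a, b):
    # Hypothetical per-element helper: every call goes through the interpreter.
    return a + b


def calculate_sum_slow(data):
    total = 0
    for value in data:
        total = add(total, value)  # one Python-level function call per element
    return total


def calculate_sum_fast(data):
    # The built-in sum() runs its loop in C, with no Python-level call per element.
    return sum(data)


if __name__ == "__main__":
    data = list(range(100_000))
    assert calculate_sum_slow(data) == calculate_sum_fast(data)
    for fn in (calculate_sum_slow, calculate_sum_fast):
        seconds = timeit.timeit(lambda: fn(data), number=20)
        print(f"{fn.__name__}: {seconds:.3f}s for 20 runs")
```

Timings depend on the machine and Python version, so the script prints them rather than assuming a particular speedup.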
The most surprising thing about perf is how deeply it can cut into your code’s execution without needing invasive instrumentation. perf record operates at a very low level, periodically sampling the instruction pointer (and, with -g, the call stack), and perf report turns those samples into a statistical picture of where execution time went. Because sampling overhead is typically low, you can often profile performance-critical code in production environments and gain insights that traditional logging or print statements would never reveal.
The next thing you’ll often run into is understanding what to do when perf report points to a system library or a C function you don’t own, and how to correlate that back to your high-level code.