Rust applications are surprisingly hard to profile with perf, and it’s not because Rust itself is inherently slow to profile.
Let’s see perf in action. Imagine we have a simple Rust program that just spins in a loop:
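    use std::hint::black_box;

    fn main() {
        // An explicit type is needed: wrapping_add cannot be called on an
        // integer literal whose type is still ambiguous.
        let mut x: u64 = 0;
        for _ in 0..1_000_000_000 {
            // black_box keeps the optimizer from folding the loop into a constant.
            x = black_box(x.wrapping_add(1));
        }
        println!("{}", x);
    }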
If we try to profile this with `perf record ./my_rust_app`, we’ll often see a lot of `[unknown]` symbols in the output, making it hard to tell where in our Rust code the time is being spent.
The core problem is that `perf` needs metadata from the binary to map machine code addresses back to your source. It resolves function names from the symbol table, and it relies on DWARF debugging information for source locations, inlined functions, and (with `--call-graph dwarf`) stack unwinding. Cargo’s debug profile includes this information by default, but the release profile does not, and optimized builds inline aggressively and often omit frame pointers, which breaks `perf`’s default frame-pointer-based call graphs. On top of that, Rust mangles symbol names, so raw symbols are hard to read unless they are demangled.
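To make the address-to-symbol mapping concrete, here is a toy model of the lookup a profiler performs. It is a conceptual sketch only, not `perf`’s actual implementation: the `Symbol` struct, the `resolve` function, the symbol names, and every address are made up for illustration. The point it shows is that when no known symbol range covers a sampled address, the only thing left to report is `[unknown]`.

```rust
/// Hypothetical symbol-table entry: a function occupying
/// the address range [start, start + size).
struct Symbol {
    start: u64,
    size: u64,
    name: &'static str,
}

/// Looks up `addr` in a table sorted by `start`.
/// Returns "[unknown]" when no symbol range covers the address.
fn resolve(symbols: &[Symbol], addr: u64) -> &'static str {
    // Index of the first symbol that starts after `addr`.
    let idx = symbols.partition_point(|s| s.start <= addr);
    if idx == 0 {
        return "[unknown]";
    }
    let candidate = &symbols[idx - 1];
    if addr - candidate.start < candidate.size {
        candidate.name
    } else {
        "[unknown]"
    }
}

fn main() {
    // Made-up addresses and names, sorted by start address.
    let table = [
        Symbol { start: 0x1000, size: 0x40, name: "my_crate::main" },
        Symbol { start: 0x1040, size: 0x80, name: "my_crate::spin" },
    ];
    for addr in [0x1010u64, 0x1050, 0x2000] {
        println!("{:#x} -> {}", addr, resolve(&table, addr));
    }
}
```

In this model, stripping symbols amounts to emptying the table, at which point every sample resolves to `[unknown]`.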
Here’s how to get perf to actually work for Rust:
1. Ensure debug symbols are generated. The most important step is telling Rust to include debugging information. This is controlled by the `debug` profile setting in `Cargo.toml` or via command-line flags.
   - Diagnosis: Run `cargo build --release` and then try `perf record -g --call-graph dwarf target/release/my_rust_app`. If you still see `[unknown]` for most of your Rust functions, debug symbols are likely missing or insufficient.
   - Fix: For a release build, add this to your `Cargo.toml`:

         [profile.release]
         debug = true

     Then rebuild with `cargo build --release`. For a debug build, symbols are present by default, but `perf` might still struggle without explicit instruction.
   - Why it works: `debug = true` instructs `rustc` to emit DWARF debugging information, which `perf` uses to correlate execution addresses with function names and source code locations.
2. Use `perf record` with call graph support. `perf` needs to be told explicitly to record call graph information, and which method to use.
   - Diagnosis: If you’ve ensured debug symbols are present but still see flat profiles, you might not be recording call graphs.
   - Fix: Run `perf record -g --call-graph dwarf ./my_rust_app`. The `-g` flag enables call graph recording, and `--call-graph dwarf` tells `perf` to unwind using the DWARF information. (`--call-graph` already implies `-g`, so the extra flag is harmless.)
   - Why it works: This instructs `perf` to capture the call stack at each sample, allowing it to reconstruct the execution path and attribute time spent in functions to their callers. `dwarf` specifies the method for unwinding the stack.
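To check that call graphs are really being recorded, it helps to profile a program whose call chain is obvious in advance. The following is a minimal sketch; the function names `outer`, `middle`, and `inner` are hypothetical, and the command-line argument handling is just one way to keep ordinary runs short. `#[inline(never)]` keeps the three frames distinct, each caller does some work after its call so the calls are not turned into tail calls, and `std::hint::black_box` discourages the optimizer from collapsing the loop.

```rust
use std::hint::black_box;

// All function names below are hypothetical, chosen only for illustration.

/// Innermost hot loop: sums 0..n with wrapping arithmetic.
#[inline(never)]
fn inner(n: u64) -> u64 {
    let mut acc: u64 = 0;
    for i in 0..n {
        // black_box hides the value from the optimizer so the loop
        // is not replaced by a closed-form result.
        acc = black_box(acc.wrapping_add(i));
    }
    acc
}

/// Calls `inner` twice and combines the results afterwards,
/// so neither call is in tail position.
#[inline(never)]
fn middle(n: u64) -> u64 {
    inner(n).wrapping_add(inner(n / 2))
}

/// Top of the chain; the multiplication keeps the call out of tail position.
#[inline(never)]
fn outer(n: u64) -> u64 {
    middle(n).wrapping_mul(2)
}

fn main() {
    // The iteration count comes from the first CLI argument. The small
    // default keeps a plain run quick; pass a much larger value when profiling.
    let n: u64 = std::env::args()
        .nth(1)
        .and_then(|arg| arg.parse().ok())
        .unwrap_or(1_000_000);
    println!("{}", outer(n));
}
```

If this is built with debug info and recorded with `--call-graph dwarf` using a large iteration count, `perf report` would be expected to show `inner` beneath `middle` beneath `outer`. If that chain is missing or broken, call graph recording or unwinding is the likely culprit rather than your own code.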
3. Consider `perf`’s stack unwinding limitations. Even with debug symbols, `perf`’s stack unwinding (figuring out what called what) can be imperfect, especially with heavily optimized Rust code or complex asynchronous patterns.
   - Diagnosis: If `perf report` shows some functions but misses others, or shows incorrect call chains, unwinding might be the issue.
   - Fix: Try a different unwinding method. While `dwarf` is usually the best choice, `lbr` (Last Branch Record) can be more robust in some scenarios, though it needs CPU support (most commonly found on Intel processors) and its limited depth means it might not capture full call chains. Experiment with `perf record -g --call-graph lbr ./my_rust_app`. Another option is to build with frame pointers (`RUSTFLAGS="-C force-frame-pointers=yes"`) and record with `--call-graph fp`. For deeper issues, consider alternative profilers.
   - Why it works: Different unwinding mechanisms have different strengths and weaknesses. `dwarf` relies on debug info, while `lbr` uses hardware that records recent branches, which can sometimes be more accurate for heavily optimized code.
4. Profile the binary directly, not `cargo run`. When you use `cargo run`, `cargo` itself is a process that invokes `rustc` (if a rebuild is needed) and then executes your binary. Because `perf record` follows child processes by default, profiling `cargo run` can attribute time to `cargo` or `rustc` instead of your application.
   - Diagnosis: If your `perf report` shows a significant amount of time spent in `cargo` or `rustc` processes, you’re likely profiling the wrong thing.
   - Fix: Build your application first (`cargo build --release`), then profile the resulting binary directly: `perf record -g --call-graph dwarf target/release/my_rust_app`.
   - Why it works: This ensures `perf` is only observing the execution of your compiled Rust code, not the build tools.
5. Handle Rust’s symbol mangling. Rust mangles function names to support features like monomorphization (creating specialized versions of generic functions for different types). `perf` needs to demangle these names to show human-readable symbols.
   - Diagnosis: If `perf report` shows symbols like `_ZN8my_crate11my_function17h0123456789abcdefE` instead of `my_crate::my_function`, symbol demangling is failing.
   - Fix: `perf` usually attempts demangling automatically. If mangled names still appear, your `perf` build may lack support for the mangling scheme in use; a newer `perf` may help, or you can pipe the text report through an external demangler such as `rustfilt`, for example `perf report --stdio | rustfilt`. Note that `libdw` (part of `elfutils`; packaged as `libdw-dev` on Debian/Ubuntu-like systems) is what `perf` uses for DWARF unwinding when it is built from source, so it matters for call graphs rather than for demangling.
   - Why it works: Demangling converts Rust’s encoded symbol names back into their original, readable paths, making the profiling output understandable.
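To see what demangling involves, here is a minimal sketch of decoding the legacy Rust mangling scheme, where a symbol looks like `_ZN`, then a sequence of length-prefixed path components, then a 17-character hash component (`h` plus 16 hex digits), then `E`. The function names `demangle_legacy_simple` and `is_hash` are hypothetical, and the sketch deliberately ignores `$...$` escape sequences and the newer v0 scheme (symbols starting with `_R`). For real work, an established demangler is the better choice.

```rust
/// Hypothetical helper: decodes simple legacy-mangled Rust symbols of the
/// form `_ZN<len><ident>...E`. It does not handle `$..$` escapes or
/// v0 (`_R...`) symbols; those return whatever the simple rules produce or None.
fn demangle_legacy_simple(sym: &str) -> Option<String> {
    let body = sym.strip_prefix("_ZN")?.strip_suffix('E')?;
    let bytes = body.as_bytes();
    let mut parts: Vec<&str> = Vec::new();
    let mut pos = 0;

    while pos < bytes.len() {
        // Read the decimal length prefix of the next path component.
        let digits_start = pos;
        while pos < bytes.len() && bytes[pos].is_ascii_digit() {
            pos += 1;
        }
        if pos == digits_start {
            return None; // expected a length, found something else
        }
        let len: usize = body[digits_start..pos].parse().ok()?;
        let end = pos.checked_add(len)?;
        // `get` returns None if the declared length runs past the end.
        parts.push(body.get(pos..end)?);
        pos = end;
    }

    // Drop the trailing hash component so only the readable path remains.
    if parts.last().map_or(false, |last| is_hash(last)) {
        parts.pop();
    }
    if parts.is_empty() {
        return None;
    }
    Some(parts.join("::"))
}

/// True for components shaped like the legacy hash: 'h' + 16 hex digits.
fn is_hash(s: &str) -> bool {
    s.len() == 17 && s.starts_with('h') && s[1..].bytes().all(|b| b.is_ascii_hexdigit())
}

fn main() {
    let mangled = "_ZN8my_crate11my_function17h0123456789abcdefE";
    match demangle_legacy_simple(mangled) {
        Some(name) => println!("{name}"), // prints "my_crate::my_function"
        None => println!("{mangled}"),
    }
}
```

The length prefixes are what make the format decodable without a separator character: `8my_crate` means "the next 8 bytes are one path component", which is also why a mangled name with the wrong lengths cannot be demangled.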
6. Profile with `codegen-units = 1` for simpler code. Rust’s default release builds split each crate into multiple codegen units for faster compilation. This can sometimes make profiles harder to read, because inlining and other optimization decisions depend on how the code was partitioned.
   - Diagnosis: If you’re still seeing fragmented or incomplete profiling data after the above steps, especially for performance-critical sections, codegen units could be a factor.
   - Fix: Set `codegen-units = 1` under `[profile.release]` in `Cargo.toml`, or build with `RUSTFLAGS="-C codegen-units=1" cargo build --release`. No nightly-only `-Z` flags (such as `-Z build-std`) are needed for this.
   - Why it works: Setting `codegen-units=1` tells `rustc` to compile the entire crate as a single unit. This allows for more aggressive cross-function optimization and tends to produce a simpler, more contiguous machine code layout, which may be easier for `perf` to unwind and for you to interpret.
Once these steps are taken, perf report should show you a breakdown of where your Rust application is spending its CPU time, with clear function names and source code locations. The next hurdle you’ll likely encounter is understanding how to interpret the data for complex Rust patterns like async/await or closures, which perf might present in slightly less intuitive ways than traditional C code.