Rust applications are surprisingly hard to profile with perf, and the reason usually has less to do with Rust itself than with how Rust binaries are built by default.

Let’s see perf in action. Imagine we have a simple Rust program that just spins in a loop:

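use std::hint::black_box;

fn main() {
    // wrapping_add needs a concrete integer type, so annotate x.
    let mut x: u64 = 0;
    for _ in 0..1_000_000_000 {
        // black_box stops the optimizer from replacing the loop with its result.
        x = black_box(x.wrapping_add(1));
    }
    println!("{}", x);
}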

If we build this and profile it with perf record ./my_rust_app, perf report can show a lot of [unknown] entries instead of function names, making it hard to tell where in our Rust code the time is being spent.

The core problem is missing or incomplete debug information. perf names functions using the binary’s symbol table, and it relies on DWARF debugging information to map machine code addresses back to source files, lines, and inlined functions, as well as to unwind the stack in dwarf call-graph mode. Cargo’s dev profile (used by a plain cargo build) emits full debug info, but the release profile does not by default. On top of that, optimized builds inline aggressively, so much of the source-level call structure no longer exists as separate functions in the machine code.
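
Whether a binary carries DWARF information at all can be checked by looking for .debug_* sections, which is what readelf -S lists. The sketch below does the same check in plain Rust with no external crates, assuming a 64-bit little-endian ELF file (the usual format for x86-64 and AArch64 Linux binaries). The function names are illustrative, it ignores ELF’s extended section numbering, and it is a coarse check: a .debug_* section shows that some debug info exists, not that it covers your own crate.

```rust
use std::env;
use std::fs;
use std::path::PathBuf;

/// Reads a little-endian unsigned integer of `len` bytes (at most 8) at `off`.
fn read_le(b: &[u8], off: usize, len: usize) -> Option<u64> {
    let bytes = b.get(off..off.checked_add(len)?)?;
    // Least significant byte comes first, so fold from the end.
    Some(bytes.iter().rev().fold(0u64, |acc, &x| (acc << 8) | u64::from(x)))
}

/// Lists section names of a 64-bit little-endian ELF image.
/// Returns None for other formats; ignores extended section numbering.
fn elf64_section_names(b: &[u8]) -> Option<Vec<String>> {
    // e_ident: magic, EI_CLASS == 2 (64-bit), EI_DATA == 1 (little-endian).
    if b.get(0..4)? != b"\x7fELF" || *b.get(4)? != 2 || *b.get(5)? != 1 {
        return None;
    }
    let shoff = read_le(b, 0x28, 8)? as usize; // e_shoff
    let shentsize = read_le(b, 0x3A, 2)? as usize; // e_shentsize
    let shnum = read_le(b, 0x3C, 2)? as usize; // e_shnum
    let shstrndx = read_le(b, 0x3E, 2)? as usize; // e_shstrndx
    if shentsize < 64 {
        return None;
    }
    // Reads `len` bytes at offset `rel` inside section header `i`.
    let field = |i: usize, rel: usize, len: usize| -> Option<u64> {
        let base = shoff.checked_add(i.checked_mul(shentsize)?)?;
        read_le(b, base.checked_add(rel)?, len)
    };
    // sh_offset (+0x18) of .shstrtab, the table holding section names.
    let names_off = field(shstrndx, 0x18, 8)? as usize;
    let mut names = Vec::with_capacity(shnum);
    for i in 0..shnum {
        // sh_name (+0x00) is a byte offset into .shstrtab.
        let start = names_off.checked_add(field(i, 0x00, 4)? as usize)?;
        let tail = b.get(start..)?;
        let len = tail.iter().position(|&c| c == 0)?;
        names.push(String::from_utf8_lossy(&tail[..len]).into_owned());
    }
    Some(names)
}

fn main() {
    // Inspect the path given as the first argument, or this executable itself.
    let path: PathBuf = match env::args().nth(1) {
        Some(p) => PathBuf::from(p),
        None => env::current_exe().expect("cannot locate current executable"),
    };
    let bytes = fs::read(&path).expect("cannot read file");
    match elf64_section_names(&bytes) {
        Some(names) => {
            let debug: Vec<&String> = names.iter().filter(|n| n.starts_with(".debug_")).collect();
            if debug.is_empty() {
                println!("{}: no .debug_* sections (no DWARF debug info)", path.display());
            } else {
                println!("{}: {:?}", path.display(), debug);
            }
        }
        None => println!("{}: not a 64-bit little-endian ELF file", path.display()),
    }
}
```

Pointing it at target/release/my_rust_app before and after enabling debug info should show the .debug_* sections appear; with no argument it inspects its own executable.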

Here’s how to get perf to actually work for Rust:

  1. Ensure Debug Symbols are Generated: The most crucial step is telling Rust to include debugging symbols. This is controlled by the debug setting of a Cargo profile in Cargo.toml, or by passing -C debuginfo=2 to rustc (for example through RUSTFLAGS).

    • Diagnosis: Run cargo build --release and then try perf record -g --call-graph dwarf ./my_rust_app. If you still see [unknown] for most of your Rust functions, debug symbols are likely missing or insufficient.

    • Fix: For a release build, add this to your Cargo.toml:

      [profile.release]
      debug = true
      

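      Then rebuild with cargo build --release. Debug builds need no change, because Cargo’s dev profile already sets debug = true; keep in mind, though, that an unoptimized debug build may spend its time in very different places than the release build you actually ship.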

    • Why it works: debug = true instructs rustc to emit DWARF debugging information, which perf uses to correlate execution addresses with function names and source code locations.

  2. Use perf record with Call Graph Support: perf needs to be explicitly told to record call graph information, and which method to use.

    • Diagnosis: If you’ve ensured debug symbols but still see flat profiles, you might not be recording call graphs.

    • Fix: Run perf record -g --call-graph dwarf ./my_rust_app. The -g flag enables call graph recording, and --call-graph dwarf specifically tells perf to use the DWARF information for this.

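    • Why it works: With call-graph recording, each sample captures the call stack at that moment, not just the currently executing instruction, so perf report can attribute time to the callers of a hot function as well as to the function itself. dwarf selects the unwinding method: perf copies a chunk of the user stack with every sample and unwinds it afterwards using the binary’s DWARF unwind tables.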

  3. Consider perf’s Stack Unwinding Limitations: Even with debug symbols, perf’s stack unwinding (figuring out what called what) can sometimes be imperfect, especially with highly optimized Rust code or complex asynchronous patterns.

    • Diagnosis: If perf report shows some functions but misses others, or has incorrect call chains, unwinding might be the issue.

    • Fix: Try a different unwinding method. perf supports fp (frame pointers, which requires building with -C force-frame-pointers=yes), dwarf, and lbr (Last Branch Record). dwarf is usually the best starting point, but lbr can be more robust in some scenarios, though it depends on CPU support (typically recent Intel processors) and its limited hardware buffer means it might not capture full call chains. Experiment with perf record -g --call-graph lbr ./my_rust_app. For deeper issues, consider alternative profilers.

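    • Why it works: Different unwinding mechanisms have different strengths and weaknesses. dwarf relies on unwind information in the binary and on the copy of the stack taken with each sample, while lbr reads the CPU’s own record of recent calls and returns, so it needs neither debug info nor frame pointers. That can make it more accurate for heavily optimized code, at the cost of a bounded call-chain depth.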

  4. Profile Individual Binaries, Not Just cargo run: When you use cargo run, cargo itself is a process that rebuilds your project if needed (invoking rustc) and then executes your binary. Profiling cargo run can therefore attribute time to cargo or rustc instead of your application.

    • Diagnosis: If your perf report shows a significant amount of time spent in cargo or rustc processes, you’re likely profiling the wrong thing.

    • Fix: Build your application first (cargo build --release), then profile the resulting binary directly: perf record -g --call-graph dwarf target/release/my_rust_app.

    • Why it works: This ensures perf is only observing the execution of your compiled Rust code, not the build tools.

  5. Handle Rust’s Symbol Mangling: Rust mangles function names to support features like monomorphization (creating specialized versions of generic functions for different types). perf needs to demangle these names to show human-readable symbols.

    • Diagnosis: If perf report shows symbols like _ZN8my_crate11my_function17h3946723366908096E instead of my_crate::my_function, symbol demangling is failing.

    • Fix: perf demangles symbol names by default, and demangling only needs the symbol table, not DWARF debug info, so the main requirement is not to strip symbols from the binary (for example, avoid strip = true or strip = "symbols" in the Cargo profile). If names are still mangled, your perf build may not support the mangling scheme in use; piping the text output through a Rust-aware demangler such as rustfilt (perf report --stdio | rustfilt) works around this.

    • Why it works: Demangling converts Rust’s internal, complex symbol names into their original, readable form, making the profiling output understandable.

  6. Profile with -C codegen-units=1 for Simpler Code: Rust’s release profile splits each crate into multiple codegen units (16 by default) so they can be compiled in parallel. This can sometimes make profiling harder, because related code may be split across units that are optimized independently and placed apart in the binary.

    • Diagnosis: If you’re still seeing fragmented or incomplete profiling data after the above steps, especially for performance-critical sections, codegen units could be a factor.

    • Fix: Set codegen-units = 1 under [profile.release] in Cargo.toml, or apply it to a single build with RUSTFLAGS="-C codegen-units=1" cargo build --release. Both work on stable Rust; nightly-only flags such as -Z build-std are not needed.

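    • Why it works: Setting codegen-units=1 tells rustc to compile the entire crate as a single unit. Compilation gets slower, but the optimizer sees the whole crate at once, and the resulting machine code tends to be laid out more contiguously, which perf can often unwind and attribute more reliably. Expect more cross-function inlining as a side effect, so some small functions may no longer appear as separate entries.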
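
Step 2 says call-graph recording lets perf attribute time to callers. The sketch below models only the reporting side of that idea: given sampled stacks (outermost caller first), it counts self samples (the function was on top of the stack) and total samples (it was anywhere on the stack), roughly in the spirit of the Self and Children columns perf report shows for call-graph data. The stacks and function names are made up for illustration; this is not how perf itself is implemented.

```rust
use std::collections::HashMap;

/// Sample counts for one function, in the spirit of perf report's Self/Children columns.
#[derive(Debug, Default, PartialEq)]
struct Counts {
    /// Samples where the function was the one executing (top of the stack).
    self_samples: u64,
    /// Samples where the function was anywhere on the stack (itself plus callees).
    total_samples: u64,
}

/// Aggregates sampled call stacks, each ordered outermost caller first.
fn aggregate(stacks: &[Vec<&str>]) -> HashMap<String, Counts> {
    let mut out: HashMap<String, Counts> = HashMap::new();
    for stack in stacks {
        if let Some(leaf) = stack.last() {
            out.entry(leaf.to_string()).or_default().self_samples += 1;
        }
        // Count each function once per sample, even if it appears several times (recursion).
        let mut seen: Vec<&str> = Vec::new();
        for &f in stack {
            if !seen.contains(&f) {
                seen.push(f);
                out.entry(f.to_string()).or_default().total_samples += 1;
            }
        }
    }
    out
}

fn main() {
    // Four hypothetical samples: main calls parse and compute; compute calls hash.
    let stacks = vec![
        vec!["main", "parse"],
        vec!["main", "compute", "hash"],
        vec!["main", "compute", "hash"],
        vec!["main", "compute"],
    ];
    let counts = aggregate(&stacks);
    let mut names: Vec<&String> = counts.keys().collect();
    names.sort();
    for name in names {
        let c = &counts[name];
        println!("{:<8} self={} total={}", name, c.self_samples, c.total_samples);
    }
}
```

In this toy data main never executes code of its own, so it has 0 self samples but all 4 total samples, which is exactly the caller attribution a flat profile without call graphs cannot give you.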
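
Step 3 notes that lbr might not capture full call chains. The sketch below is a simplified software model, not the hardware interface: it replays call and return events into a stack with a fixed number of slots and drops the outermost frame when the stack is full, which is roughly how a call-stack-style branch record loses deep callers. The depth parameter stands in for the CPU’s LBR size, which varies by processor; the Branch type and function names are hypothetical.

```rust
use std::collections::VecDeque;

/// A branch event as a call-stack-style branch record would see it.
#[derive(Debug, Clone, Copy)]
enum Branch<'a> {
    Call(&'a str),
    Return,
}

/// Replays call/return events into a stack that holds at most `depth`
/// frames; when full, the oldest (outermost) frame is discarded.
fn bounded_call_stack<'a>(events: &[Branch<'a>], depth: usize) -> Vec<&'a str> {
    if depth == 0 {
        return Vec::new();
    }
    let mut stack: VecDeque<&'a str> = VecDeque::with_capacity(depth);
    for event in events {
        match *event {
            Branch::Call(f) => {
                if stack.len() == depth {
                    stack.pop_front(); // the outermost caller is lost
                }
                stack.push_back(f);
            }
            Branch::Return => {
                stack.pop_back();
            }
        }
    }
    // Outermost surviving frame first, current function last.
    stack.into_iter().collect()
}

fn main() {
    // Hypothetical chain main -> a -> b -> c -> d -> e, with room for 4 frames.
    let events = [
        Branch::Call("main"),
        Branch::Call("a"),
        Branch::Call("b"),
        Branch::Call("c"),
        Branch::Call("d"),
        Branch::Call("e"),
    ];
    println!("{:?}", bounded_call_stack(&events, 4)); // prints ["b", "c", "d", "e"]
}
```

The recorded chain is still correct as far as it goes; it simply stops before reaching main, which is why lbr profiles of deeply nested Rust code can look cut off at the top.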
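
Step 4’s fix is to build first and then hand perf the compiled binary rather than cargo run. If you script that in Rust, a minimal sketch using std::process::Command might look like the following. It assumes the default target directory (no CARGO_TARGET_DIR or --target override) and the article’s example binary name my_rust_app; release_binary and perf_record are hypothetical helper names.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Where `cargo build --release` puts a binary named `name`, assuming the
/// default target directory (no CARGO_TARGET_DIR, no --target triple).
fn release_binary(project_dir: &Path, name: &str) -> PathBuf {
    project_dir.join("target").join("release").join(name)
}

/// perf record on the compiled binary itself, so cargo and rustc are not sampled.
fn perf_record(binary: &Path) -> Command {
    let mut cmd = Command::new("perf");
    cmd.args(["record", "-g", "--call-graph", "dwarf"]).arg(binary);
    cmd
}

fn main() {
    let mut build = Command::new("cargo");
    build.args(["build", "--release"]);
    // "my_rust_app" is the example binary name used throughout this article.
    let record = perf_record(&release_binary(Path::new("."), "my_rust_app"));
    // Dry run: print both commands. Call .status() on each, in order, to run them.
    println!("{:?}", build);
    println!("{:?}", record);
}
```

Keeping the build and the profiling run as two separate commands is the whole point: perf only ever sees the process started from target/release, never cargo.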
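
The mangled names in step 5 follow Rust’s legacy scheme: _ZN, then each path component prefixed by its length in decimal, then a hash component (h plus 16 hex digits), then E. The sketch below demangles only that simple form. It assumes no escape sequences such as $LT$, no suffix after the final E, and does not handle the newer v0 scheme (symbols starting with _R); for real use, rustfilt or the rustc-demangle crate covers all of these.

```rust
/// Demangles a simple legacy-scheme Rust symbol such as
/// "_ZN4core3ptr13drop_in_place17h0123456789abcdefE" into "core::ptr::drop_in_place".
/// Returns None for anything outside that simple form.
fn demangle_legacy(sym: &str) -> Option<String> {
    let mut rest = sym.strip_prefix("_ZN")?;
    let mut parts: Vec<&str> = Vec::new();
    while !rest.starts_with('E') {
        // Each path component is <decimal length><that many bytes>.
        let digits = rest.bytes().take_while(|b| b.is_ascii_digit()).count();
        if digits == 0 {
            return None;
        }
        let len: usize = rest[..digits].parse().ok()?;
        let end = digits.checked_add(len)?;
        parts.push(rest.get(digits..end)?);
        rest = &rest[end..];
    }
    // Exactly one trailing 'E'; suffixes such as ".llvm.123" are not handled.
    if rest != "E" {
        return None;
    }
    // Drop the trailing hash component: 'h' followed by 16 hex digits.
    if let Some(last) = parts.last() {
        if last.len() == 17 && last.starts_with('h') && last[1..].bytes().all(|b| b.is_ascii_hexdigit()) {
            parts.pop();
        }
    }
    Some(parts.join("::"))
}

fn main() {
    for sym in [
        "_ZN8my_crate11my_function17h3946723366908096E",
        "_ZN4core3ptr13drop_in_place17h0123456789abcdefE",
        "not_mangled",
    ] {
        println!("{} -> {:?}", sym, demangle_legacy(sym));
    }
}
```

The length prefixes are why a symbol with wrong lengths cannot be demangled at all: the parser loses its place after the first component and gives up.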
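
Step 6 gives the optimizer more room for cross-function inlining, so small helpers can vanish into their callers in a flat profile. If you need one particular hot function to stay visible while profiling, one option is to mark it #[inline(never)]. The sketch below does that for a hypothetical checksum function and uses std::hint::black_box so the repeated calls are not optimized away; removing the attribute after profiling is sensible, since blocking inlining may cost some performance.

```rust
use std::hint::black_box;

/// Hypothetical hot helper. #[inline(never)] asks the compiler to keep it a
/// separate function, so it gets its own entry in perf report even when the
/// optimizer would otherwise inline it into main.
#[inline(never)]
fn checksum(data: &[u8]) -> u64 {
    data.iter()
        .fold(0u64, |acc, &b| acc.wrapping_mul(31).wrapping_add(u64::from(b)))
}

fn main() {
    // 64 KiB of repeating bytes; raise the sizes for a longer profiling run.
    let data: Vec<u8> = (0..=255u8).cycle().take(1 << 16).collect();
    let mut total = 0u64;
    for _ in 0..1_000 {
        // black_box hides the input from the optimizer so the calls are not folded away.
        total = total.wrapping_add(checksum(black_box(&data)));
    }
    println!("{}", total);
}
```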

Once these steps are taken, perf report should show you a breakdown of where your Rust application is spending its CPU time, with clear function names and source code locations. The next hurdle you’ll likely encounter is understanding how to interpret the data for complex Rust patterns like async/await or closures, which perf might present in slightly less intuitive ways than traditional C code.
