Flamegraphs are a way to visualize the performance of your Rust program, showing you where it’s spending most of its time.

Let’s see it in action. Imagine you have a simple Rust program that does some computation:

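fn main() {
    let mut data = vec![1, 2, 3, 4, 5];
    process_data(&mut data);
    // Printing the result keeps the optimizer from discarding the work.
    println!("Done! {:?}", data);
}

fn process_data(data: &mut [i32]) {
    for _ in 0..1_000_000 {
        for item in data.iter_mut() {
            // wrapping_mul avoids the overflow panic that `*item *= 2` hits in debug builds.
            *item = item.wrapping_mul(2);
        }
    }
}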

To profile this, we’ll use the perf tool on Linux and the flamegraph crate.

First, ensure you have perf installed. On Ubuntu/Debian: sudo apt-get install linux-tools-common linux-tools-$(uname -r)

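Next, build your Rust program in release mode with debug symbols, which are crucial for perf to map performance data back to your source code. Cargo’s release profile leaves debug info out by default, so first add debug = true under [profile.release] in your Cargo.toml, then run: cargo build --release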

Now, run your program under perf and record the performance data. We’ll sample on CPU cycles. sudo perf record -g -F 99 --call-graph dwarf target/release/<your_binary_name>

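Here, -g enables call graph recording, -F 99 sets the sampling frequency to 99 Hz (an odd value is a common choice so that samples do not line up with activity that runs on a round-number timer), and --call-graph dwarf tells perf to use DWARF debug information for call stacks. Since --call-graph already implies -g, the -g flag is redundant here but harmless. Replace <your_binary_name> with the actual name of your executable in target/release/.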
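To get a feel for what -F 99 means in practice, here is a back-of-the-envelope sketch in Rust. The helper name expected_samples is made up for illustration, and it assumes a single thread that stays busy on the CPU for the whole run; real sample counts will differ.

```rust
use std::time::Duration;

/// Rough number of samples perf would collect for one busy thread.
/// Hypothetical helper for illustration; not part of perf or any crate.
fn expected_samples(run_time: Duration, frequency_hz: u32) -> u64 {
    (run_time.as_secs_f64() * f64::from(frequency_hz)).round() as u64
}

fn main() {
    // Compare a very short run, a one-second run, and a thirty-second run.
    for millis in [10u64, 1_000, 30_000] {
        let run_time = Duration::from_millis(millis);
        println!(
            "{:>6} ms at 99 Hz -> about {} samples",
            millis,
            expected_samples(run_time, 99)
        );
    }
}
```

The takeaway is that a program which finishes in a few milliseconds yields almost no samples, so the workload usually needs to run for at least a few seconds before the resulting flamegraph is meaningful.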

After perf finishes, it will create a perf.data file in the current directory. We then use the flamegraph tool to process this data and generate the SVG. flamegraph is used here as a standalone command-line tool, so it does not need to be listed as a dependency in your Cargo.toml.

Install flamegraph from crates.io: cargo install flamegraph

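Now, generate the flamegraph from the recorded data: flamegraph --perfdata perf.data (because perf record ran under sudo, perf.data is owned by root, so you may need to run this with sudo as well or change the file’s ownership first).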

This will create a file named flamegraph.svg in your current directory. Open this SVG in a web browser. You’ll see a graphical representation of your program’s sampled call stacks. The wider a block is, the more samples included that function, whether it was running its own code or waiting on something it called. The blocks are stacked, so a block sitting on top of another means the top function was called by the one below it.

The core problem flamegraphs solve is that profiling tools often give you raw numbers (like "function X took 10ms"), which are hard to interpret when you have thousands of functions and complex call chains. Flamegraphs give you an intuitive, visual overview, and you can zoom into areas of interest by clicking on them in the SVG. Reading one correctly depends on its two axes: width represents the share of samples, and stacking represents the call hierarchy, with the caller at the bottom and its callees above it. The left-to-right order carries no timing information; sibling blocks are typically sorted by name, not by when they ran.
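To make "width represents samples" concrete, here is a minimal sketch that computes how many samples include each function, starting from stacks in the folded text form that flamegraph tools commonly use (semicolon-separated frames followed by a count). The function name inclusive_samples and the sample data are made up for illustration; this is not how the flamegraph tool itself is implemented.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Sum samples for every function that appears anywhere in a stack.
/// Each input line looks like "frame_a;frame_b;frame_c 42".
fn inclusive_samples(folded: &str) -> BTreeMap<String, u64> {
    let mut totals = BTreeMap::new();
    for line in folded.lines().map(str::trim).filter(|l| !l.is_empty()) {
        // The sample count is the last space-separated field.
        let Some((stack, count)) = line.rsplit_once(' ') else { continue };
        let Ok(count) = count.parse::<u64>() else { continue };
        // Count a function once per stack, even if it recurses.
        let frames: BTreeSet<&str> = stack.split(';').collect();
        for frame in frames {
            *totals.entry(frame.to_string()).or_insert(0) += count;
        }
    }
    totals
}

fn main() {
    // Made-up sample data, not real perf output.
    let folded = "\
main;process_data 90
main;process_data;helper 6
main;println 4";
    let totals = inclusive_samples(folded);
    let all = totals["main"];
    for (name, samples) in &totals {
        let percent = 100.0 * *samples as f64 / all as f64;
        println!("{name:<14} {samples:>3} samples ({percent:.0}% of width)");
    }
}
```

Note that this sketch merges samples by function name, whereas a flamegraph draws one block per call path, so the same function can appear in several places with different widths.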

The most surprising thing is how often the "obvious" performance bottlenecks are not where the flamegraph points. You might suspect a complex algorithm is slow, but a flamegraph could reveal that a simple, frequently called helper function, or even a standard library function you didn’t think about, is dominating execution time due to its sheer call count. It’s the product of time per call and number of calls that matters, and flamegraphs visualize this product effectively.
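As a worked example of that product, the sketch below compares two hypothetical functions. The names and all of the numbers are invented purely for illustration; they are not measurements.

```rust
use std::time::Duration;

/// Total time attributed to a function: per-call cost times call count.
fn total_time(per_call: Duration, calls: u32) -> Duration {
    per_call * calls
}

fn main() {
    // Hypothetical numbers chosen for illustration only.
    let complex_algorithm = total_time(Duration::from_millis(5), 10);
    let tiny_helper = total_time(Duration::from_nanos(200), 1_000_000);

    println!("complex algorithm: {:?}", complex_algorithm);
    println!("tiny helper:       {:?}", tiny_helper);

    // The cheap helper dominates because it is called so often.
    assert!(tiny_helper > complex_algorithm);
}
```

Under these assumed numbers, the "slow" algorithm accounts for 50 ms in total while the 200 ns helper accounts for 200 ms, so the helper would be the wider block in a flamegraph.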

The perf record command has many options. For instance, -e cycles explicitly tells perf to sample on CPU cycles, which is often a good default for CPU-bound work. If your program is I/O bound, you might look at different event types like page-faults or context-switches. The dwarf option for --call-graph matters for Rust because optimized builds usually omit frame pointers, so perf’s default frame-pointer unwinding cannot walk the stack; with dwarf, perf unwinds using the DWARF debugging information instead. Without it, perf might not be able to reconstruct accurate call stacks, leading to a less useful flamegraph.

The real power comes from analyzing the visual output. Look for wide plateaus: blocks that stay wide with little or nothing stacked on top of them. A wide block with nothing above it means the samples landed in that function’s own code, so that function is likely a major hotspot. If a wide block is instead covered by many narrower blocks, the time is spread across its callees, and you need to follow the call chain upward to understand where it is being spent.
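The distinction between a function’s own time and the time spent in its callees can also be computed directly. The sketch below assumes stacks in a folded text form (semicolon-separated frames, then a sample count) and ranks functions by how often they were the topmost frame. The name self_samples and the data are hypothetical, for illustration only.

```rust
use std::collections::BTreeMap;

/// Samples where each function was the leaf (topmost) frame of the stack,
/// i.e. time spent in its own code rather than in its callees.
fn self_samples(folded: &str) -> BTreeMap<String, u64> {
    let mut totals = BTreeMap::new();
    for line in folded.lines().map(str::trim).filter(|l| !l.is_empty()) {
        let Some((stack, count)) = line.rsplit_once(' ') else { continue };
        let Ok(count) = count.parse::<u64>() else { continue };
        // The last frame in a folded stack is the one that was executing.
        if let Some(leaf) = stack.rsplit(';').next() {
            *totals.entry(leaf.to_string()).or_insert(0) += count;
        }
    }
    totals
}

fn main() {
    // Made-up sample data, not real perf output.
    let folded = "\
main;process_data 90
main;process_data;helper 6
main;println 4";
    let mut ranked: Vec<(String, u64)> = self_samples(folded).into_iter().collect();
    // Largest self time first: these are the plateaus to look at.
    ranked.sort_by(|a, b| b.1.cmp(&a.1));
    for (name, samples) in ranked {
        println!("{name:<14} {samples:>3} self samples");
    }
}
```

In this made-up data, main never appears as a leaf, which matches what the picture would show: main is wide at the bottom, but everything above it is covered by its callees.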

The next step in performance analysis is often understanding memory access patterns, which flamegraphs can hint at but don’t directly visualize.
