Rayon’s parallel iterators don’t just run your existing loop on more threads; they restructure the computation into pieces that can be split, processed independently, and recombined.

Let’s see this in action. Imagine you have a large vector of numbers and you want to square each one and then sum them up.

use rayon::prelude::*;

fn main() {
    // i64, not i32: 1_000_000 squared (10^12) overflows i32, and so does the total.
    let data: Vec<i64> = (1..=1_000_000).collect();

    // Sequential version
    let sequential_sum: i64 = data.iter().map(|x| x * x).sum();
    println!("Sequential sum: {}", sequential_sum);

    // Parallel version: the same pipeline, with par_iter() instead of iter()
    let parallel_sum: i64 = data.par_iter().map(|x| x * x).sum();
    println!("Parallel sum: {}", parallel_sum);
}

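When you run this on a multi-core machine, the parallel version can finish faster than the sequential one. With work this cheap per element the gain may be small, or even negative once scheduling overhead is counted, so measure a release build before drawing conclusions. But how does Rayon spread the work across cores in the first place?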

The core idea is "work stealing." Rayon doesn’t just divide your data into fixed chunks and assign them to threads. Instead, it creates a pool of worker threads. When a thread finishes its assigned work, it "steals" a portion of the work from another busy thread. This ensures that all threads are kept as busy as possible, minimizing idle time.
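
To get a feel for why handing out work on demand keeps threads busy, here is a deliberately simplified, std-only sketch. It is not Rayon’s scheduler: instead of per-thread queues with stealing, idle threads claim the next block of indices from one shared atomic counter. The function name `dynamic_sum_of_squares` and the block size are assumptions made up for this illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Sums the squares of `data` with `workers` threads that claim blocks on demand.
/// Hypothetical helper; a simplification, not how Rayon schedules work.
fn dynamic_sum_of_squares(data: &[i64], workers: usize, block: usize) -> i64 {
    let next = AtomicUsize::new(0); // start index of the next unclaimed block
    let block = block.max(1);
    thread::scope(|s| {
        let mut handles = Vec::new();
        for _ in 0..workers.max(1) {
            handles.push(s.spawn(|| {
                let mut local = 0i64;
                loop {
                    // A thread that finishes early simply claims another block.
                    let start = next.fetch_add(block, Ordering::Relaxed);
                    if start >= data.len() {
                        break;
                    }
                    let end = (start + block).min(data.len());
                    local += data[start..end].iter().map(|x| x * x).sum::<i64>();
                }
                local
            }));
        }
        // Combine the per-thread totals once every worker has run out of blocks.
        handles.into_iter().map(|h| h.join().unwrap()).sum::<i64>()
    })
}

fn main() {
    let data: Vec<i64> = (1..=10_000).collect();
    let expected: i64 = data.iter().map(|x| x * x).sum();
    assert_eq!(dynamic_sum_of_squares(&data, 4, 256), expected);
    println!("dynamic sum matches the sequential sum");
}
```

No thread is tied to a fixed share of the input, so a slow block delays only the thread that picked it up. Rayon gets a similar effect with less contention, because threads mostly work from their own queues and only look elsewhere when idle.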

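When you call .par_iter() on a collection, you don’t get a standard Iterator back. Rayon gives you a parallel iterator over references to the collection’s elements, a type that implements the ParallelIterator trait. This trait has methods like map, filter, fold, and sum that mirror their sequential counterparts but are designed to operate in parallel.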

The magic happens in how these operations are implemented. For map, Rayon recursively splits the input into smaller pieces, and each thread processes one or more of these pieces independently. For sum (or reductions in general), each piece yields a partial sum. Finally, Rayon combines these partial sums into a single, final result. This is a classic divide-and-conquer strategy.

Consider the sum() operation. A sequential sum iterates through every element, adding it to an accumulator. A parallel sum is more involved. Conceptually, it is a fold followed by a reduction: each thread computes a local sum for its portion of the data, and these local sums are then combined, again potentially in parallel, until a single final sum remains. This matters because having multiple threads update one shared accumulator (which in Rust would have to sit behind a lock or an atomic) would cause contention and could negate any performance benefit.
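
The partial-sum idea can be sketched with nothing but the standard library. The version below uses fixed chunks and one scoped thread per chunk, which is a simplification of what Rayon does (Rayon splits adaptively and reuses a thread pool). The name `chunked_sum_of_squares` is a hypothetical helper, not part of any library.

```rust
use std::thread;

/// Splits `data` into roughly equal chunks, sums the squares of each chunk on
/// its own thread, then combines the partial sums. Hypothetical helper.
fn chunked_sum_of_squares(data: &[i64], threads: usize) -> i64 {
    let threads = threads.max(1);
    // Ceiling division so every element lands in a chunk; at least 1 because
    // `chunks` panics on a chunk size of zero.
    let chunk_len = ((data.len() + threads - 1) / threads).max(1);

    let partials: Vec<i64> = thread::scope(|s| {
        let mut handles = Vec::new();
        for chunk in data.chunks(chunk_len) {
            // Each thread owns its accumulator, so no synchronization is needed here.
            handles.push(s.spawn(move || chunk.iter().map(|x| x * x).sum::<i64>()));
        }
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    // Final combine step: one addition per chunk.
    partials.iter().sum()
}

fn main() {
    let data: Vec<i64> = (1..=10_000).collect();
    let expected: i64 = data.iter().map(|x| x * x).sum();
    assert_eq!(chunked_sum_of_squares(&data, 4), expected);
    println!("chunked sum matches the sequential sum");
}
```

The threads never touch shared state while summing; the only communication is handing back one number per chunk at the end.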

The rayon::join function is another powerful tool. It takes two closures, potentially runs them in parallel, and returns once both have completed. This is useful for recursive algorithms where you can split a problem into two independent sub-problems.

fn process_data(data: &[i32]) {
    if data.len() < 1000 { // Base case: small enough for sequential processing
        for x in data {
            // Do some sequential work
            let _ = x * 2; 
        }
    } else {
        let (left, right) = data.split_at(data.len() / 2);
        rayon::join(|| process_data(left), || process_data(right));
    }
}

This recursive splitting, combined with work stealing, allows Rayon to adapt to varying workloads and core availability.
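
The same recursive pattern also works when the two halves return values that need to be combined. The sketch below uses a std-only stand-in called `join2` (a hypothetical helper built on scoped threads) so that it runs without any dependencies. It spawns a fresh thread per split, which Rayon avoids by reusing its pool, so treat it as an illustration of the shape of the recursion rather than a replacement for rayon::join.

```rust
use std::thread;

/// Stand-in for a two-way join: runs `a` on a new scoped thread and `b` on the
/// current thread, then returns both results. Hypothetical helper.
fn join2<A, B, RA, RB>(a: A, b: B) -> (RA, RB)
where
    A: FnOnce() -> RA + Send,
    B: FnOnce() -> RB,
    RA: Send,
{
    thread::scope(|s| {
        let handle = s.spawn(a);
        let rb = b();
        (handle.join().unwrap(), rb)
    })
}

/// Recursively splits the slice in half until the pieces are small, then sums
/// the squares sequentially and adds the two halves on the way back up.
fn sum_of_squares(data: &[i64]) -> i64 {
    if data.len() < 1000 {
        // Base case: small enough for sequential processing
        data.iter().map(|x| x * x).sum()
    } else {
        let (left, right) = data.split_at(data.len() / 2);
        let (a, b) = join2(|| sum_of_squares(left), || sum_of_squares(right));
        a + b
    }
}

fn main() {
    let data: Vec<i64> = (1..=10_000).collect();
    let expected: i64 = data.iter().map(|x| x * x).sum();
    assert_eq!(sum_of_squares(&data), expected);
    println!("recursive sum matches the sequential sum");
}
```

The threshold of 1000 mirrors the base case above; in practice the right cutoff depends on how expensive the per-element work is.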

A common pitfall is expecting Rayon to magically parallelize any operation. If your map or filter closures involve significant synchronization (like locking shared mutable state), you’ll likely see performance degrade because the threads will spend more time waiting for locks than doing actual work. The ideal operations for Rayon are those where each element can be processed independently of others.
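
To make the pitfall concrete, here are two std-only versions of the same computation. Both return the same answer, but the first takes a shared lock for every element while the second accumulates locally and takes the lock once per thread. The function names and the `chunk_len` helper are made up for this sketch, and the threads here stand in for Rayon’s workers.

```rust
use std::sync::Mutex;
use std::thread;

// Hypothetical helper: chunk length that spreads `len` items over `threads` threads.
fn chunk_len(len: usize, threads: usize) -> usize {
    let threads = threads.max(1);
    ((len + threads - 1) / threads).max(1)
}

/// Anti-pattern: every element update goes through the shared lock.
fn sum_locking_per_element(data: &[i64], threads: usize) -> i64 {
    let total = Mutex::new(0i64);
    thread::scope(|s| {
        for chunk in data.chunks(chunk_len(data.len(), threads)) {
            let total = &total;
            s.spawn(move || {
                for x in chunk {
                    *total.lock().unwrap() += x * x; // contended on every element
                }
            });
        }
    });
    total.into_inner().unwrap()
}

/// Better: each thread sums privately and touches the lock exactly once.
fn sum_with_local_accumulators(data: &[i64], threads: usize) -> i64 {
    let total = Mutex::new(0i64);
    thread::scope(|s| {
        for chunk in data.chunks(chunk_len(data.len(), threads)) {
            let total = &total;
            s.spawn(move || {
                let local: i64 = chunk.iter().map(|x| x * x).sum();
                *total.lock().unwrap() += local; // one lock acquisition per thread
            });
        }
    });
    total.into_inner().unwrap()
}

fn main() {
    let data: Vec<i64> = (1..=10_000).collect();
    assert_eq!(
        sum_locking_per_element(&data, 4),
        sum_with_local_accumulators(&data, 4)
    );
    println!("both versions agree; only the amount of locking differs");
}
```

With Rayon you would usually go one step further and drop the lock entirely by letting sum or reduce carry the partial results for you.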

What most people don’t realize is how Rayon handles the actual reduction step for operations like sum or collect. It doesn’t just do a simple sequential reduction at the end. Instead, it often employs a tree-like reduction. Imagine each thread produces a partial result. Then, pairs of these partial results are combined. Then pairs of those combined results are combined, and so on. The combinations within one level are independent of each other, so they can run in parallel, and the number of levels grows only logarithmically with the number of partial results.
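
The shape of a tree-like reduction is easy to see in a small, sequential sketch. The function below (`tree_reduce`, a hypothetical name) combines neighbours pairwise, level by level. It runs on one thread and is not Rayon’s implementation; the point is only that the combines inside each level do not depend on one another.

```rust
/// Combines partial results pairwise, level by level, until one value remains.
/// Returns `None` for an empty input. Hypothetical helper for illustration.
fn tree_reduce<T, F>(mut level: Vec<T>, combine: F) -> Option<T>
where
    F: Fn(T, T) -> T,
{
    while level.len() > 1 {
        let mut next = Vec::with_capacity((level.len() + 1) / 2);
        let mut items = level.into_iter();
        while let Some(a) = items.next() {
            match items.next() {
                // Neighbouring pair: these combines are independent of each other.
                Some(b) => next.push(combine(a, b)),
                // Odd element out: carried unchanged to the next level.
                None => next.push(a),
            }
        }
        level = next;
    }
    level.pop()
}

fn main() {
    // Pretend these are the partial sums produced by eight workers.
    let partials = vec![3i64, 1, 4, 1, 5, 9, 2, 6];
    let expected: i64 = partials.iter().sum();
    assert_eq!(tree_reduce(partials, |a, b| a + b), Some(expected));
    println!("tree reduction matches the sequential sum");
}
```

Because only neighbours are combined, the left-to-right order of the partial results is preserved, which is what you want for order-sensitive results such as collected sequences.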

The next step you’ll likely explore is how to parallelize custom data structures or how to integrate Rayon with asynchronous operations.
