Pytest-benchmark lets you measure how fast your code runs, but it’s not just about timing a function; it’s about understanding the variance in that timing.
Let’s see it in action. Imagine you have a simple function that adds two numbers, and you want to benchmark it.
    # test_example.py
    import pytest


    def add_numbers(a, b):
        return a + b


    def test_addition_performance(benchmark):
        # The benchmark fixture calls add_numbers(2, 3) repeatedly and returns its result.
        result = benchmark(add_numbers, 2, 3)
        assert result == 5
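When you run pytest with the plugin installed, benchmarks run by default; --benchmark-enable is only needed to override a --benchmark-disable set in your configuration. Pytest-benchmark will execute add_numbers(2, 3) many times, collect timing data, and report it. The real report has more columns (such as Median, IQR, and OPS); the sample here is trimmed to a few of them.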
    ----------------------------------- benchmark: 1 tests -----------------------------------
    Name (time in ns)                  Min       Max      Mean    StdDev    Rounds  Iterations
    ------------------------------------------------------------------------------------------
    test_addition_performance       123.45    234.56    180.00     45.00        10        1000
Here’s the breakdown:
- Min/Max/Mean: These show the fastest, slowest, and average time per call across all the rounds of execution. The unit (nanoseconds here) is chosen automatically from the size of the results and is stated in the table header.
- StdDev: Standard deviation. This is crucial. A low StdDev means your function’s timing is consistent from round to round. A high StdDev means its performance is unpredictable, which is often a bigger problem than being simply slow.
- Rounds: The number of timed rounds, where each round consists of all the iterations described below. Pytest-benchmark keeps running rounds until it uses up its time budget (--benchmark-max-time, 1 second by default), and it always runs at least --benchmark-min-rounds of them (5 by default).
- Iterations: The number of times your function was called within a single round. Pytest-benchmark automatically adjusts this to get meaningful measurements. If your function is very fast, it will run it many times per round; if it’s slow, it might only run it once.
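To make the relationship between rounds, iterations, and the reported statistics concrete, here is a minimal sketch using only the standard library. It is not how pytest-benchmark is implemented; the `measure` helper and the fixed values `rounds=10` and `iterations=1000` are assumptions chosen to mirror the sample table.

```python
import statistics
import time


def add_numbers(a, b):
    return a + b


def measure(func, *args, rounds=10, iterations=1000):
    """Return one per-call timing (in seconds) for each round."""
    per_call_times = []
    for _ in range(rounds):
        start = time.perf_counter()
        for _ in range(iterations):
            func(*args)
        elapsed = time.perf_counter() - start
        # Each round contributes a single data point: the average time per call.
        per_call_times.append(elapsed / iterations)
    return per_call_times


timings = measure(add_numbers, 2, 3)
summary = {
    "min": min(timings),
    "max": max(timings),
    "mean": statistics.mean(timings),
    "stddev": statistics.stdev(timings),  # spread between rounds
    "rounds": len(timings),
}
print(summary)
```

Because every round yields one data point, more rounds give the standard deviation more samples to work with, while more iterations per round smooth out timer noise inside each data point.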
The core problem pytest-benchmark solves is the difficulty of getting reliable performance metrics. A single run is meaningless due to background processes, cache effects, and other system noise. Pytest-benchmark automates the process of running your code repeatedly under controlled conditions to isolate the performance of the code itself.
You control its behavior with command-line options:
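- --benchmark-enable: Forces benchmarking on, overriding --benchmark-disable.
- --benchmark-disable: Turns timing off; the benchmarked function is still called once so the test itself keeps running.
- --benchmark-verbose: Prints diagnostic and progress information while the benchmarks run.
- --benchmark-warmup=<kind>: Turns warmup on, off, or auto. During warmup your function is called before measurements start, which helps warm up caches. The number of warmup calls is capped with --benchmark-warmup-iterations=<n>.
- --benchmark-save=<name>: Saves benchmark results to a JSON file in the storage directory (.benchmarks by default). The file name ends in <name>.json and is prefixed with a run counter.
- --benchmark-compare: Compares current results against a saved run; without an argument it uses the latest saved run.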
The most surprising thing about pytest-benchmark is how it picks the iterations per round automatically. You don’t tell it "run this 1000 times." Instead, before measuring anything, it runs a calibration phase: it times the function in progressively larger batches until a single round lasts long enough to be measured reliably against the timer’s resolution. The threshold is governed by --benchmark-min-time and --benchmark-calibration-precision. The resulting iteration count is then used for every round. A very fast function ends up with many iterations per round, while a slow one typically gets just one, so the same mechanism covers nanosecond operations and multi-second tasks without you having to guess the right number of calls.
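The idea of choosing an iteration count automatically can be sketched with the standard library alone. This is a simplified illustration and not pytest-benchmark's actual calibration code: the `calibrate_iterations` helper, the `min_round_time` threshold of 1 ms, the 10x growth factor, and the `max_iterations` cap are all assumptions made for the example.

```python
import time


def calibrate_iterations(func, *args, min_round_time=0.001, growth=10,
                         max_iterations=10_000_000):
    """Grow the per-round call count until one round lasts at least min_round_time seconds."""
    iterations = 1
    while True:
        start = time.perf_counter()
        for _ in range(iterations):
            func(*args)
        elapsed = time.perf_counter() - start
        # Stop once the round is long enough to time reliably (or the cap is hit).
        if elapsed >= min_round_time or iterations >= max_iterations:
            return iterations, elapsed
        iterations = min(iterations * growth, max_iterations)


def add_numbers(a, b):
    return a + b


# A trivial function needs many calls per round to reach the threshold...
fast_iters, fast_elapsed = calibrate_iterations(add_numbers, 2, 3)
# ...while a call that already takes 5 ms is long enough on its own.
slow_iters, slow_elapsed = calibrate_iterations(time.sleep, 0.005)
print(fast_iters, slow_iters)
```

The point of the sketch is the stopping rule: the iteration count depends on how long a round takes relative to a minimum measurable duration, not on a number you pick by hand.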
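When you use --benchmark-save and then --benchmark-compare in subsequent runs, you can track performance regressions: the report shows the current numbers next to the saved ones, so a jump in mean execution time or, more importantly, in standard deviation is easy to spot. A comparison by itself does not fail the test run, though. To turn a regression into a failure, add --benchmark-compare-fail with a threshold expression such as mean:5% (fail if the mean gets more than 5% slower) or min:0.001 (fail if the minimum grows by more than 0.001 seconds). With that in place, a subtle performance bug becomes a clear failure.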
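The kind of check a regression comparison performs can be sketched in a few lines of plain Python. This is an illustration of the idea only: the `find_regressions` helper, the 10% threshold, the dictionary layout, and the numbers are all hypothetical and do not reflect pytest-benchmark's saved file format or its internal logic.

```python
def find_regressions(baseline, current, max_increase=0.10, fields=("mean", "stddev")):
    """Return (name, field, old, new) for every statistic that grew by more than max_increase."""
    regressions = []
    for name, new_stats in current.items():
        old_stats = baseline.get(name)
        if old_stats is None:
            continue  # new benchmark, nothing to compare against
        for field in fields:
            old, new = old_stats[field], new_stats[field]
            if old > 0 and (new - old) / old > max_increase:
                regressions.append((name, field, old, new))
    return regressions


# Hypothetical numbers in nanoseconds, invented for illustration only.
baseline = {"test_addition_performance": {"mean": 180.0, "stddev": 45.0}}
current = {"test_addition_performance": {"mean": 185.0, "stddev": 90.0}}

for name, field, old, new in find_regressions(baseline, current):
    print(f"{name}: {field} went from {old} to {new}")
```

In this made-up run the mean barely moves, yet the standard deviation doubles, which is exactly the kind of change that a mean-only check would miss.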
The next hurdle is understanding how to interpret and act on the standard deviation, especially when it’s high even for simple functions.