cProfile and py-spy are your go-to tools for understanding where your Python code is spending its time, but they approach the problem from fundamentally different angles.
Let’s see py-spy in action. Imagine you have a web server, maybe built with Flask, that’s suddenly become sluggish. You can attach py-spy to the running process without modifying your code or even restarting it.
# Find the PID of your Flask app (e.g., using ps aux | grep python)
FLASK_PID=12345
# Record a 30-second profile
sudo py-spy record -o profile.svg --pid $FLASK_PID --duration 30
This command attaches py-spy to the process with PID 12345. By default it samples the call stack 100 times per second (adjustable with --rate), keeps sampling for 30 seconds, and then writes an interactive flame graph to profile.svg. Open this file in your browser, and you’ll see a visual representation of where the time went. The wider the bar, the more samples landed in that function or its children. Functions that consume a disproportionate share of CPU stand out immediately, so you know where to focus your optimization efforts.
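To make the "wider bar, more time" reading concrete, here is a minimal sketch of how bar widths fall out of stack samples. The sample data and the function names (handle_request, query_db, render_template) are invented for illustration; this is not py-spy output, and py-spy's real internals differ.

```python
from collections import Counter

# Hypothetical stack samples, outermost frame first. Each tuple stands for
# one snapshot a sampling profiler might take; the names are made up.
samples = [
    ("handle_request", "query_db"),
    ("handle_request", "query_db"),
    ("handle_request", "query_db"),
    ("handle_request", "render_template"),
    ("handle_request",),
]


def frame_widths(stack_samples):
    """Return the fraction of samples in which each call path appears.

    A path is counted for every sample that passes through it, so a
    function's width includes the time spent in its children.
    """
    counts = Counter()
    for stack in stack_samples:
        for depth in range(1, len(stack) + 1):
            counts[stack[:depth]] += 1
    total = len(stack_samples)
    return {path: n / total for path, n in counts.items()}


widths = frame_widths(samples)
for path, share in sorted(widths.items()):
    print(f"{' > '.join(path):40s} {share:.0%}")
```

In this toy data handle_request spans the full width because every sample passes through it, while query_db covers three samples out of five.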
Now, let’s contrast this with cProfile. cProfile is an in-process, deterministic profiler: it has to be enabled inside the interpreter that runs your code, typically by launching the script through it. It hooks into the interpreter and records every function call and return, along with the time spent in each function.
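Because cProfile lives inside the interpreter, it can also be switched on and off from code. The following is a minimal sketch of that pattern; busy_work is a made-up workload used only for illustration.

```python
import cProfile
import io
import pstats


def busy_work(n):
    """Hypothetical workload: sum of squares below n."""
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()            # start recording call/return events
result = busy_work(100_000)
profiler.disable()           # stop recording

# Render the top entries into a string instead of printing to stdout.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(5)
report = buffer.getvalue()
print(report)
```

Wrapping only the region of interest in enable() and disable() keeps start-up and teardown noise out of the report.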
Consider a script designed to do some heavy computation:
# heavy_computation.py
import time

def process_data():
    total = 0
    for i in range(10_000_000):
        total += i * i
    return total


def main():
    start_time = time.time()
    result = process_data()
    end_time = time.time()
    print(f"Result: {result}")
    print(f"Time taken: {end_time - start_time:.2f} seconds")


if __name__ == "__main__":
    main()
To profile this with cProfile, you’d run it from the command line:
python -m cProfile -o output.prof heavy_computation.py
This generates an output.prof file containing raw profiling data. To make sense of it, you’d typically use pstats:
import pstats
from pstats import SortKey
p = pstats.Stats('output.prof')
p.sort_stats(SortKey.CUMULATIVE).print_stats(10)
p.sort_stats(SortKey.TIME).print_stats(10)
This script will print the top 10 functions by cumulative time (total time spent in the function and its callees) and by internal time (time spent only in the function itself). You’d see output along these lines (abridged; exact timings depend on your machine):
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    3.112    3.112 heavy_computation.py:11(main)
        1    3.112    3.112    3.112    3.112 heavy_computation.py:4(process_data)
The cProfile data clearly shows that process_data is the bottleneck: main has almost no time of its own (tottime), while process_data accounts for essentially all of the cumulative time. Note that cProfile works at function granularity. Operators such as i * i and total += ... run as bytecode rather than as function calls, so they never get rows of their own; their cost is folded into the tottime of process_data. To find out which line inside a function is expensive, you need a line-level tool such as line_profiler.
The core difference lies in their approach. py-spy is an external, sampling profiler: it runs as a separate process and periodically reads the target interpreter’s call stack, in effect asking "what are you doing right now?", with very little impact on the code’s execution. This makes it excellent for live systems, for code that leans on C extensions, and for situations where you can’t easily modify or restart the program. cProfile is an internal, deterministic profiler: it doesn’t rewrite your code, but it registers a hook inside the interpreter that fires on every function call and return. This gives you exact call counts and per-function timings, but it requires changing how you run your script, and the per-call bookkeeping adds more overhead, especially in code that makes many small function calls.
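The two approaches can be imitated in a few lines of pure Python. This is only a toy sketch, not how either tool is implemented: count_calls uses sys.setprofile to get a callback on every call, while sample_stacks uses a background thread and the CPython-specific sys._current_frames() to peek at the stack periodically. The functions work and square are hypothetical workloads.

```python
import collections
import sys
import threading


def square(x):
    return x * x


def work():
    """Hypothetical workload that makes many small function calls."""
    total = 0
    for i in range(50_000):
        total += square(i)
    return total


def count_calls(func):
    """Instrumenting approach: a hook fires on every Python-level call."""
    calls = collections.Counter()

    def hook(frame, event, arg):
        if event == "call":
            calls[frame.f_code.co_name] += 1

    sys.setprofile(hook)
    try:
        func()
    finally:
        sys.setprofile(None)
    return calls


def sample_stacks(func, interval=0.001):
    """Sampling approach: periodically look at the caller's current frame."""
    hits = collections.Counter()
    target_id = threading.get_ident()
    done = threading.Event()

    def sampler():
        while True:
            frame = sys._current_frames().get(target_id)
            if frame is not None:
                hits[frame.f_code.co_name] += 1
            if done.wait(interval):   # returns True once func() has finished
                break

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        func()
    finally:
        done.set()
        thread.join()
    return hits


exact = count_calls(work)
approx = sample_stacks(work)
print("exact call counts:", dict(exact))
print("sampled frame hits:", dict(approx))
```

The hook yields exact counts but runs on every one of the 50,000 calls; the sampler only sees whatever frame happens to be active when it wakes up, so its numbers are approximate and vary from run to run.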
One aspect often overlooked is how py-spy handles native code. By default it reports only Python frames, so time spent inside a C library is charged to the Python function that called it. With the --native flag (on supported platforms) it also unwinds native stacks, so frames from C extensions and the interpreter itself appear in the flame graph beneath the Python function that called them. This is invaluable for diagnosing performance issues in libraries like NumPy, Pandas, or even your own custom C extensions. cProfile, by contrast, reports a C function as a single opaque entry: you can see that it was called and how long it took, but nothing about what happened inside it.
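For comparison, here is a small sketch of what cProfile does offer for C-implemented functions: the built-in shows up as its own row, and pstats can list the Python functions that called it. sort_scores is a hypothetical function invented for this example.

```python
import cProfile
import io
import pstats


def sort_scores(scores):
    """Hypothetical Python function that delegates to C code (sorted)."""
    return sorted(scores)


data = list(range(50_000, 0, -1))

profiler = cProfile.Profile()
profiler.enable()
ordered = sort_scores(data)
profiler.disable()

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
# The C-implemented built-in appears as a single row of its own...
stats.print_stats("sorted")
# ...and print_callers lists which Python functions invoked it.
stats.print_callers("sorted")
report = buffer.getvalue()
print(report)
```

The report ties the built-in back to sort_scores, but it cannot say anything about where the time went inside the C code.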
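The biggest challenge with py-spy is often getting it installed and running, especially on systems with strict security policies. Attaching to an already-running process usually requires elevated privileges: root (sudo) on macOS, and on Linux whenever ptrace restrictions are in force, which is the default on many distributions. Containers typically need the SYS_PTRACE capability as well. You also need to be mindful of the sampling rate (--rate): too low and brief or rare calls may not show up at all, too high and py-spy’s own overhead grows.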
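The sampling-rate trade-off can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope model, not something py-spy computes: it assumes evenly spaced samples and a single isolated call starting at a random moment, and both helper names are invented for this example.

```python
def expected_samples(rate_hz, seconds_in_function):
    """Expected number of samples that land in a function."""
    return rate_hz * seconds_in_function


def chance_of_seeing_call(rate_hz, call_duration_s):
    """Approximate probability that one isolated call is sampled at least once.

    With samples spaced 1 / rate_hz apart, a call of length d overlaps a
    sample instant with probability d * rate_hz, capped at 1.
    """
    return min(1.0, rate_hz * call_duration_s)


# A 2 ms call at py-spy's default 100 samples per second vs. 1000 per second.
for rate in (100, 1000):
    p = chance_of_seeing_call(rate, 0.002)
    print(f"rate={rate:>4} Hz  chance of catching a 2 ms call: {p:.0%}")

# A function that is active for the whole 30-second recording.
print("samples over 30 s at 100 Hz:", expected_samples(100, 30))
```

Calls that repeat often are still caught statistically even at a low rate; it is the rare, short ones that a low rate tends to miss.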
When you’ve mastered profiling, the next logical step is often understanding memory usage, and tools like memory_profiler and objgraph become your next area of exploration.