tracemalloc (part of the standard library) and Pympler (a third-party package) are two of the most useful tools in Python for understanding and optimizing memory usage, but they operate on fundamentally different principles, leading to distinct strengths and weaknesses. The most surprising thing about tracemalloc is that it reports where memory was allocated, not which objects are currently holding it.
Let’s see tracemalloc in action. Imagine a simple script that builds a large list:
import tracemalloc
import time

def create_large_list(n_items=100_000, item_len=100):
    data = []
    for i in range(n_items):
        data.append("a" * item_len)  # a literal "a" * 100 would be folded into one shared constant
    return data

if __name__ == "__main__":
    tracemalloc.start()

    # Simulate some work
    start_time = time.perf_counter()
    my_list = create_large_list()
    end_time = time.perf_counter()

    snapshot = tracemalloc.take_snapshot()
    top_stats = snapshot.statistics('lineno')

    print(f"Script took {end_time - start_time:.2f} seconds.")
    print("[ Top 10 memory allocations ]")
    for stat in top_stats[:10]:
        print(stat)

    # Clean up
    del my_list
    tracemalloc.stop()
When you run this, you’ll see output shaped like this (the files, sizes, and ordering vary with the Python version and platform):
Script took 0.05 seconds.
[ Top 10 memory allocations ]
/usr/lib/python3.8/genericpath.py:30: size=10.0 MiB, count=1, average=10.0 MiB
/usr/lib/python3.8/codecs.py:320: size=10.0 MiB, count=1, average=10.0 MiB
/home/user/your_script.py:7: size=9.5 MiB, count=100000, average=99.8 B
/usr/lib/python3.8/linecache.py:137: size=5.0 MiB, count=100000, average=50.0 B
/usr/lib/python3.8/encodings/ascii.py:26: size=2.5 MiB, count=100000, average=25.0 B
/usr/lib/python3.8/encodings/base.py:36: size=2.5 MiB, count=100000, average=25.0 B
/usr/lib/python3.8/io.py:188: size=2.0 MiB, count=100000, average=20.0 B
/usr/lib/python3.8/linecache.py:120: size=1.0 MiB, count=100000, average=10.0 B
/usr/lib/python3.8/encodings/utf_8.py:14: size=1.0 MiB, count=100000, average=10.0 B
/usr/lib/python3.8/encodings/__init__.py:31: size=1.0 MiB, count=100000, average=10.0 B
Notice how tracemalloc points to your_script.py:7 as a major allocation source. This is its superpower: it tells you exactly which line of code triggered an allocation. It does this by hooking Python’s memory allocators and recording a traceback for each allocated block (by default only the most recent frame; pass a larger frame count to tracemalloc.start() to keep more). When you take a snapshot, it groups the blocks that are still allocated at that moment by traceback and reports the total size and block count per line. Memory that has already been freed does not appear in the snapshot.
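To make the "still allocated at snapshot time" point concrete, here is a minimal sketch that measures the bytes attributed to one file before and after the data is released. The helper names build_strings and live_bytes_from_this_file are made up for this illustration; only the tracemalloc calls are real API.

```python
import tracemalloc


def build_strings(n_items=50_000, item_len=100):
    # item_len is a runtime value, so every iteration allocates a new string
    return ["a" * item_len for _ in range(n_items)]


# tracemalloc reports the filename stored on the code object
THIS_FILE = build_strings.__code__.co_filename


def live_bytes_from_this_file():
    # Sum the size of all blocks that are still allocated right now
    # and whose allocation site is in this file
    snapshot = tracemalloc.take_snapshot()
    return sum(
        stat.size
        for stat in snapshot.statistics("filename")
        if stat.traceback[0].filename == THIS_FILE
    )


tracemalloc.start()
payload = build_strings()
while_alive = live_bytes_from_this_file()
del payload  # the list and its strings are freed here
after_free = live_bytes_from_this_file()
tracemalloc.stop()

print(f"while the list is alive: {while_alive / 1024:.0f} KiB")
print(f"after del:               {after_free / 1024:.0f} KiB")
```

The second figure should be far smaller than the first, because a snapshot only contains blocks that have not been freed yet.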
Now, let’s consider Pympler. Pympler takes a different approach. Instead of tracking allocations as they happen, it inspects the current state of the Python interpreter’s memory. It walks the object graph, identifying all active objects and their sizes.
Here’s a Pympler example for the same scenario:
from pympler import muppy, summary
import sys

def create_large_list(n_items=100_000, item_len=100):
    data = []
    for i in range(n_items):
        # item_len is a runtime value, so each iteration builds a new string
        data.append("a" * item_len)
    return data

if __name__ == "__main__":
    my_list = create_large_list()

    # Get all objects currently visible to the interpreter
    all_objects = muppy.get_objects()

    # Filter for lists and strings (interpreter-internal ones are included too)
    lists = [obj for obj in all_objects if isinstance(obj, list)]
    strings = [obj for obj in all_objects if isinstance(obj, str)]

    print(f"Total number of lists: {len(lists)}")
    print(f"Total size of lists: {sum(sys.getsizeof(l) for l in lists)} bytes")
    print(f"Total number of strings: {len(strings)}")
    print(f"Total size of strings: {sum(sys.getsizeof(s) for s in strings)} bytes")

    # You can also get a summary of all objects by type
    print("\n--- Object Summary ---")
    summary.print_(summary.summarize(all_objects))
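Running this might yield output along these lines. The listing is schematic: a real run also counts the lists and strings that the interpreter itself holds, so the counts will be higher, and the exact figures depend on the Python version and platform: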
Total number of lists: 1
Total size of lists: 800000 bytes
Total number of strings: 100000
Total size of strings: 9999999 bytes
--- Object Summary ---
<ClassSummary>
str: 100000 instances, 9.54 MiB
list: 1 instances, 0.76 MiB
tuple: 11 instances, 376 bytes
dict: 4 instances, 320 bytes
... (many more small objects)
Pympler’s strength is in providing a high-level view of what types of objects are consuming memory and how many instances of each exist. It’s excellent for spotting large data structures that are still alive when you expected them to be gone (because something still references them, or because they are caught in a reference cycle that has not been collected yet), or for seeing if you’ve inadvertently created a vast number of small objects.
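The heap-inspection idea can be sketched with the standard library alone. The following is a simplified stand-in for illustration, not Pympler’s implementation: the helpers reachable_objects and census_by_type are hypothetical names, and the sizes are shallow sys.getsizeof values.

```python
import gc
import sys
from collections import defaultdict


def reachable_objects():
    # gc.get_objects() only lists GC-tracked containers, so also pick up
    # the untracked objects (such as str and int) they refer to directly
    seen = {}
    for obj in gc.get_objects():
        seen[id(obj)] = obj
        for ref in gc.get_referents(obj):
            if not gc.is_tracked(ref):
                seen[id(ref)] = ref
    return list(seen.values())


def census_by_type(objects):
    # Map type name -> [instance count, total shallow size in bytes]
    totals = defaultdict(lambda: [0, 0])
    for obj in objects:
        entry = totals[type(obj).__name__]
        entry[0] += 1
        entry[1] += sys.getsizeof(obj, 0)
    return dict(totals)


payload = [f"item-{i}" for i in range(20_000)]  # 20,000 distinct strings
census = census_by_type(reachable_objects())

largest = sorted(census.items(), key=lambda kv: kv[1][1], reverse=True)[:5]
for name, (count, size) in largest:
    print(f"{name:>12}: {count:>8} instances, {size / 1024:.1f} KiB")
```

Note that nothing here knows which line created the strings. The census only answers "what is alive, and how much of it", which is the question a heap inspector is built for.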
The core difference lies in their operational model: tracemalloc is an allocation tracer, showing you the origin of memory requests. Pympler is a heap inspector, showing you the state of memory in use.
A less obvious point is that tracemalloc’s data is tied to Python’s internal memory management. When you see an allocation from, say, /usr/lib/python3.8/linecache.py, it’s not necessarily your code directly allocating memory there; it’s Python’s internal machinery (like linecache for source code loading) that tracemalloc sees as the allocation point. With the default of one stored frame you only get that innermost location, so to find the root cause in your application logic you sometimes need to dig deeper into the call stack: start tracing with a larger frame count and group the snapshot by 'traceback' instead of 'lineno'.
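As a rough sketch of that deeper dig, the snippet below stores up to 10 frames per allocation, filters out a few internal modules, and prints the full traceback of the largest group. The functions make_record and load_records are invented for this example, and the choice of 10 frames and of which files to exclude is an assumption to adapt to your own program.

```python
import linecache
import tracemalloc


def make_record(i):
    # Innermost frame: the allocations happen on this line
    return {"id": i, "payload": "x" * (50 + i % 7)}


def load_records(n=20_000):
    # Application-level caller that we actually want to find
    return [make_record(i) for i in range(n)]


tracemalloc.start(10)  # keep up to 10 frames per allocation instead of 1
records = load_records()
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Exclude allocations whose most recent frame is in the import system,
# in tracemalloc itself, or in linecache
snapshot = snapshot.filter_traces((
    tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
    tracemalloc.Filter(False, "<frozen importlib._bootstrap_external>"),
    tracemalloc.Filter(False, tracemalloc.__file__),
    tracemalloc.Filter(False, linecache.__file__),
))

# Group by complete traceback rather than by a single line
top = snapshot.statistics("traceback")[0]
print(f"{top.count} blocks, {top.size / 1024:.1f} KiB")
for line in top.traceback.format():
    print(line)
```

The printed traceback lists the chain of callers leading to the allocation in make_record, so load_records shows up even though it never allocates the dictionaries itself.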
The next concept you’ll likely explore is how to identify and break reference cycles that prevent garbage collection, often using tools like gc.collect() and gc.get_referrers().