Python’s tracemalloc module is the standard-library tool for debugging memory leaks. The surprising part is how often the solution involves understanding what isn’t a leak, and tracemalloc helps you see that too.
Let’s say you’ve got an application that’s steadily consuming more memory over time, and you suspect a leak. You’ve tried gc.collect() and it doesn’t help. Time to bring in tracemalloc.
First, you need to enable it and watch the traced totals while the program runs:

    import tracemalloc
    import time

    tracemalloc.start()

    # Simulate some work that might leak memory
    def create_objects():
        data = []
        for i in range(1000):
            data.append(list(range(1000)))  # Create a list of 1000 integers
        return data

    leaked_data = []
    for _ in range(5):
        leaked_data.append(create_objects())
        current, peak = tracemalloc.get_traced_memory()
        print(f"Current: {current / 10**6:.2f}MB -- Peak: {peak / 10**6:.2f}MB")
        time.sleep(1)

    tracemalloc.stop()

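When you run this, you’ll see the current and peak memory usage climb. The get_traced_memory() function returns two numbers: current is the total size of the memory blocks tracemalloc is tracing right now (blocks allocated through Python’s allocators since tracemalloc.start() was called, not the whole process footprint), and peak is the highest value current has reached since tracing started.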
To actually find the leak, you need to compare snapshots taken at different points in time. The take_snapshot() function is key here.

    import tracemalloc
    import time

    tracemalloc.start()

    def create_objects():
        data = []
        for i in range(1000):
            data.append(list(range(1000)))
        return data

    snapshot1 = tracemalloc.take_snapshot()

    leaked_data = []
    for _ in range(5):
        leaked_data.append(create_objects())
        time.sleep(1)

    snapshot2 = tracemalloc.take_snapshot()
    tracemalloc.stop()

    # Compare snapshots
    top_stats = snapshot2.compare_to(snapshot1, 'lineno')

    print("[ Top 10 differences ]")
    for stat in top_stats[:10]:
        print(stat)

The compare_to method is where the magic happens. You’re comparing snapshot2 against snapshot1 and grouping the allocations by lineno (file name and line number) to see which lines of code are responsible for the increase in memory allocation. The results come back sorted with the largest absolute size change first, and each entry shows the size difference, the change in the number of blocks, and the file and line number where the allocation occurred.
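Each entry returned by compare_to is a StatisticDiff object, so you can read its fields directly instead of parsing the printed line. The following is a minimal sketch of that; allocate_rows and retained are hypothetical names invented for the illustration, and the filter simply hides tracemalloc's own bookkeeping allocations.

```python
import tracemalloc


def allocate_rows(n_rows):
    # Hypothetical workload: builds n_rows lists of 100 integers each.
    return [list(range(100)) for _ in range(n_rows)]


tracemalloc.start()
before = tracemalloc.take_snapshot()
retained = allocate_rows(2000)  # kept alive on purpose, like a leak
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Exclude allocations made by the tracemalloc module itself.
filters = [tracemalloc.Filter(False, tracemalloc.__file__)]
before = before.filter_traces(filters)
after = after.filter_traces(filters)

top_stats = after.compare_to(before, "lineno")
biggest = top_stats[0]          # largest absolute size change comes first
frame = biggest.traceback[0]    # the single frame kept for 'lineno' grouping

print(f"{frame.filename}:{frame.lineno}")
print(f"size_diff={biggest.size_diff} bytes, count_diff={biggest.count_diff} blocks")
```

A line whose size_diff and count_diff both keep growing from one comparison to the next is a stronger leak signal than a single large but stable allocation.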
Here are the most common causes of leaks and how tracemalloc reveals each one:
Cause 1: Unbounded Global or Class-Level Collections
You’re appending to a list or dictionary that lives forever, and you never remove items.
- Diagnosis: Run the tracemalloc snapshot comparison. You’ll see a large increase in allocated memory attributed to the code that feeds a global list or a class attribute that’s never cleared. For example, if you have global_list = [] and later global_list.append(new_item), tracemalloc attributes the list’s own growth to that append line and the items themselves to the line that created them.

- Fix: Implement a strategy to clear or prune the collection. If it’s a cache, use a maxsize and an eviction policy (like LRU). If it’s a log, clear it periodically. For instance, if global_list is growing too large:

      MAX_GLOBAL_LIST_SIZE = 10000

      global_list.append(new_item)
      if len(global_list) > MAX_GLOBAL_LIST_SIZE:
          # Trim in place, keeping only the last N items, so that every
          # existing reference to global_list sees the shortened list
          del global_list[:-MAX_GLOBAL_LIST_SIZE]

  This explicitly limits the size of global_list, preventing it from growing indefinitely.

- Why it works: You’re preventing the collection from accumulating an unbounded number of references, which is the root cause of the leak.
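When only the most recent items matter, a bounded collections.deque is one alternative to trimming a list by hand. This is a small sketch rather than a drop-in replacement; recent_events and MAX_EVENTS are illustrative names, and the limit is kept tiny so the effect is visible.

```python
from collections import deque

MAX_EVENTS = 3  # illustrative limit; a real application would use a larger value

# A deque with maxlen silently discards the oldest item when a new one arrives.
recent_events = deque(maxlen=MAX_EVENTS)

for event_id in range(10):
    recent_events.append(event_id)

print(list(recent_events))  # [7, 8, 9]
```

The size bound is enforced by the container itself, so no call site can forget to prune.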
Cause 2: Circular References (with __del__ methods)
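Reference counting alone can never free objects that refer to each other; they stay allocated until the cyclic garbage collector (GC) runs. Objects with __del__ methods used to make this worse: before Python 3.4 the GC refused to collect cycles containing them, because it could not know a safe order in which to call the finalizers, and it parked them in gc.garbage instead. Since Python 3.4 (PEP 442) such cycles are collected normally, but the memory still lingers until the next collection pass, or indefinitely if the GC has been disabled.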
- Diagnosis: tracemalloc might show increased memory for the lines that create instances of the classes involved. Call gc.collect() and look at its return value, which is the number of unreachable objects it found; a large number followed by a drop in traced memory points to cycles. On Python 3.3 and earlier, or with C extension types that use legacy finalizers, also check gc.garbage. If objects with __del__ are in gc.garbage, that’s your culprit.

- Fix: Break the cycle once the objects are no longer needed. This often involves explicitly setting attributes to None or calling a cleanup method on one of the objects in the cycle.

      class Parent:
          def __init__(self):
              self.child = None

          def __del__(self):
              print("Parent dying")

      class Child:
          def __init__(self):
              self.parent = None

          def __del__(self):
              print("Child dying")

      p = Parent()
      c = Child()
      p.child = c
      c.parent = p

      # To break the cycle before deletion:
      p.child = None
      c.parent = None

      del p
      del c

  By setting p.child and c.parent to None, you break the strong reference cycle, so reference counting frees each object as soon as its last name is deleted.

- Why it works: You’re removing the circular references yourself, so the objects no longer have to wait for (or depend on) the cyclic garbage collector to be cleaned up.
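To check whether cycles are what is keeping memory alive, you can watch an object through a weak reference and see when it actually disappears. The sketch below assumes CPython and uses a hypothetical Node class; automatic collection is switched off only so the demonstration is deterministic.

```python
import gc
import weakref


class Node:
    """Hypothetical class used only to build a reference cycle."""

    def __init__(self):
        self.partner = None


gc.disable()   # keep automatic collection from interfering with the demo
gc.collect()   # start from a clean state

a, b = Node(), Node()
a.partner, b.partner = b, a      # a -> b -> a forms a cycle
probe = weakref.ref(a)           # lets us observe a without keeping it alive

del a, b
alive_after_del = probe() is not None      # still alive: refcounts never reach zero

unreachable = gc.collect()                 # number of unreachable objects found
alive_after_collect = probe() is not None  # gone once the cyclic GC has run

gc.enable()
print(alive_after_del, unreachable, alive_after_collect)
```

If memory drops after an explicit gc.collect() in your application, the objects were garbage waiting in cycles rather than a true leak, and breaking the cycles (or using weak back-references) removes the delay.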
Cause 3: Long-Lived Objects Holding References to Short-Lived Objects
A common pattern is a cache or a persistent object that holds references to data that should be temporary.
- Diagnosis: tracemalloc will show memory growth, and the compare_to output will point to the lines that allocate the items you add to the long-lived object (e.g., your cache). The traceback might show the allocation happening within a function that’s called frequently, but the storage is in a persistent object.

- Fix: Ensure the long-lived object only holds references to data that is intended to be long-lived. If temporary data needs to be stored, use a mechanism that automatically purges it, like weakref.WeakValueDictionary or a custom cache with an eviction policy.

      from collections import UserDict

      class LRUCache(UserDict):
          def __init__(self, capacity: int):
              super().__init__()
              self.capacity = capacity
              self._keys_order = []  # Least recently used key first

          def __setitem__(self, key, value):
              if key in self.data:
                  self._keys_order.remove(key)
              elif len(self.data) >= self.capacity:
                  lru_key = self._keys_order.pop(0)
                  del self.data[lru_key]
              self.data[key] = value
              self._keys_order.append(key)

          def __getitem__(self, key):
              value = self.data[key]  # Raises KeyError for a missing key
              self._keys_order.remove(key)
              self._keys_order.append(key)
              return value

          def __delitem__(self, key):
              del self.data[key]
              self._keys_order.remove(key)

      my_cache = LRUCache(capacity=100)  # Cache only holds 100 items
      # ... later ...
      my_cache[new_key] = new_value  # If the cache is full, the least recently used item is removed

  This LRUCache automatically discards the least recently used item when the capacity is exceeded, preventing unbounded growth.

- Why it works: You’re replacing an unbounded collection with a bounded one, ensuring that memory used by temporary data is eventually released.
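The weakref.WeakValueDictionary option mentioned above behaves differently from a size-bounded cache: an entry disappears as soon as nothing else holds a strong reference to its value. Here is a minimal sketch, assuming a hypothetical Report class as the payload. Plain list and dict instances cannot be weakly referenced directly, which is why the example wraps the data in a class.

```python
import gc
import weakref


class Report:
    """Hypothetical payload; instances of ordinary classes support weak references."""

    def __init__(self, name):
        self.name = name


registry = weakref.WeakValueDictionary()

report = Report("daily")
registry["daily"] = report                    # the registry does not keep report alive
present_while_referenced = "daily" in registry

del report                                    # drop the only strong reference
gc.collect()                                  # not needed on CPython, harmless elsewhere
present_after_release = "daily" in registry

print(present_while_referenced, present_after_release)
```

This suits registries of objects that are owned elsewhere; it is not a substitute for an LRU cache, because an entry nobody else references vanishes immediately.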
Cause 4: Generators Not Being Consumed
If you have a generator that produces a large amount of data and you don’t fully iterate over it, the generator object itself might keep references to its internal state, potentially holding onto large data structures.
- Diagnosis: tracemalloc might show memory growth attributed to lines inside generator functions. The traceback could point to the yield statement or to the code that builds the data the generator is holding.

- Fix: Ensure that generators are fully consumed, or if you only need a subset of the data, use itertools.islice to limit consumption (generators cannot be sliced directly) and close the generator afterwards. If you don’t need the generator’s results, explicitly close it or let it go out of scope.

      import itertools

      def large_data_generator():
          for i in range(1000000):
              yield list(range(1000))  # Yielding large lists

      # Instead of leaving a half-consumed generator around:
      # gen = large_data_generator()
      # process_some_data(next(gen))

      # Consume only what's needed, then release the generator:
      gen = large_data_generator()
      for item in itertools.islice(gen, 10):  # Process only the first 10 items
          pass  # Do something with item
      gen.close()  # Frees the suspended frame and its local variables

      # Or if you don't need any results:
      gen = large_data_generator()
      del gen  # Dropping the last reference releases it as well

  Using itertools.islice allows you to process only a specific number of items from the generator without materializing the entire sequence in memory.

- Why it works: A suspended generator keeps its frame, and every local variable in it, alive. Exhausting the generator, closing it, or dropping the last reference to it releases that state along with any large intermediate data structures it was holding.
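To see that a suspended generator really does pin its local state, you can track one of its locals with a weak reference. This is an illustrative sketch on CPython; Buffer, rows, and probes are hypothetical names, and Buffer stands in for whatever large working object the generator builds.

```python
import weakref


class Buffer:
    """Hypothetical stand-in for a large working object."""


probes = []


def rows():
    buffer = Buffer()                    # local state held by the generator's frame
    probes.append(weakref.ref(buffer))   # observe it without keeping it alive
    for i in range(1000):
        yield i


gen = rows()
first = next(gen)                        # generator is now suspended at the yield

held_while_suspended = probes[0]() is not None
gen.close()                              # raises GeneratorExit inside the generator
held_after_close = probes[0]() is not None

print(first, held_while_suspended, held_after_close)
```

The buffer survives for as long as the half-consumed generator is reachable, which is why a forgotten generator stored on a long-lived object can look like a leak.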
Cause 5: External Libraries and C Extensions
Sometimes, the leak isn’t in your Python code but in a C extension or an external library you’re using.
- Diagnosis: tracemalloc records Python-level tracebacks, so allocations made inside an extension are attributed to the Python line that called into it, not to a line in the C code. It also sees only memory requested through Python’s allocators or explicitly reported to tracemalloc (as NumPy does for array buffers); memory an extension obtains directly from malloc is invisible to it. If the process’s resident memory keeps growing while tracemalloc’s totals stay flat, suspect a native leak. If the output shows significant memory allocated by your calls into numpy, pandas, tensorflow, or other compiled libraries, investigate their usage patterns.

- Fix: Consult the documentation for the specific library. Often, there are specific ways to manage memory or release resources within those libraries. For example, in NumPy, np.delete returns a new array and leaves the original untouched, and a view or slice keeps the entire underlying buffer alive. Ensure you’re not holding onto old arrays when you expect them to be garbage collected.

      import numpy as np

      # Poor practice: repeatedly creating large arrays and holding references
      # (sizes kept modest here; each array is several MB)
      data_holders = []
      for _ in range(10):
          large_array = np.arange(10**6).reshape(1000, 1000)
          data_holders.append(large_array)

      # If you don't intend to keep large_array, ensure nothing else references it.
      # del large_array removes only this name; the buffer is freed when the last
      # reference (including data_holders and any views) is gone.

      # Better: manage references carefully or use library-specific cleanup
      # Views and slices avoid copies, but they keep the whole base array alive
      # For libraries with explicit resource management, call their cleanup functions

  The key is to understand how the library manages its memory. For NumPy, del on a variable doesn’t release memory if the underlying data buffer is still referenced by other objects such as views or containers.

- Why it works: You’re aligning your usage of the external library with its memory management characteristics, preventing unintended retention of resources.
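Because tracemalloc records the Python call stack at allocation time, storing more than one frame per allocation helps show which of your own call sites led into library code. The sketch below uses only the standard library; library_call is a hypothetical stand-in for a function inside a third-party package, and load_batch plays the role of application code.

```python
import tracemalloc


def library_call(size):
    # Hypothetical stand-in for a function inside a third-party package.
    return bytearray(size)


def load_batch():
    # Application code that calls into the "library".
    return [library_call(100_000) for _ in range(20)]


tracemalloc.start(10)            # keep up to 10 frames per allocation
batch = load_batch()
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group by full traceback; statistics are sorted from biggest to smallest.
top = snapshot.statistics("traceback")[0]
print(f"{top.count} blocks, {top.size / 1e6:.1f} MB")
for line in top.traceback.format():
    print(line)
```

Deeper tracebacks cost more memory and time while tracing is on, so a small frame limit is usually enough to reach the first frame that belongs to your own code.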
Cause 6: Incorrect Use of Caching Decorators
Decorators like functools.lru_cache are powerful but can cause leaks if not understood.
- Diagnosis: tracemalloc will point to allocations made inside the decorated function. With maxsize=None (or functools.cache, which is equivalent) the cache grows without bound, and cache_info() on the decorated function will show a currsize that keeps increasing.

- Fix: Set a maxsize for the lru_cache decorator. If you omit the argument, the default is 128.

      from functools import lru_cache
      import time

      @lru_cache(maxsize=128)  # Limit cache to the 128 most recently used results
      def expensive_calculation(x, y):
          time.sleep(0.01)  # Simulate work
          return x + y

      for i in range(200):
          expensive_calculation(i, i % 5)  # 200 distinct calls, so older entries get evicted
          if i % 10 == 0:
              print(f"Cache info: {expensive_calculation.cache_info()}")

  By specifying maxsize=128, you ensure that the cache will automatically discard the least recently used results when it exceeds 128 entries, preventing it from growing infinitely.

- Why it works: You’re explicitly bounding the size of the cache, ensuring that memory is freed as new results are added.
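For contrast, here is a small sketch of what an unbounded cache looks like through cache_info(), and how cache_clear() releases everything it holds. The square function is a hypothetical example, chosen only because each distinct argument creates a new cache entry.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # unbounded: every distinct argument is kept forever
def square(n):
    return n * n


for n in range(1000):
    square(n)

unbounded_info = square.cache_info()
print(unbounded_info)     # currsize equals the number of distinct arguments seen

square.cache_clear()      # drops every cached result
cleared_info = square.cache_info()
print(cleared_info)
```

Logging cache_info() periodically in a long-running process is a cheap way to confirm whether a cache, rather than a true leak, accounts for the growth tracemalloc reports.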
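After fixing your leak, take a fresh pair of snapshots under the same workload and confirm that the top differences have flattened out. Fixing a leak does not by itself introduce new failures such as a RecursionError: recursion depth is bounded by the interpreter’s recursion limit, not by how much heap memory the program retains.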