Python’s memory management doesn’t rely on periodically scanning for unused objects. CPython, the reference implementation, tracks how many references point to each object, and in most cases that count is all it needs to reclaim memory.

Let’s see it in action. Imagine this simple script:

import sys

class MyClass:
    def __init__(self, name):
        self.name = name
        print(f"Object '{self.name}' created.")

    def __del__(self):
        print(f"Object '{self.name}' is being deleted.")

obj1 = MyClass("obj1")
print(f"Reference count for obj1: {sys.getrefcount(obj1)}")

obj2 = obj1  # Another reference
print(f"Reference count for obj1 after obj2 assignment: {sys.getrefcount(obj1)}")

del obj1    # Deletes the name 'obj1', not the object itself
print(f"Reference count for obj1 after del obj1: {sys.getrefcount(obj2)}")  # getrefcount counts its own argument as one extra reference

obj2 = None  # Drops the last reference; CPython calls __del__ right here
# Calling sys.getrefcount(obj2) now would not raise an error; it would only
# report the count for the None singleton, which says nothing about our object.

# The same thing happens when a local variable goes out of scope
print("\n--- Second Run ---")
def create_and_lose_object():
    local_obj = MyClass("local_obj")
    print(f"Inside function, ref count: {sys.getrefcount(local_obj)}")
    # local_obj goes out of scope here, and its reference count drops to 0.
    # The __del__ method will be called.

create_and_lose_object()
print("--- After function call ---")
# In CPython the deletion message is printed before the line above, as soon as the function returns.

When you run this, you’ll notice a few things. sys.getrefcount() typically reports one more than you might expect, because passing the object as an argument creates a temporary reference. More importantly, __del__ is not called after del obj1: that statement only removes a name, and obj2 still refers to the object. The finalizer runs when the reference count truly drops to zero, which in CPython happens the moment obj2 is rebound to None. Likewise, in create_and_lose_object, local_obj goes out of scope when the function returns, its reference count becomes zero, and __del__ is invoked before "--- After function call ---" is printed.

The core mechanism is reference counting. Every object in CPython carries a count of how many variables or data structures refer to it. When you create an object and bind it to a name, its reference count starts at 1. When you assign that object to another variable or store it in a container, the count increments. When a variable goes out of scope, is reassigned, or is explicitly deleted using del, the object’s reference count decrements. If the count reaches zero, the object is no longer reachable and CPython reclaims its memory immediately.
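The increments and decrements described above can be observed directly. The following is a minimal sketch, assuming CPython; the Payload class and the variable names are hypothetical and exist only for this illustration. Because the absolute value returned by sys.getrefcount() varies between interpreter versions, the sketch compares each reading against a baseline instead.

```python
import sys

class Payload:
    """Hypothetical placeholder class; any ordinary object would do."""

p = Payload()
baseline = sys.getrefcount(p)      # absolute value varies by interpreter version

alias = p                          # a second name bound to the same object
after_alias = sys.getrefcount(p) - baseline

container = [p]                    # a list slot also counts as a reference
after_list = sys.getrefcount(p) - baseline

del alias                          # unbinding a name removes one reference
container.clear()                  # emptying the list removes another
after_cleanup = sys.getrefcount(p) - baseline

print(after_alias, after_list, after_cleanup)  # on CPython: 1 2 0
```

Working with differences rather than raw counts keeps the example independent of how many temporary references the interpreter itself happens to hold.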

However, reference counting alone has a blind spot: cyclic references. Consider this:

class Node:
    def __init__(self, name):
        self.name = name
        self.neighbor = None
        print(f"Node '{self.name}' created.")

    def __del__(self):
        print(f"Node '{self.name}' is being deleted.")

a = Node("A")
b = Node("B")

a.neighbor = b
b.neighbor = a

print(f"Ref count for a: {sys.getrefcount(a)}") # Typically 3: the name a, b.neighbor, and getrefcount's own argument
print(f"Ref count for b: {sys.getrefcount(b)}") # Typically 3: the name b, a.neighbor, and getrefcount's own argument

del a
del b
# At this point, the objects are still in memory because their reference counts are not zero!
# a.neighbor points to b, and b.neighbor points to a.
# Python's cyclic garbage collector will eventually detect and clean this up.

Here, a holds a reference to b, and b holds a reference back to a. Even after you del a and del b, the reference count for a is still 1 (from b.neighbor) and the reference count for b is still 1 (from a.neighbor). Reference counting alone would never reclaim these objects, leading to a memory leak.

This is where Python’s cyclic garbage collector comes in. It is not a separate process or thread; it runs inside the interpreter, triggered automatically once allocations of container objects outpace deallocations by a configurable threshold, or explicitly on request. It looks only at objects that can hold references to other objects (instances, lists, dictionaries and so on) and searches for groups that refer to each other but are no longer reachable from outside the group. It then breaks those cycles, which lets ordinary reference counting deallocate the objects. The cyclic GC is what prevents the memory leak in the Node example above. You can manually trigger it with gc.collect(), which returns the number of unreachable objects it found.
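To check that a cycle really does survive del and is only reclaimed by the collector, a weak reference can serve as a probe. This is a minimal sketch, assuming CPython; the Link class and the probe variable are hypothetical stand-ins for the Node example, with the print calls left out. Automatic collection is switched off for the duration so that the only collection is the explicit one.

```python
import gc
import weakref

class Link:
    """Hypothetical stand-in for the Node class above, without the prints."""
    def __init__(self):
        self.neighbor = None

gc.disable()                             # keep automatic collection out of the demo
try:
    a, b = Link(), Link()
    a.neighbor, b.neighbor = b, a        # build the cycle
    probe = weakref.ref(a)               # a weak reference does not keep 'a' alive

    del a, b
    alive_before = probe() is not None   # the cycle still holds both objects

    found = gc.collect()                 # run the cycle detector by hand
    alive_after = probe() is not None    # the weak reference is now dead
finally:
    gc.enable()

print(alive_before, alive_after)         # True False
print(found >= 2)                        # at least the two Link objects were found
```

The exact number returned by gc.collect() depends on the interpreter version and on whatever other garbage exists at the time, so the sketch only checks a lower bound.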

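The gc module offers fine-grained control. For instance, gc.disable() turns off automatic cyclic collection, leaving only reference counting (gc.collect() still works when called explicitly). gc.set_threshold(threshold0, threshold1, threshold2) lets you tune how often the cyclic collector runs: threshold0 is how far allocations of tracked container objects may exceed deallocations before the youngest generation is collected, and the other two values govern how often older generations are examined. The defaults have long been (700, 10, 10) in CPython, but they can differ between versions, so use gc.get_threshold() to see what your interpreter uses. The defaults are usually sensible, but for performance-critical applications with specific memory patterns, tuning these thresholds can sometimes yield benefits.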
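A short sketch of those controls follows, assuming CPython. The value 5000 is an arbitrary number chosen for illustration, not a tuning recommendation, and the variable names are hypothetical. The sketch restores the previous settings at the end so that it has no lasting effect.

```python
import gc

original = gc.get_threshold()        # tuple of three ints; defaults vary by version
was_enabled = gc.isenabled()

try:
    # Raise only threshold0 and keep the other two as they were.
    # 5000 is an assumed value for illustration only.
    gc.set_threshold(5000, *original[1:])
    tuned = gc.get_threshold()

    gc.disable()                     # automatic cyclic collection is now off
    disabled_state = gc.isenabled()  # False
    gc.collect()                     # a manual collection still works while disabled
finally:
    gc.set_threshold(*original)      # restore whatever was configured before
    if was_enabled:
        gc.enable()

print(original, tuned, disabled_state)
```

Measuring before and after any change of this kind is worthwhile, since a higher threshold trades fewer collector runs for cyclic garbage that lingers longer.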

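When Python’s cyclic garbage collection runs, it’s not just a simple sweep, and it is not a classic mark-and-sweep over the whole heap either. CPython’s collector takes the tracked container objects in the generation being collected and, for each one, subtracts the references that come from other objects in that same set. Anything still referenced from outside the set is reachable, as is everything it refers to; whatever remains is an unreachable cycle. Finalizers (__del__ methods) on those objects are called before the objects are torn down. Since Python 3.4 (PEP 442), cycles containing objects with __del__ are collected like any others, but the details remain subtle: a finalizer can "resurrect" an object by storing a reference to it somewhere reachable, so the collector has to check again after finalizers have run, and the order in which finalizers within a cycle are called is not guaranteed.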
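The finalization behavior can be observed with a pair of objects that form a cycle and both define __del__. This is a minimal sketch, assuming CPython 3.4 or later; the Tracked class and the finalized list are hypothetical names introduced for the illustration, and the finalizer records to a list rather than printing so the result can be inspected.

```python
import gc

finalized = []   # records which finalizers ran, in place of print()

class Tracked:
    """Hypothetical class whose finalizer only records its name."""
    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        finalized.append(self.name)

gc.disable()                         # make the explicit collect() the only one
try:
    x, y = Tracked("x"), Tracked("y")
    x.partner, y.partner = y, x      # cycle between two objects with finalizers
    del x, y
    ran_before = list(finalized)     # empty: the cycle keeps both objects alive

    gc.collect()
    ran_after = sorted(finalized)    # sorted because finalizer order is not guaranteed
    stuck = [o for o in gc.garbage if isinstance(o, Tracked)]
finally:
    gc.enable()

print(ran_before, ran_after, stuck)  # [] ['x', 'y'] []
```

The empty stuck list shows that neither object was parked in gc.garbage as uncollectable, which is what used to happen to such cycles before Python 3.4.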

Understanding how Python manages memory via reference counting and the cyclic collector is crucial for writing efficient and memory-safe code, especially when dealing with complex data structures or long-running applications. The next step in understanding memory management is often exploring how different data structures, like lists and dictionaries, impact reference counts and the potential for creating cycles.
