Python lru_cache: Memoization and Cache Internals (2026)

lru_cache is surprisingly a wrapper around a linked list and a dictionary, not some magical in-memory lookup.

Let’s see it in action. Imagine we have a function that simulates a slow database query.

import time
from functools import lru_cache

def slow_query(user_id: int) -> str:
    print(f"Simulating slow query for user {user_id}...")
    time.sleep(1)  # Simulate network latency/database lookup
    return f"User data for {user_id}"

# Apply LRU cache with a max size of 3
cached_slow_query = lru_cache(maxsize=3)(slow_query)

print("First call for user 1:")
print(cached_slow_query(1))

print("\nSecond call for user 1 (should be cached):")
print(cached_slow_query(1))

print("\nCalling for users 2 and 3:")
print(cached_slow_query(2))
print(cached_slow_query(3))

print("\nCache state after calls for 1, 2, 3:")
print(cached_slow_query.cache_info())

print("\nCalling for user 4 (will evict user 1):")
print(cached_slow_query(4))

print("\nCache state after calling for user 4:")
print(cached_slow_query.cache_info())

print("\nCalling for user 1 again (should miss):")
print(cached_slow_query(1))

print("\nCache state after calling for user 1 again:")
print(cached_slow_query.cache_info())

The output shows that the "Simulating slow query…" message only appears the first time a unique argument combination is passed. Subsequent calls with the same arguments are instantaneous, and cache_info() confirms hits and misses.

The problem lru_cache solves is the repeated, expensive computation of function results when the same inputs are encountered frequently. It’s a form of memoization, where results are stored to avoid recalculating them. This is incredibly useful for functions that are pure (always produce the same output for the same input) and computationally intensive, or that interact with external resources that are slow to access.

Internally, lru_cache uses a combination of a dictionary and a doubly linked list. The dictionary maps the function’s arguments (as a tuple) to the corresponding result and a reference to its node in the linked list. The linked list maintains the order of access. When a cache miss occurs and the cache is full, the least recently used item (the one at the head of the list) is evicted, and its entry is removed from the dictionary. When an item is accessed (either a hit or a new insertion), its corresponding node is moved to the tail of the linked list, marking it as the most recently used.

The maxsize argument is crucial. If set to None, the cache can grow indefinitely, which might lead to memory exhaustion. If maxsize is an integer, it defines the maximum number of unique calls to be stored. When the cache reaches this size and a new call is made, the least recently used entry is discarded to make space. The default maxsize is 128.

You can inspect the cache’s performance using the .cache_info() method. It returns a namedtuple with hits, misses, current_size, and maxsize. This is invaluable for understanding cache effectiveness and identifying potential issues, like a maxsize that’s too small, leading to excessive evictions and misses.

To clear the cache, you can use the .cache_clear() method. This is useful in testing scenarios or when you know the underlying data or computation has changed, invalidating all cached results.

The order of arguments matters for caching. lru_cache uses the tuple of positional and keyword arguments as the cache key. So, cached_func(1, 2) and cached_func(1, b=2) would be treated as different calls if b is a keyword argument name. To ensure consistent caching, it’s often best to define functions with keyword-only arguments where appropriate, or to ensure that arguments are always passed in a predictable order.

The typed=True argument to lru_cache causes arguments of different types but equal values to be cached separately. For example, without typed=True, cached_func(1) and cached_func(1.0) would hit the same cache entry if the function could accept both an integer and a float. With typed=True, they would be distinct entries. This can be useful if type differences imply semantic differences in the computation or its result.

Understanding the eviction policy is key. When maxsize is reached, the LRU (Least Recently Used) item is evicted. This means if you have items A, B, C in the cache (maxsize 3) and access them in that order, then call D, item A will be evicted. If you then call B again, it becomes the most recently used, and if you call E, C will be evicted.

The next hurdle you’ll likely encounter is understanding how lru_cache interacts with mutable arguments, or when invalidation is truly needed beyond a simple cache_clear().