Caching is usually thought of as a way to speed things up by storing frequently accessed data closer to the user. The less obvious point is that effective caching is just as much about reducing load and contention on downstream systems as it is about making data available faster.

Let’s see how this plays out. Imagine a popular e-commerce site. When a user views a product page, the system needs to fetch product details, inventory levels, and maybe even related items. Without caching, each request might hit the database, pulling the same data over and over.

Here’s a simplified look at the data flow for a product request:

User Request -> Web Server -> Application Logic -> Database

If we introduce in-memory caching within the application itself, we can store frequently accessed product data directly in RAM.

User Request -> Web Server -> Application Logic -> In-Memory Cache (Hit!) -> Response (Data readily available)

If the data isn’t in the cache (a cache miss):

User Request -> Web Server -> Application Logic -> In-Memory Cache (Miss) -> Database -> In-Memory Cache (Store data) -> Response (Data now available)

This immediately reduces the load on the database. Instead of thousands of requests for the same product details hitting the DB, only the first few (or those after a cache invalidation) will.
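A minimal sketch of such an in-process cache, assuming a single Node.js process. The TtlCache class, the loadProductFromDb stand-in for the real query, and the 60-second lifetime are all illustrative assumptions, not part of any library:

```javascript
// Minimal in-process cache with per-entry expiry (illustrative, not production-ready).
class TtlCache {
  constructor(now = () => Date.now()) {
    this.entries = new Map(); // key -> { value, expiresAt }
    this.now = now;           // injectable clock, handy for tests
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }

  set(key, value, ttlMs) {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }
}

// Hypothetical loader standing in for the database call; counts how often it runs.
let dbCalls = 0;
function loadProductFromDb(productId) {
  dbCalls += 1;
  return { id: productId, name: `Product ${productId}` };
}

const productCache = new TtlCache();

function getProductCached(productId) {
  const key = `product:${productId}`;
  let product = productCache.get(key);
  if (product === undefined) {
    product = loadProductFromDb(productId);    // miss: go to the database
    productCache.set(key, product, 60 * 1000); // keep for 60 seconds
  }
  return product;
}

// 1000 requests for the same product reach the "database" only once.
for (let i = 0; i < 1000; i++) getProductCached(42);
console.log(dbCalls); // 1
```

The map here is unbounded, so a real deployment would likely also need a size limit or an eviction policy such as LRU. Each application process holds its own copy, which is one reason shared caches like Redis are often used alongside it.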

Next, consider CDN (Content Delivery Network) caching. CDNs traditionally cache static assets like images, CSS, and JavaScript, and increasingly cacheable API responses as well. A CDN distributes copies of your content across geographically dispersed edge servers.

User Request (e.g., from London) -> Nearest CDN Edge Server (Hit!) -> Serve cached image/data

If the CDN doesn’t have it:

User Request (e.g., from London) -> Nearest CDN Edge Server (Miss) -> Fetch from Origin Server -> Serve to User -> CDN Edge Server (Store data for future requests from London)

This offloads traffic from your origin servers and dramatically reduces latency for users far from your data center. It also means your application servers spend less time serving static files.
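What an edge server may store, and for how long, is usually driven by the Cache-Control header your origin sends. The sketch below shows one way an origin might pick that header. The cacheControlFor helper, the URL patterns, and the durations are assumptions made for illustration, and it assumes the CDN is configured to honor origin headers:

```javascript
// Hypothetical helper: choose a Cache-Control header value for a request path.
function cacheControlFor(path) {
  if (/\.[0-9a-f]{8,}\.(js|css|png|jpg|svg|woff2)$/.test(path)) {
    // Fingerprinted assets (e.g. app.3f9a1c2b.js) never change under the same URL,
    // so browsers and the CDN can keep them for a year.
    return 'public, max-age=31536000, immutable';
  }
  if (path.startsWith('/api/products/')) {
    // s-maxage applies to shared caches such as a CDN: hold the response for 60 seconds.
    // max-age=0 makes browsers treat their own copy as immediately stale.
    return 'public, max-age=0, s-maxage=60';
  }
  // Default: no cache may store the response.
  return 'no-store';
}

console.log(cacheControlFor('/static/app.3f9a1c2b.js')); // public, max-age=31536000, immutable
console.log(cacheControlFor('/api/products/42'));        // public, max-age=0, s-maxage=60
console.log(cacheControlFor('/account'));                // no-store
```

Defaulting to no-store is a deliberately conservative choice: a response is only cached at the edge when a rule explicitly says it is safe to share between users.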

Finally, database caching. This can happen at multiple levels:

  • Database Query Cache: Some databases have a built-in query cache that stores the results of identical SELECT statements. MySQL offered one in older versions, but it was deprecated and then removed in MySQL 8.0.
  • Object-Relational Mapper (ORM) Caching: Many ORMs provide caching mechanisms, often integrated with in-memory stores like Redis or Memcached, to cache the objects retrieved from the database.
  • Database-Level Caching (Buffer Pools): Databases themselves maintain memory buffers to cache data pages that have been recently read from disk. This is often managed automatically by the database engine.

Let’s look at a scenario where we’re using Redis as an external cache for application data, often in conjunction with an ORM.

// Application code (conceptual): cache-aside lookup.
// Assumes an ioredis-style client and a promise-based database client.
async function getProduct(productId) {
  const key = `product:${productId}`;
  const cachedProduct = await redisClient.get(key); // resolves to null on a miss
  if (cachedProduct !== null) {
    return JSON.parse(cachedProduct); // Faster, data from Redis
  }

  const product = await database.query('SELECT * FROM products WHERE id = ?', [productId]);
  await redisClient.set(key, JSON.stringify(product), 'EX', 3600); // Cache for 1 hour
  return product; // Slower, data from DB
}

In this example, redisClient is connected to a Redis instance. When getProduct is called, it first checks Redis. If the product data is there, it’s returned almost instantly. If not, the application queries the database, then stores the result in Redis before returning it.

The mental model is a series of data access layers, each closer to the requester than the last. The furthest and slowest is the database reading from disk. The next step up is the database’s in-memory buffer pool. Then come external caches like Redis or Memcached, which still cost a network round trip but skip the query. After that are application-level in-memory caches, the fastest lookup available to your own code. Closest to the user is the CDN, which serves static assets or cacheable API responses from edge locations, so those requests never reach your infrastructure at all.

The key insight is that each layer is designed to shield the layers behind it. The CDN shields your origin servers from static asset requests. In-memory caches shield your database from repeated data fetches. The database’s internal buffering shields the disk I/O subsystem. When you’re debugging a performance issue, understanding which layer is being hammered is crucial. Is your database CPU maxed out? It’s likely that your application caches are ineffective or your queries are inefficient. Are your web servers struggling to serve static files? Your CDN might be misconfigured or might be experiencing a high miss rate.
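Telling an effective cache from an ineffective one usually starts with measuring its hit rate. The sketch below counts hits and misses around a plain Map. The HitRateTracker class and the simulated key sequence are illustrative assumptions, and a real system would more likely export these counters to its metrics stack:

```javascript
// Counts cache hits and misses so a hit rate can be reported (illustrative sketch).
class HitRateTracker {
  constructor() {
    this.hits = 0;
    this.misses = 0;
  }

  record(wasHit) {
    if (wasHit) this.hits += 1;
    else this.misses += 1;
  }

  // Fraction of lookups served from the cache, or null before any traffic.
  hitRate() {
    const total = this.hits + this.misses;
    return total === 0 ? null : this.hits / total;
  }
}

const tracker = new HitRateTracker();
const lookupCache = new Map();

// Simulated traffic: 8 lookups spread over 2 distinct keys.
for (const key of ['a', 'b', 'a', 'a', 'b', 'a', 'b', 'a']) {
  if (lookupCache.has(key)) {
    tracker.record(true);
  } else {
    tracker.record(false); // this lookup would have gone to the next layer
    lookupCache.set(key, `value-for-${key}`);
  }
}

console.log(tracker.hitRate()); // 0.75
console.log(tracker.misses);    // 2
```

The miss count is the number that matters to the layer behind the cache: in this run only 2 of 8 lookups would have reached the database.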

A common pitfall is over-caching or caching stale data. Cache invalidation is the hard problem. If a product price changes, you need a mechanism to ensure that all caches holding the old price are updated or expired. This can involve explicit cache clearing on write operations, or using time-to-live (TTL) values that are short enough to minimize user exposure to stale data. For instance, a product price might have a shorter TTL (e.g., 5 minutes) than product description text (e.g., 1 hour).
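Both mechanisms can be sketched together: a per-field TTL as a safety net, plus an explicit delete on the write path. The in-memory Maps below are hypothetical stand-ins for a real cache and database, and a manually advanced clock keeps the example deterministic:

```javascript
// TTLs in seconds, mirroring the example values above (price: 5 minutes, description: 1 hour).
const TTL_SECONDS = { price: 5 * 60, description: 60 * 60 };

// In-memory stand-ins for the cache and the database (hypothetical, illustration only).
const fakeCache = new Map(); // key -> { value, expiresAt }
const fakeDb = new Map([[42, { price: 1999 }]]);
let nowSeconds = 0; // manual clock instead of real time

function readPrice(productId) {
  const key = `product:${productId}:price`;
  const entry = fakeCache.get(key);
  if (entry && entry.expiresAt > nowSeconds) return entry.value; // fresh hit
  const price = fakeDb.get(productId).price; // miss or expired: read the source of truth
  fakeCache.set(key, { value: price, expiresAt: nowSeconds + TTL_SECONDS.price });
  return price;
}

function updatePrice(productId, newPrice) {
  fakeDb.get(productId).price = newPrice;         // 1. write to the database
  fakeCache.delete(`product:${productId}:price`); // 2. explicitly drop the cached copy
}

console.log(readPrice(42)); // 1999 (loaded from the database, then cached)
updatePrice(42, 1799);
console.log(readPrice(42)); // 1799 (the write invalidated the cached price)
```

If a write bypasses updatePrice, for example a direct database change, readers keep seeing the old price until the 5-minute TTL runs out. That is why the TTL is best treated as a bound on staleness rather than a substitute for invalidation.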

The next step in optimizing data delivery often involves looking at how these caching layers interact and considering more advanced strategies like cache warming or using specialized caching databases.
