Database queries that take too long to complete are usually a symptom of one of two things: the database can’t find the data it needs quickly enough, or it can find the data, but the work it has to do with that data is excessive.

Let’s see this in action. Imagine we have a users table and we want to find all users who signed up in the last 7 days and live in California.

SELECT *
FROM users
WHERE signup_date >= NOW() - INTERVAL '7 days'
  AND state = 'CA';

If this query is slow, we first need to figure out why. The most common culprit is a missing index. Without an index on signup_date and state, the database has to scan every single row in the users table to check if it matches the WHERE clause conditions. This is called a full table scan, and it’s terrible for performance on large tables.

Here’s how we find out if a full table scan is happening and how to fix it:

1. Diagnosis: Check the Query Plan

Every database system has a way to show you how it plans to execute a query. In PostgreSQL, it’s EXPLAIN ANALYZE.

EXPLAIN ANALYZE SELECT *
FROM users
WHERE signup_date >= NOW() - INTERVAL '7 days'
  AND state = 'CA';

Look for Seq Scan (Sequential Scan) in the output. If you see Seq Scan on users, that’s your full table scan. It will also show you the total execution time.

2. Common Cause: Missing Indexes

  • Diagnosis: As above, Seq Scan on users in EXPLAIN ANALYZE.
  • Fix: Create indexes on the columns used in your WHERE clause. For this query, a composite index is often best.
    CREATE INDEX idx_users_signup_date_state ON users (signup_date, state);
    
  • Why it works: Indexes are like the index at the back of a book. Instead of reading every page (row), the database can quickly jump to the relevant sections (rows matching signup_date and state) using the sorted index structure.

3. Common Cause: Inefficient Indexes (Wrong Order)

  • Diagnosis: You have indexes, but EXPLAIN ANALYZE still shows a Seq Scan or a plan that doesn’t seem to use your indexes effectively. Sometimes, the order of columns in a composite index matters significantly. If your index is (state, signup_date) and your query heavily filters on signup_date first (though in this specific example it’s treated equally), the database might not use it optimally.
  • Fix: Reorder columns in the composite index to match the query’s filter selectivity or typical usage. For this query, (signup_date, state) is generally good because signup_date often has a higher selectivity (fewer rows qualify) when looking at recent dates than state alone on a large, diverse user base.
    DROP INDEX idx_users_signup_date_state;
    CREATE INDEX idx_users_signup_date_state ON users (signup_date, state);
    
  • Why it works: The database can more efficiently seek through the index if the leading columns are the most selective filters for the query.

4. Common Cause: Outdated Statistics

  • Diagnosis: EXPLAIN ANALYZE shows the query is using an index, but the rows removed by filter is very high, or the actual time is still excessive. The query planner might be making bad decisions because it doesn’t know the current distribution of data.
  • Fix: Update database statistics. In PostgreSQL:
    ANALYZE users;
    
  • Why it works: The query planner uses statistics about the data’s distribution (e.g., how many users are in California, how many signed up today) to choose the most efficient execution plan. Outdated statistics lead to poor plan choices.

5. Common Cause: Poorly Written Queries (Function Calls in WHERE)

  • Diagnosis: EXPLAIN ANALYZE shows a Seq Scan even with indexes present. If you had a query like:
    SELECT *
    FROM users
    WHERE DATE(signup_date) >= NOW() - INTERVAL '7 days'
      AND state = 'CA';
    
    The DATE(signup_date) function prevents the database from using a standard index on signup_date because it has to compute the date for every row before it can compare.
  • Fix: Rewrite the query to avoid applying functions to indexed columns in the WHERE clause.
    SELECT *
    FROM users
    WHERE signup_date >= DATE(NOW() - INTERVAL '7 days')
      AND state = 'CA';
    
    Or, more precisely, ensure the comparison is on the raw column:
    SELECT *
    FROM users
    WHERE signup_date >= (CURRENT_DATE - INTERVAL '7 days')
      AND state = 'CA';
    
  • Why it works: By comparing the indexed column directly to a calculated value, the database can use the index for a "range scan" or "index seek."

6. Common Cause: Large SELECT * or Unnecessary Data Retrieval

  • Diagnosis: EXPLAIN ANALYZE shows good index usage, but the execution time is still high, and the Buffers: shared hit vs. read shows a lot of disk reads, or the overall rows returned is huge.
  • Fix: Select only the columns you need, not *.
    SELECT user_id, username, email
    FROM users
    WHERE signup_date >= NOW() - INTERVAL '7 days'
      AND state = 'CA';
    
    If you need to join and select from many tables, consider if all those joins and selections are truly necessary for this specific query.
  • Why it works: Retrieving less data means less I/O (disk reads), less network traffic, and less memory usage for the database server and the client application.

7. Common Cause: Table Bloat (for PostgreSQL)

  • Diagnosis: Queries on a table are consistently slow, especially updates and deletes, even with proper indexing and up-to-date statistics. EXPLAIN ANALYZE might show unexpectedly high row counts being scanned.
  • Fix: Perform VACUUM FULL (or VACUUM with ANALYZE) on the table.
    VACUUM FULL users;
    ANALYZE users;
    
  • Why it works: PostgreSQL uses Multi-Version Concurrency Control (MVCC). When rows are updated or deleted, old versions aren’t immediately removed. VACUUM reclaims this space. VACUUM FULL rewrites the entire table and indexes, which can significantly reduce its size and improve scan performance, but it locks the table. A regular VACUUM is usually sufficient for keeping statistics fresh and reclaiming space without a full table rewrite.

8. Common Cause: Insufficient Hardware or Configuration

  • Diagnosis: Even with optimized queries and indexes, the database is slow. EXPLAIN ANALYZE shows high I/O wait or CPU usage. pg_stat_activity shows many queries running.
  • Fix: This is less about a specific command and more about system tuning.
    • Increase RAM for caching.
    • Tune PostgreSQL configuration parameters like shared_buffers, work_mem, effective_cache_size. For example, setting shared_buffers = 25% of total RAM is a common starting point.
    • Upgrade to faster storage (SSDs).
    • Consider read replicas for read-heavy workloads.
  • Why it works: The database needs resources to perform its work. Insufficient RAM means more disk I/O. Inefficient configuration means the database isn’t using its available resources optimally.

After fixing a slow query, the next thing you’ll likely encounter is a slow query that depends on the first one’s results.

Want structured learning?

Take the full Performance course →