Database queries that take too long to complete are usually a symptom of one of two things: the database can’t find the data it needs quickly enough, or it can find the data, but the work it has to do with that data is excessive.
Let’s see this in action. Imagine we have a users table and we want to find all users who signed up in the last 7 days and live in California.
SELECT *
FROM users
WHERE signup_date >= NOW() - INTERVAL '7 days'
AND state = 'CA';
If this query is slow, we first need to figure out why. The most common culprit is a missing index. Without an index on signup_date and state, the database has to scan every single row in the users table to check if it matches the WHERE clause conditions. This is called a full table scan, and it’s terrible for performance on large tables.
Here’s how we find out if a full table scan is happening and how to fix it:
1. Diagnosis: Check the Query Plan
Every database system has a way to show you how it plans to execute a query. In PostgreSQL, it’s EXPLAIN ANALYZE.
EXPLAIN ANALYZE SELECT *
FROM users
WHERE signup_date >= NOW() - INTERVAL '7 days'
AND state = 'CA';
Look for Seq Scan (Sequential Scan) in the output. If you see Seq Scan on users, that’s your full table scan. It will also show you the total execution time.
2. Common Cause: Missing Indexes
- Diagnosis: As above,
Seq Scan on usersinEXPLAIN ANALYZE. - Fix: Create indexes on the columns used in your
WHEREclause. For this query, a composite index is often best.CREATE INDEX idx_users_signup_date_state ON users (signup_date, state); - Why it works: Indexes are like the index at the back of a book. Instead of reading every page (row), the database can quickly jump to the relevant sections (rows matching
signup_dateandstate) using the sorted index structure.
3. Common Cause: Inefficient Indexes (Wrong Order)
- Diagnosis: You have indexes, but
EXPLAIN ANALYZEstill shows aSeq Scanor a plan that doesn’t seem to use your indexes effectively. Sometimes, the order of columns in a composite index matters significantly. If your index is(state, signup_date)and your query heavily filters onsignup_datefirst (though in this specific example it’s treated equally), the database might not use it optimally. - Fix: Reorder columns in the composite index to match the query’s filter selectivity or typical usage. For this query,
(signup_date, state)is generally good becausesignup_dateoften has a higher selectivity (fewer rows qualify) when looking at recent dates thanstatealone on a large, diverse user base.DROP INDEX idx_users_signup_date_state; CREATE INDEX idx_users_signup_date_state ON users (signup_date, state); - Why it works: The database can more efficiently seek through the index if the leading columns are the most selective filters for the query.
4. Common Cause: Outdated Statistics
- Diagnosis:
EXPLAIN ANALYZEshows the query is using an index, but therows removed by filteris very high, or theactual timeis still excessive. The query planner might be making bad decisions because it doesn’t know the current distribution of data. - Fix: Update database statistics. In PostgreSQL:
ANALYZE users; - Why it works: The query planner uses statistics about the data’s distribution (e.g., how many users are in California, how many signed up today) to choose the most efficient execution plan. Outdated statistics lead to poor plan choices.
5. Common Cause: Poorly Written Queries (Function Calls in WHERE)
- Diagnosis:
EXPLAIN ANALYZEshows aSeq Scaneven with indexes present. If you had a query like:
TheSELECT * FROM users WHERE DATE(signup_date) >= NOW() - INTERVAL '7 days' AND state = 'CA';DATE(signup_date)function prevents the database from using a standard index onsignup_datebecause it has to compute the date for every row before it can compare. - Fix: Rewrite the query to avoid applying functions to indexed columns in the
WHEREclause.
Or, more precisely, ensure the comparison is on the raw column:SELECT * FROM users WHERE signup_date >= DATE(NOW() - INTERVAL '7 days') AND state = 'CA';SELECT * FROM users WHERE signup_date >= (CURRENT_DATE - INTERVAL '7 days') AND state = 'CA'; - Why it works: By comparing the indexed column directly to a calculated value, the database can use the index for a "range scan" or "index seek."
6. Common Cause: Large SELECT * or Unnecessary Data Retrieval
- Diagnosis:
EXPLAIN ANALYZEshows good index usage, but theexecution timeis still high, and theBuffers: shared hitvs.readshows a lot of disk reads, or the overallrowsreturned is huge. - Fix: Select only the columns you need, not
*.
If you need to join and select from many tables, consider if all those joins and selections are truly necessary for this specific query.SELECT user_id, username, email FROM users WHERE signup_date >= NOW() - INTERVAL '7 days' AND state = 'CA'; - Why it works: Retrieving less data means less I/O (disk reads), less network traffic, and less memory usage for the database server and the client application.
7. Common Cause: Table Bloat (for PostgreSQL)
- Diagnosis: Queries on a table are consistently slow, especially updates and deletes, even with proper indexing and up-to-date statistics.
EXPLAIN ANALYZEmight show unexpectedly high row counts being scanned. - Fix: Perform
VACUUM FULL(orVACUUMwithANALYZE) on the table.VACUUM FULL users; ANALYZE users; - Why it works: PostgreSQL uses Multi-Version Concurrency Control (MVCC). When rows are updated or deleted, old versions aren’t immediately removed.
VACUUMreclaims this space.VACUUM FULLrewrites the entire table and indexes, which can significantly reduce its size and improve scan performance, but it locks the table. A regularVACUUMis usually sufficient for keeping statistics fresh and reclaiming space without a full table rewrite.
8. Common Cause: Insufficient Hardware or Configuration
- Diagnosis: Even with optimized queries and indexes, the database is slow.
EXPLAIN ANALYZEshows highI/O waitor CPU usage.pg_stat_activityshows many queries running. - Fix: This is less about a specific command and more about system tuning.
- Increase RAM for caching.
- Tune PostgreSQL configuration parameters like
shared_buffers,work_mem,effective_cache_size. For example, settingshared_buffers = 25% of total RAMis a common starting point. - Upgrade to faster storage (SSDs).
- Consider read replicas for read-heavy workloads.
- Why it works: The database needs resources to perform its work. Insufficient RAM means more disk I/O. Inefficient configuration means the database isn’t using its available resources optimally.
After fixing a slow query, the next thing you’ll likely encounter is a slow query that depends on the first one’s results.