Postgres query tuning isn’t just about finding slow queries; it’s about understanding how the database chooses to execute them, and then nudging it towards a better path.

Let’s watch a query in action. Imagine this simple SELECT statement against a table named users with a million rows, containing id (BIGINT, primary key), username (VARCHAR), and created_at (TIMESTAMP):

SELECT username FROM users WHERE created_at BETWEEN '2023-01-01' AND '2023-01-31';

When Postgres receives this, it doesn’t simply start scanning the table. It consults its query planner. The planner considers various execution plans, such as a sequential scan of the whole table or a scan of an index on created_at if one exists. It estimates the cost of each plan from statistics about the data (row counts, value distribution) and chooses the plan with the lowest estimated cost.
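To make the walkthrough reproducible, the sketch below builds a table shaped like the one described and then looks at the statistics the planner reads. The data distribution (random timestamps spread over 2022 and 2023), the username format, and the VARCHAR length are assumptions chosen for illustration, so plans and row counts on your machine will differ from the ones shown in this article. The identity column syntax needs PostgreSQL 10 or later.

```sql
-- Hypothetical setup: the table shape follows the description above; the data is synthetic.
CREATE TABLE users (
    id         BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    username   VARCHAR(50) NOT NULL,
    created_at TIMESTAMP   NOT NULL
);

-- One million rows with timestamps spread uniformly over roughly two years.
INSERT INTO users (username, created_at)
SELECT 'user_' || g::text,
       TIMESTAMP '2022-01-01' + random() * INTERVAL '730 days'
FROM generate_series(1, 1000000) AS g;

CREATE INDEX users_created_at_idx ON users (created_at);

-- Collect statistics so the planner has something to work with.
ANALYZE users;

-- Table-level estimates: approximate row count and number of pages.
SELECT reltuples, relpages
FROM pg_class
WHERE relname = 'users';

-- Column-level statistics for created_at.
SELECT n_distinct, null_frac, correlation
FROM pg_stats
WHERE tablename = 'users' AND attname = 'created_at';
```

The correlation column is worth a look: a value near zero means the physical row order has little to do with created_at, which makes index scans over wide ranges more expensive than they would be on a table loaded in timestamp order.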

Here’s how you can see the planner’s decision:

EXPLAIN ANALYZE SELECT username FROM users WHERE created_at BETWEEN '2023-01-01' AND '2023-01-31';

The output will look something like this (simplified):

                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Index Scan using users_created_at_idx on users  (cost=0.42..25.33 rows=1000 width=10) (actual time=0.050..10.123 rows=500000 loops=1)
   Index Cond: ((created_at >= '2023-01-01 00:00:00'::timestamp without time zone) AND (created_at <= '2023-01-31 00:00:00'::timestamp without time zone))
 Planning Time: 0.100 ms
 Execution Time: 12.500 ms

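This output tells us Postgres used an Index Scan on a (hypothetical) index named users_created_at_idx. The scan returned 500,000 rows (actual rows=500000), although the planner expected only about 1,000 (rows=1000). A gap that large between estimated and actual rows usually points to stale statistics. The scan itself finished at 10.123 ms, and the whole query took 12.5 ms. This is the mental model: the planner, statistics, execution plans, and actual execution.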
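One detail worth noticing in the Index Cond line: the upper bound '2023-01-31' was read as midnight at the start of January 31, so rows created later that day fall outside the range. If the intent is "all of January", a half-open range is a common way to write it. A sketch, assuming that intent:

```sql
-- Includes every timestamp in January 2023, up to but excluding February 1.
SELECT username
FROM users
WHERE created_at >= '2023-01-01'
  AND created_at <  '2023-02-01';
```

The half-open form also behaves the same whether the column stores whole seconds or fractional seconds, which a BETWEEN with a hand-written end-of-day value does not.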

The problem query tuning addresses is efficiency. Without it, even simple queries against large tables can take minutes, or even hours, to return data, grinding applications to a halt. The main levers you control are schema design (indexes, data types), query writing (avoiding anti-patterns), and server configuration (memory, parallelism).
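As a rough illustration of the third lever, these are settings that are commonly examined first. The value used below is a placeholder, not a recommendation; suitable numbers depend on available RAM, core count, and workload.

```sql
-- Inspect current values.
SHOW shared_buffers;                     -- changing this requires a server restart
SHOW work_mem;                           -- memory available to each sort or hash operation
SHOW max_parallel_workers_per_gather;    -- workers a single Gather node may use

-- Session-level change for experimentation (placeholder value).
SET max_parallel_workers_per_gather = 4;

-- Return to the server default for this session.
RESET max_parallel_workers_per_gather;
```

Changing a setting with SET affects only the current session, which makes it a low-risk way to test an idea before touching postgresql.conf.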

The catch is that every plan cost is an estimate, and the estimate relies on statistics. If those statistics are stale, the planner can make terrible decisions. For example, if the table has grown significantly since statistics were last collected and the planner thinks only 100 rows match your range when it’s actually 500,000, it may choose an index scan, which is efficient for a handful of rows, where a sequential scan or bitmap heap scan would have been cheaper for half the table. The reverse also happens: an overestimate can push the planner to a sequential scan when the index would have been faster.

To update statistics, you run:

ANALYZE users;

This command samples the table’s columns and stores statistics about their contents in the system catalogs, which the query planner uses to make informed decisions. Autovacuum runs ANALYZE on its own once enough rows have changed, but running it manually after bulk loads or large deletes is still worthwhile. A common mistake is to reach for a database-wide ANALYZE (or VACUUM ANALYZE) every time, when ANALYZE on the individual tables that changed is often sufficient and much faster.
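To judge whether statistics are likely stale, the cumulative statistics views record when each table was last analyzed and how many rows have changed since. A minimal sketch; the statistics target of 500 is an arbitrary example value:

```sql
-- When were statistics last refreshed, and how many rows changed since then?
SELECT relname,
       last_analyze,
       last_autoanalyze,
       n_mod_since_analyze
FROM pg_stat_user_tables
WHERE relname = 'users';

-- Refresh statistics for a single column only.
ANALYZE users (created_at);

-- Optionally sample this column in more detail (the default target is 100),
-- then re-collect so the new target takes effect.
ALTER TABLE users ALTER COLUMN created_at SET STATISTICS 500;
ANALYZE users (created_at);
```

A higher statistics target gives the planner a finer histogram for skewed columns at the price of a slower ANALYZE and slightly longer planning.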

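The most powerful tool for understanding query performance is EXPLAIN ANALYZE. It shows not only the planned execution but also the actual execution time and row counts for each plan node, which reveals where the time is really spent. Keep in mind that it executes the query for real, so wrap any data-modifying statement in a transaction that you roll back.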

EXPLAIN ANALYZE SELECT username FROM users WHERE created_at >= '2023-01-01' ORDER BY username;

If this query is slow, and EXPLAIN ANALYZE shows a high actual time for a "Sort" operation, it means Postgres had to sort a large number of rows. This often happens when there isn’t a suitable index to provide the data in the desired order.
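To dig into a slow sort, it can help to request buffer usage as well and to read the Sort node’s "Sort Method" line, which reports whether the sort ran in memory (quicksort) or spilled to disk (external merge). The work_mem value below is an arbitrary placeholder for the experiment.

```sql
-- BUFFERS adds shared and temp block counts to each plan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT username
FROM users
WHERE created_at >= '2023-01-01'
ORDER BY username;

-- If the Sort node reports an external merge, try more sort memory
-- for this session only and compare the two plans.
SET work_mem = '128MB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT username
FROM users
WHERE created_at >= '2023-01-01'
ORDER BY username;

RESET work_mem;
```

Because work_mem applies per sort or hash operation rather than per connection, a value that looks modest can multiply quickly under concurrency, so a session-level test is safer than a global change.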

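A composite index on both columns helps here, though not by removing the sort:

CREATE INDEX users_created_at_username_idx ON users (created_at, username);

Because the index contains every column the query needs, Postgres can answer it with an index-only scan and skip most heap fetches, provided the table has been vacuumed recently enough for the visibility map to be current. The Sort node remains, however: index entries are ordered by created_at first and by username only within equal created_at values, so rows drawn from a range of created_at values do not come out in username order. An index delivers presorted output only when the columns ahead of the sort column are matched by equality. For a range filter like this one, the realistic options are to keep the sort in memory by giving it enough work_mem, or to index username alone and let Postgres walk that index in order while filtering on created_at, which mainly pays off when the query also has a LIMIT.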
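On a busy production table, a plain CREATE INDEX blocks writes while it builds. The sketch below shows the non-blocking variant as an alternative to the statement above, followed by a re-check of the plan so the effect of the index is measured rather than assumed. CONCURRENTLY cannot run inside a transaction block, and a failed build leaves behind an invalid index that has to be dropped.

```sql
-- Build without blocking INSERT/UPDATE/DELETE on users.
CREATE INDEX CONCURRENTLY IF NOT EXISTS users_created_at_username_idx
    ON users (created_at, username);

-- Compare against the plan captured before the index existed.
EXPLAIN (ANALYZE, BUFFERS)
SELECT username
FROM users
WHERE created_at >= '2023-01-01'
ORDER BY username;
```

Comparing the node types, row counts, and timings of the two plans shows what the new index actually changed.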

Another common performance killer is the LIKE operator with a leading wildcard.

SELECT * FROM products WHERE name LIKE '%widget%';

If you run EXPLAIN ANALYZE on this, you’ll likely see a "Seq Scan" (Sequential Scan, meaning a full table scan) because standard B-tree indexes can’t efficiently search for patterns that start with an arbitrary character.
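By contrast, a pattern anchored at the start of the string can use a B-tree index. In databases that do not use the C locale, this requires the text_pattern_ops (or varchar_pattern_ops) operator class. A sketch, with a hypothetical index name and assuming name is a text or varchar column:

```sql
-- B-tree index usable for left-anchored patterns regardless of locale.
CREATE INDEX products_name_prefix_idx ON products (name text_pattern_ops);

-- The prefix 'widget' is turned into a range condition on the index.
EXPLAIN ANALYZE SELECT * FROM products WHERE name LIKE 'widget%';
```

This only helps when the wildcard comes after a fixed prefix; the '%widget%' pattern above still needs a different approach.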

The solution is often to use PostgreSQL’s full-text search capabilities or, if you need exact substring matching, consider trigram indexes.

CREATE EXTENSION pg_trgm;
CREATE INDEX products_name_trgm_idx ON products USING gin (name gin_trgm_ops);

Then, the query with LIKE can use this index:

EXPLAIN ANALYZE SELECT * FROM products WHERE name LIKE '%widget%';

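This index, built with the gin_trgm_ops operator class, breaks each name value into trigrams (sequences of three characters) and indexes them, which allows efficient substring searches. Patterns with fewer than three characters between wildcards yield no usable trigrams, so such searches degenerate into a scan of the whole index.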
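To see what the index actually stores, pg_trgm ships a helper that lists the trigrams extracted from a string. A short sketch using the same hypothetical products table:

```sql
-- show_trgm is provided by pg_trgm; it returns the trigrams for a string
-- (lowercased, with padding spaces at word boundaries).
SELECT show_trgm('widget');

-- The same trigram index also serves case-insensitive matching.
EXPLAIN ANALYZE SELECT * FROM products WHERE name ILIKE '%widget%';
```

If the second plan shows a Bitmap Index Scan on products_name_trgm_idx instead of a Seq Scan, the index is being used; on very small tables the planner may still prefer the sequential scan.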

When dealing with JSONB data, indexing is crucial. A common pattern is to query fields within a JSONB document.

SELECT data ->> 'status' FROM events WHERE data @> '{"type": "user_action"}';

Without an index, this query would scan the entire events table. For efficient querying of JSONB content, you can use GIN indexes.

CREATE INDEX events_data_gin_idx ON events USING gin (data);

This index allows Postgres to quickly locate JSONB documents containing specific keys or values, significantly speeding up queries that use operators like @>, ?, ?|, ?&. The GIN index works by indexing the keys and values within the JSONB document itself, enabling targeted lookups.
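Two common variations are sketched below; the index names and the 'failed' status value are hypothetical. The jsonb_path_ops operator class usually produces a smaller index that is faster for containment, while a B-tree expression index suits equality tests on one extracted field.

```sql
-- Alternative GIN operator class: supports @> but not the
-- key-existence operators ?, ?| and ?&.
CREATE INDEX events_data_path_idx ON events USING gin (data jsonb_path_ops);

-- B-tree expression index for filtering on a single extracted field.
-- Note the double parentheses around the expression.
CREATE INDEX events_status_idx ON events ((data ->> 'status'));

-- A query that can use the expression index.
SELECT count(*) FROM events WHERE data ->> 'status' = 'failed';
```

The expression in the WHERE clause has to match the indexed expression exactly for the planner to consider the B-tree index.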

It’s also vital to understand how VACUUM and VACUUM ANALYZE work. VACUUM marks the space occupied by dead tuples (row versions left behind by deletes and updates) as reusable for future rows in the same table; it does not normally shrink the files on disk. Without regular vacuuming, your tables bloat, leading to slower scans and increased disk I/O. VACUUM ANALYZE does this cleanup and also updates table statistics.

VACUUM (FULL, ANALYZE) is a more aggressive form that rewrites the entire table into a new file and returns the freed space to the operating system. It holds an ACCESS EXCLUSIVE lock for the duration, blocking both reads and writes, and needs enough free disk space for the new copy. Use this sparingly. Autovacuum is usually sufficient.
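To check whether autovacuum is keeping up, the statistics views expose dead-tuple counts and the time of the last vacuum per table. A minimal sketch; the scale factor of 0.05 is an example value, not a recommendation:

```sql
-- Tables with the most dead tuples, and when they were last vacuumed.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- Make autovacuum trigger earlier on one large, frequently updated table
-- (the default scale factor is 0.2, i.e. 20% of the table's rows).
ALTER TABLE users SET (autovacuum_vacuum_scale_factor = 0.05);
```

Per-table storage parameters like this one leave the global defaults untouched, which keeps the change limited to the tables that need it.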

The next hurdle you’ll likely encounter after optimizing your queries is managing connection pooling.
