pg_stat_statements doesn’t actually find slow queries; it aggregates query performance metrics, and you have to interpret that data to identify your slowest ones.
Let’s see it in action. First, you need to enable the extension and tell Postgres to track statements.
-- Enable the extension in the current database (run as superuser)
CREATE EXTENSION pg_stat_statements;

# Configure tracking in postgresql.conf (comments in this file start with #)
shared_preload_libraries = 'pg_stat_statements'  # Only read at server start
pg_stat_statements.max = 10000          # Maximum number of distinct statements to track
pg_stat_statements.track = all          # top (default) = top-level statements only, all = also nested statements, none = off
pg_stat_statements.track_utility = off  # Skip utility commands such as VACUUM
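Because shared_preload_libraries is only read at server start, these settings take effect after a restart. A quick sanity check at that point might look like the sketch below; the tracked_statements alias is just an illustrative name.

```sql
-- Should list pg_stat_statements among the preloaded libraries.
SHOW shared_preload_libraries;

-- Confirms the extension is installed in the current database.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'pg_stat_statements';

-- Raises an error if the library was not preloaded; otherwise returns the number of tracked entries.
SELECT count(*) AS tracked_statements
FROM pg_stat_statements;
```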
After restarting Postgres, the pg_stat_statements view will start populating. Here’s how you’d query it to find your most time-consuming queries:
SELECT
    calls,
    total_exec_time,
    rows,
    mean_exec_time,
    stddev_exec_time,
    query
FROM
    pg_stat_statements
ORDER BY
    total_exec_time DESC
LIMIT 10;
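This query gives you a snapshot of the top 10 statements by total execution time. All timing columns are reported in milliseconds. total_exec_time is the sum of execution times across all calls to a given statement. calls is the number of times the statement has been executed. mean_exec_time is total_exec_time / calls, the average time per execution. rows is the total number of rows retrieved or affected by the statement across all calls, and stddev_exec_time is the standard deviation of the execution time, which shows how much it varies from call to call. These column names apply to PostgreSQL 13 and later; earlier versions call them total_time, mean_time, and stddev_time.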
The real power of pg_stat_statements is its ability to aggregate identical queries, even if they have different literal values. It normalizes queries by replacing literals with placeholders (like $1, $2). This means SELECT * FROM users WHERE id = 1 and SELECT * FROM users WHERE id = 100 are treated as the same statement for tracking purposes. This is crucial because you don’t want to see the same query pattern repeated thousands of times just because the WHERE clause changed slightly.
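To watch the normalization happen, here is a minimal sketch. It assumes a hypothetical users table with an id column, and that this query shape has not been run since the last reset.

```sql
-- Hypothetical table: users(id). Same query shape, two different literals.
SELECT * FROM users WHERE id = 1;
SELECT * FROM users WHERE id = 100;

-- Both executions should be counted under a single normalized entry.
SELECT calls, query
FROM pg_stat_statements
WHERE query LIKE 'SELECT * FROM users WHERE id = %';
```

If tracking is working, the query column should show the statement with $1 in place of the literal, and calls should reflect both executions.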
The most common use case is finding queries that consume the most total time, as shown above. However, a query might have a high total_exec_time simply because it’s called very frequently, not because it’s inherently slow per execution. Consider a query that takes 1ms but runs a million times – its total_exec_time will be high, but it might not be a bottleneck.
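One way to put total time in context is to show each statement's share of all tracked execution time next to its per-call average. This is a sketch rather than a canonical report; the total_ms, mean_ms, and pct_of_total aliases are illustrative names.

```sql
SELECT
    calls,
    round(total_exec_time::numeric, 2) AS total_ms,
    round(mean_exec_time::numeric, 3) AS mean_ms,
    -- Share of all tracked execution time; NULLIF guards against an empty or all-zero view.
    round((100 * total_exec_time / NULLIF(sum(total_exec_time) OVER (), 0))::numeric, 2) AS pct_of_total,
    query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```

A statement with a large pct_of_total but a tiny mean_ms is usually a frequency problem (caching or batching may help) rather than a slow query.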
To find truly slow queries, you need to look at mean_exec_time and also consider the calls count. A query with a high mean_exec_time that’s also called a reasonable number of times is a strong candidate for optimization.
Here’s a query that ranks statements by average execution time while filtering out those called only a few times, since a single slow run can skew their average:
SELECT
    calls,
    total_exec_time,
    rows,
    mean_exec_time,
    stddev_exec_time,
    query
FROM
    pg_stat_statements
WHERE
    calls > 1000 -- Only consider queries run more than 1000 times
ORDER BY
    mean_exec_time DESC
LIMIT 10;
This helps you differentiate between a query that’s consistently slow and one that had an outlier execution. The stddev_exec_time is also valuable here; a high standard deviation suggests that query performance is inconsistent, which can be as problematic as consistently slow performance.
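To surface inconsistent statements directly, one option is to rank by the standard deviation relative to the mean. The sketch below assumes PostgreSQL 13 or later (for min_exec_time and max_exec_time); coeff_of_variation is an illustrative alias, not a built-in column.

```sql
SELECT
    calls,
    mean_exec_time,
    stddev_exec_time,
    min_exec_time,
    max_exec_time,
    -- Relative variability: values well above 1 mean execution times swing widely.
    stddev_exec_time / NULLIF(mean_exec_time, 0) AS coeff_of_variation,
    query
FROM pg_stat_statements
WHERE calls > 1000
ORDER BY coeff_of_variation DESC NULLS LAST
LIMIT 10;
```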
You can also compare how many rows a statement handles with how long it takes. A query that returns millions of rows in a few milliseconds might be highly efficient, but one that returns a few rows and takes seconds warrants investigation. The following query ranks statements by rows per call; read that figure alongside mean_exec_time:
SELECT
    calls,
    total_exec_time,
    rows,
    mean_exec_time,
    stddev_exec_time,
    query
FROM
    pg_stat_statements
WHERE
    calls > 100 -- Filter out very infrequent queries
ORDER BY
    rows::numeric / calls DESC -- Rows per call; the cast avoids integer division
LIMIT 10;
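To look for the opposite case, statements that take a long time for the few rows they touch, a rough sketch is to rank by time spent per row. The ms_per_row alias is an illustrative name.

```sql
SELECT
    calls,
    rows,
    mean_exec_time,
    -- Milliseconds spent per row; statements that touched no rows yield NULL and sort last.
    total_exec_time / NULLIF(rows, 0) AS ms_per_row,
    query
FROM pg_stat_statements
WHERE calls > 100
ORDER BY ms_per_row DESC NULLS LAST
LIMIT 10;
```

A high ms_per_row is only a hint: aggregate queries legitimately scan many rows to return one, so treat the result as a list of candidates to inspect.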
The pg_stat_statements extension is a powerful tool for performance monitoring, but it’s important to remember that it’s a cumulative summary, not a log. It counts every execution of a tracked statement, but it only keeps aggregates accumulated since the last reset; it doesn’t record individual executions or their plans. When more than pg_stat_statements.max distinct statements are seen, the least-executed entries are discarded. For more in-depth analysis of specific query plans and execution details, you’d typically look to EXPLAIN ANALYZE.
The pg_stat_statements.track_utility setting, which is on by default, controls whether utility commands are tracked. Utility commands are everything other than SELECT, INSERT, UPDATE, DELETE, and MERGE, so this covers VACUUM, ANALYZE, CREATE INDEX, and other DDL. Tracking them can be useful if you suspect these maintenance commands are impacting performance, but it can also fill the pg_stat_statements view with many less interesting entries if you’re primarily focused on application queries.
When you reset pg_stat_statements using SELECT pg_stat_statements_reset(), all collected statistics are cleared. This is useful if you’ve made configuration changes or deployed new code and want to measure performance from a clean slate, rather than having old data skew your results.
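A reset might look like the following sketch. The second statement assumes PostgreSQL 14 or later, where the pg_stat_statements_info view is available.

```sql
-- Clear all accumulated statistics (by default only superusers may call this).
SELECT pg_stat_statements_reset();

-- PostgreSQL 14 and later: confirm when the statistics were last reset.
SELECT stats_reset
FROM pg_stat_statements_info;
```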
The actual query text stored in pg_stat_statements.query is the normalized version, meaning literals are replaced with parameter placeholders. This is what allows aggregation. If you need to see the original query with its literals, you’d need to capture that information elsewhere, perhaps using logging or other monitoring tools.
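One common way to capture the original text is the server log. As a sketch, the following asks PostgreSQL to log any statement that runs for at least a given duration; the 500ms threshold is an arbitrary illustrative value, and ALTER SYSTEM requires superuser privileges.

```sql
-- Log every statement that runs for at least 500 ms, with its actual text.
ALTER SYSTEM SET log_min_duration_statement = '500ms';

-- This setting can be changed without a restart; a reload is enough.
SELECT pg_reload_conf();

-- Confirm the new value is active.
SHOW log_min_duration_statement;
```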
The next step after identifying slow queries is to analyze their execution plans using EXPLAIN ANALYZE and then optimize them, often through indexing, query rewriting, or schema adjustments.
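As a sketch of that next step, take a normalized statement from pg_stat_statements, substitute a representative literal for each placeholder, and run it under EXPLAIN. The users table and the value 42 are hypothetical.

```sql
-- Hypothetical statement: replace $1 with a realistic value before running.
-- Note that EXPLAIN ANALYZE actually executes the statement.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM users WHERE id = 42;
```

For statements that modify data, wrapping the EXPLAIN ANALYZE in BEGIN and ROLLBACK keeps the changes from being committed.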