Common Table Expressions (CTEs) in PostgreSQL can either be executed inline every time they’re referenced, or they can be "materialized" – essentially, run once and their results stored in a temporary table for faster access on subsequent references.
Let’s see this in action. Imagine we have a users table and we want to find users who joined in the last month and then count how many of them are active.
WITH recent_users AS (
SELECT user_id, username
FROM users
WHERE signup_date >= NOW() - INTERVAL '1 month'
)
SELECT COUNT(u.user_id)
FROM recent_users u
JOIN user_activity ua ON u.user_id = ua.user_id
WHERE ua.last_login >= NOW() - INTERVAL '1 day';
By default, PostgreSQL treats recent_users as an inline CTE. This means the subquery SELECT user_id, username FROM users WHERE signup_date >= NOW() - INTERVAL '1 month' will be executed every single time the recent_users CTE is referenced in the main query. If recent_users is referenced multiple times, it will run multiple times.
To tell PostgreSQL to materialize this CTE, we use the MATERIALIZED keyword.
WITH MATERIALIZED recent_users AS (
SELECT user_id, username
FROM users
WHERE signup_date >= NOW() - INTERVAL '1 month'
)
SELECT COUNT(u.user_id)
FROM recent_users u
JOIN user_activity ua ON u.user_id = ua.user_id
WHERE ua.last_login >= NOW() - INTERVAL '1 day';
Now, the recent_users subquery runs once. Its results are stored in a temporary, internal table. Subsequent references to recent_users (if there were any in this simple example, or more complex ones) will read from this temporary table, which is often significantly faster than re-executing the original subquery.
The core problem that CTEs, and specifically materialized CTEs, solve is computational redundancy. Without materialization, if a CTE is referenced multiple times within a single query, its underlying logic is re-executed each time. This is particularly painful for complex subqueries that involve significant joins or aggregations. Materialization ensures that expensive computations are performed only once.
Internally, when you use MATERIALIZED, PostgreSQL creates a temporary table (often using TEMPORARY TABLE or a similar mechanism) to store the intermediate results. It then rewrites the query plan to read from this temporary table instead of re-executing the CTE’s definition. The planner has to make a decision: is it cheaper to materialize the CTE and then read from it, or to re-execute the CTE multiple times? For a CTE that’s referenced multiple times or is expensive to compute, materialization usually wins.
The primary lever you control is the MATERIALIZED keyword. You decide, based on your understanding of the query and your data, when to hint to the planner that materialization is likely beneficial. PostgreSQL’s query planner is smart, but it doesn’t always know your data distribution or the actual cost of a subquery better than you do.
The actual performance difference hinges on the cost of the CTE’s underlying query versus the cost of reading from a temporary table, and how many times the CTE is referenced. If a CTE is only referenced once, MATERIALIZED can actually hurt performance because you incur the cost of creating and populating the temporary table, plus the cost of reading from it, when a single execution of the original subquery would have been sufficient. It’s a trade-off: one-time cost of materialization vs. repeated execution cost.
When you use MATERIALIZED, PostgreSQL might still decide not to materialize it if it determines that re-executing the CTE is cheaper. The MATERIALIZED keyword is a strong hint, not an absolute command. The planner always has the final say based on its cost estimations.
The next problem you’ll run into is understanding when MATERIALIZED is actually being used. PostgreSQL doesn’t always give you a clear "yes, I materialized this" signal in its default EXPLAIN output without specific flags.