Postgres’ recursive CTEs can query hierarchical data in ways that feel more like a simple SELECT than a complex graph traversal.
Let’s say you have a table of employees, where each employee has a manager_id that points to another employee’s id.
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
manager_id INT REFERENCES employees(id)
);
INSERT INTO employees (name, manager_id) VALUES
('Alice', NULL), -- CEO
('Bob', 1), -- Alice's direct report
('Charlie', 1), -- Alice's direct report
('David', 2), -- Bob's direct report
('Eve', 2), -- Bob's direct report
('Frank', 3); -- Charlie's direct report
-- Current state of the employees table
SELECT * FROM employees;
This sets up a simple organizational hierarchy. We want to be able to ask questions like "Who reports to Bob, directly or indirectly?" or "Show me Alice’s entire team."
A recursive Common Table Expression (CTE) is the tool for this. It has two parts: a non-recursive "anchor" member and a recursive "union" member. The anchor starts the process, and the recursive member repeatedly applies itself to the results of the previous iteration until no new rows are generated.
Here’s how you’d find all subordinates of 'Bob' (employee ID 2):
WITH RECURSIVE subordinates AS (
-- Anchor member: Select Bob directly
SELECT id, name, manager_id, 0 AS level
FROM employees
WHERE id = 2
UNION ALL
-- Recursive member: Find employees whose manager is in the previous result set
SELECT e.id, e.name, e.manager_id, s.level + 1
FROM employees e
JOIN subordinates s ON e.manager_id = s.id
)
SELECT name, level
FROM subordinates
ORDER BY level, name;
Let’s break down that CTE:
WITH RECURSIVE subordinates AS (...): This declares a recursive CTE namedsubordinates.SELECT id, name, manager_id, 0 AS level FROM employees WHERE id = 2: This is the anchor member. It selects the starting point – employee ID 2, 'Bob' – and assigns them alevelof 0.UNION ALL: This combines the results of the anchor member with the results of the recursive member.UNION ALLis crucial here;UNIONwould remove duplicates, which you generally don’t want in recursive queries unless you’re very careful.SELECT e.id, e.name, e.manager_id, s.level + 1 FROM employees e JOIN subordinates s ON e.manager_id = s.id: This is the recursive member. It joins theemployeestable (e) with thesubordinatesCTE (s). The join conditione.manager_id = s.idmeans "find employees whose manager is someone who was found in the previous step." We also increment thelevelby 1 for each step down the hierarchy.
The CTE will run like this:
- Iteration 1 (Anchor):
subordinatescontains(2, 'Bob', 1, 0). - Iteration 2 (Recursive): Joins
employeeswith(2, 'Bob', 1, 0). Finds employees wheremanager_idis 2. This is 'David' (id 4) and 'Eve' (id 5).subordinatesnow contains(2, 'Bob', 1, 0), (4, 'David', 2, 1), (5, 'Eve', 2, 1). - Iteration 3 (Recursive): Joins
employeeswith the results from Iteration 2. Finds employees whosemanager_idis 4 or 5. No such employees exist. - The recursion stops. The final result is the
SELECT name, level FROM subordinates ORDER BY level, name;query applied to the accumulated rows.
You can also use this to build a full hierarchy tree, showing the path from the root:
WITH RECURSIVE org_tree AS (
-- Anchor: Start with the CEO (no manager)
SELECT id, name, manager_id, ARRAY[id] AS path, 0 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: Find direct reports and extend the path
SELECT e.id, e.name, e.manager_id, ot.path || e.id, ot.level + 1
FROM employees e
JOIN org_tree ot ON e.manager_id = ot.id
)
SELECT name, level, path
FROM org_tree
ORDER BY level, name;
Here, ARRAY[id] AS path initializes the path with the current employee’s ID. In the recursive step, ot.path || e.id appends the current employee’s ID to the path of their manager, effectively tracing the lineage.
The most surprising thing about recursive CTEs is how they abstract away the iterative process, allowing you to think of the entire result set as a single entity being built. It’s not about how the rows are generated step-by-step, but about the definition of the final set.
One critical detail often overlooked is how Postgres handles the UNION ALL and the recursive step. It builds the intermediate result set for each recursion level, and then the next level operates on that entire intermediate result. If your hierarchy is very deep and wide, this can consume significant memory. Postgres doesn’t stop you from writing queries that will exhaust your system’s resources; it’s up to you to manage the query’s complexity and the data it operates on.
The next frontier in querying hierarchical or graph-like data in Postgres often involves exploring ltree or hstore extensions for more specialized and potentially performant solutions, especially for very large and complex structures.