Window functions are Postgres’s way of letting you run aggregate-like calculations across a set of rows that are related to the current row, without collapsing those rows down into a single output row.
Here’s a table of recent sales:
CREATE TABLE sales (
sale_id SERIAL PRIMARY KEY,
product_id INT,
sale_date DATE,
amount DECIMAL(10, 2)
);
INSERT INTO sales (product_id, sale_date, amount) VALUES
(101, '2023-01-15', 50.00),
(102, '2023-01-15', 75.00),
(101, '2023-01-20', 60.00),
(103, '2023-01-20', 100.00),
(101, '2023-01-25', 55.00),
(102, '2023-01-25', 80.00),
(101, '2023-02-01', 65.00),
(103, '2023-02-01', 110.00);
Let’s say you want to assign a unique, sequential number to each sale, ordered by date, but you want to do this independently for each product_id. That’s where ROW_NUMBER() comes in, combined with PARTITION BY.
SELECT
sale_id,
product_id,
sale_date,
amount,
ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY sale_date) as sale_sequence
FROM sales;
This query will produce:
| sale_id | product_id | sale_date | amount | sale_sequence |
|---|---|---|---|---|
| 1 | 101 | 2023-01-15 | 50.00 | 1 |
| 3 | 101 | 2023-01-20 | 60.00 | 2 |
| 5 | 101 | 2023-01-25 | 55.00 | 3 |
| 7 | 101 | 2023-02-01 | 65.00 | 4 |
| 2 | 102 | 2023-01-15 | 75.00 | 1 |
| 6 | 102 | 2023-01-25 | 80.00 | 2 |
| 4 | 103 | 2023-01-20 | 100.00 | 1 |
| 8 | 103 | 2023-02-01 | 110.00 | 2 |
The OVER clause is the magic. PARTITION BY product_id tells Postgres to reset the ROW_NUMBER() for every distinct product_id. ORDER BY sale_date within that partition dictates the order in which the numbers are assigned. So, for product_id 101, the first sale by date gets 1, the second gets 2, and so on. The same logic applies independently to products 102 and 103.
Now, imagine you want to see the previous sale amount for each product. That’s where LAG() comes in. It looks at the row before the current one within its partition.
SELECT
sale_id,
product_id,
sale_date,
amount,
LAG(amount, 1, 0.00) OVER (PARTITION BY product_id ORDER BY sale_date) as previous_sale_amount
FROM sales;
This query will produce:
| sale_id | product_id | sale_date | amount | previous_sale_amount |
|---|---|---|---|---|
| 1 | 101 | 2023-01-15 | 50.00 | 0.00 |
| 3 | 101 | 2023-01-20 | 60.00 | 50.00 |
| 5 | 101 | 2023-01-25 | 55.00 | 60.00 |
| 7 | 101 | 2023-02-01 | 65.00 | 55.00 |
| 2 | 102 | 2023-01-15 | 75.00 | 0.00 |
| 6 | 102 | 2023-01-25 | 80.00 | 75.00 |
| 4 | 103 | 2023-01-20 | 100.00 | 0.00 |
| 8 | 103 | 2023-02-01 | 110.00 | 100.00 |
The LAG(amount, 1, 0.00) part means: "take the amount from the row that is 1 row before the current one in the partition. If there is no previous row (like for the first sale of a product), use 0.00 as the default value." The PARTITION BY product_id and ORDER BY sale_date are crucial here too, defining what "previous" means.
The mental model is that each row gets to "look around" at other rows within its defined "window" (the partition and ordering). ROW_NUMBER() assigns a sequence number based on that order. LAG() fetches a value from an earlier row in that sequence. You can also use LEAD() to fetch from a later row.
The truly mind-bending part is that window functions don’t collapse rows. You get all your original rows back, augmented with the results of the window function calculation. This means you can perform complex analysis that involves comparisons to other rows without needing self-joins or subqueries that are often more complex and less performant. The ORDER BY clause within the OVER clause is not necessarily the same as the ORDER BY clause at the end of the query; the former defines the order for the window function’s calculation, while the latter defines the order of the final output.
The next step is to see how you can perform running totals or averages using window functions.