Postgres’s work_mem setting is the single most confusingly named, yet critically important, knob for query performance.
Let’s see work_mem in action. Imagine we have a large table, orders, with millions of rows, and we want to find the top 10 customers by total order value.
-- Create a sample table
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
customer_id INT,
order_value DECIMAL(10, 2)
);
-- Populate with some data (in a real scenario, this would be millions of rows)
INSERT INTO orders (customer_id, order_value) VALUES
(1, 100.50), (2, 25.75), (1, 50.00), (3, 120.00), (2, 75.25),
(1, 200.00), (4, 30.00), (3, 60.50), (1, 150.25), (2, 90.00);
-- A query that will likely use work_mem
SELECT customer_id, SUM(order_value) AS total_spent
FROM orders
GROUP BY customer_id
ORDER BY total_spent DESC
LIMIT 10;
Now, let’s examine how Postgres might execute this query with different work_mem settings.
The Problem work_mem Solves
When Postgres needs to perform operations like sorting (for ORDER BY or GROUP BY) or hashing (for hash joins or hash aggregation), it needs temporary memory to do it. If the data required for these operations fits entirely within the work_mem allocated per operation, Postgres can perform them efficiently in memory. If it doesn’t fit, Postgres has to spill to disk, which is orders of magnitude slower.
How it Works Internally
work_mem is not a global memory pool for the entire database. It’s a per-operation memory limit. This means that if a single query has multiple sort or hash operations, each of those operations can consume up to work_mem.
Consider our GROUP BY and ORDER BY query. Postgres will first group the orders by customer_id. This grouping operation might use work_mem for hash aggregation. If the intermediate results of the grouping don’t fit in work_mem, Postgres will write temporary files to disk. After grouping, it needs to sort the aggregated results to apply the ORDER BY total_spent DESC. This sorting operation also has its own work_mem budget.
The key takeaway is that work_mem dictates how much memory is available for each individual sort or hash step within a query. It’s not the total memory for the query, nor is it the total memory for the connection.
Tuning work_mem
The default work_mem is usually very small (e.g., 4MB). This is a safe default to prevent a single runaway query from consuming all system memory, but it’s often too low for efficient execution of analytical queries.
Diagnosis:
The first step is to understand if your query is spilling to disk. Use EXPLAIN ANALYZE on your query:
EXPLAIN ANALYZE
SELECT customer_id, SUM(order_value) AS total_spent
FROM orders
GROUP BY customer_id
ORDER BY total_spent DESC
LIMIT 10;
Look for lines in the output like:
Sort Method: external merge Disk: 1234kBHashAggregate (cost=... rows=... width=...) (actual time=... rows=... loops=...) "HashAggregate (actual time=123.456..567.890 rows=10 loops=1)"Group Key: customer_id-> Sort (cost=... rows=... width=...) (actual time=... rows=... loops=...)"Sort Key: customer_id-> Seq Scan on orders (cost=... rows=... width=...) (actual time=... rows=... loops=...)"Sort Method: external merge Disk: 5678kB
The Disk: XXXkB or external merge indicates that the operation spilled to disk.
Tuning:
-
Identify the culprit operation: In the
EXPLAIN ANALYZEoutput, find theSortorHashAggregate(orHash Join) operations that show disk usage. -
Estimate required memory: The
EXPLAIN ANALYZEoutput often gives you a hint about the amount of data processed. For sorting, it’s roughly the size of the data being sorted. For hashing, it’s related to the number of unique keys and the size of the rows. -
Increase
work_mem: You can setwork_memat different levels:- Globally (for all users/sessions):
Edit
postgresql.confand setwork_mem = 64MB. Then reload the configuration:pg_ctl reloadorSELECT pg_reload_conf();. - Per Session:
SET work_mem = '64MB';(This applies only to the current session). - Per Transaction:
SET LOCAL work_mem = '64MB';(This applies only to the current transaction).
Let’s try setting it to
64MBand re-runningEXPLAIN ANALYZE:SET work_mem = '64MB'; -- Or set it globally in postgresql.conf EXPLAIN ANALYZE SELECT customer_id, SUM(order_value) AS total_spent FROM orders GROUP BY customer_id ORDER BY total_spent DESC LIMIT 10;If the
Disk: XXXkBlines disappear, you’ve successfully tuned it. If it still spills, you might need to increase it further, perhaps to128MBor256MB, depending on your data and available RAM. - Globally (for all users/sessions):
Edit
Why it Works:
By increasing work_mem, you provide more memory for the sort or hash operation. If the data fits within this larger budget, Postgres can keep the entire operation in RAM, avoiding slow disk I/O and significantly speeding up query execution.
Important Considerations:
work_memis per operation, per connection: If you have a complex query with two sorts and one hash, andwork_memis64MB, each of those three operations can use up to64MB. If you have 10 concurrent connections, each running a query that uses two such operations, you could theoretically need10 connections * 2 operations/connection * 64MB/operation = 1.28GBof RAM just forwork_memusage. Be mindful of your total system RAM.- Too high
work_memis bad: Settingwork_memtoo high can lead to Out-Of-Memory (OOM) killer situations on your server if many connections simultaneously try to use their largework_memallocations. - Index Usage: Sometimes, the best way to avoid large sorts or hashes is to create appropriate indexes. For our example, an index on
(customer_id, order_value)might help theGROUP BYoperation.
The next common performance bottleneck after tuning work_mem is often related to buffer cache hit ratios or inefficient query plans that don’t leverage indexes effectively.