Postgres parallel queries don’t just speed up your analytics; they fundamentally change the computational model from a single-threaded bottleneck to a distributed, albeit local, processing unit.
Let’s watch it in action. Imagine a table sales with 100 million rows, and we want to sum amount for a specific region.
-- Non-parallel query
EXPLAIN ANALYZE
SELECT SUM(amount)
FROM sales
WHERE region = 'North';
This will likely take a significant amount of time, with the query plan showing a single sequential scan or index scan.
Now, for the parallel version:
-- Parallel query
EXPLAIN ANALYZE
SELECT SUM(amount)
FROM sales
WHERE region = 'North';
The key here is that Postgres, when configured correctly, will automatically decide to parallelize this operation. You’ll see "Gather" and "Parallel Seq Scan" or "Parallel Index Scan" in the EXPLAIN ANALYZE output. The "Workers Planned" and "Workers Launched" will indicate how many processes Postgres decided to use. The execution time, for a sufficiently large dataset and complex query, will be dramatically reduced.
The problem Postgres parallel query solves is the inherent limitation of a single CPU core tackling massive datasets. For analytical workloads, which often involve full table scans, aggregations, and joins on large tables, a single process becomes a significant bottleneck. Parallel query execution distributes the work of a single SQL statement across multiple worker processes, each running on a separate CPU core. These workers fetch data, perform intermediate calculations (like partial sums or filtered row counts), and then a "Gather" node at the end merges these intermediate results into the final answer.
The internal mechanism relies on the "parallel query coordinator" (your main Postgres process) breaking down the query plan into "parallelizable" operations. These operations are then handed off to "worker processes." For a SELECT SUM(...) FROM table WHERE ..., the plan might look like this: a Seq Scan (or Index Scan) on the table, with its output fed into a Partial Aggregate operation, then a Gather operation that collects the results from all workers, and finally, a Finalize Aggregate to produce the single sum. The Gather node is crucial; it’s the point where all parallel workers report back, and it’s responsible for combining their partial results.
Tuning for parallel queries involves several parameters, but the most impactful are max_parallel_workers_per_gather, max_worker_processes, and shared_buffers.
-
max_parallel_workers_per_gather: This controls how many worker processes can be started by a singleGathernode. A common starting point is2or4, but for systems with many cores, you might increase this.- Diagnosis: Check
SHOW max_parallel_workers_per_gather;and observeWorkers LaunchedinEXPLAIN ANALYZE. - Fix:
ALTER SYSTEM SET max_parallel_workers_per_gather = 4;thenSELECT pg_reload_conf(); - Why it works: This directly dictates the maximum degree of parallelism for any single query operation that uses a
Gathernode.
- Diagnosis: Check
-
max_worker_processes: This is the total number of background processes Postgres can run, including parallel workers, autovacuum workers, and replication workers. You need enough to cover all your parallel queries plus other background tasks.- Diagnosis: Check
SHOW max_worker_processes;andSELECT count(*) FROM pg_stat_activity WHERE backend_type = 'parallel worker';(though the latter is dynamic). Ensuremax_worker_processesis greater than the sum of all potential parallel workers and other background processes. - Fix:
ALTER SYSTEM SET max_worker_processes = 100;(adjust based on your core count and other background needs, e.g.,2 * num_cores + 10is a rough guide). ThenSELECT pg_reload_conf(); - Why it works: This sets the absolute upper limit for the total number of worker processes Postgres can spawn, ensuring you don’t run out of system resources for background operations.
- Diagnosis: Check
-
shared_buffers: While not directly a parallel query parameter, adequateshared_buffersare essential. Parallel workers need to read data, and if that data is already in shared memory, it’s much faster.- Diagnosis: Check
SHOW shared_buffers;. For analytical workloads on large tables, this should be a significant portion of your RAM (e.g., 25% to 40%). - Fix:
ALTER SYSTEM SET shared_buffers = '16GB';(adjust to your RAM). ThenSELECT pg_reload_conf(); - Why it works: A larger
shared_bufferscache means more data blocks are readily available in memory, reducing the need for slow disk I/O for parallel workers.
- Diagnosis: Check
-
work_mem: This parameter affects sorting and hashing operations within each worker. If a sort or hash spills to disk within a worker, performance plummets.- Diagnosis: Monitor disk I/O for your Postgres server during heavy analytical queries. Look for "Sort Method: external merge Disk" in
EXPLAIN ANALYZE. - Fix:
ALTER SYSTEM SET work_mem = '128MB';(increase gradually, but be mindful thatwork_memis per operation per worker, so total consumption can be high). ThenSELECT pg_reload_conf(); - Why it works: A larger
work_memallows more intermediate sorting and hashing to be performed in RAM within each worker process, avoiding costly disk spills.
- Diagnosis: Monitor disk I/O for your Postgres server during heavy analytical queries. Look for "Sort Method: external merge Disk" in
-
min_parallel_table_scan_sizeandmin_parallel_index_scan_size: These parameters tell Postgres the minimum size (in disk pages) a table or index scan must be before it’s considered for parallelization.- Diagnosis: Check
SHOW min_parallel_table_scan_size;andSHOW min_parallel_index_scan_size;. If they are too high, small tables or targeted index scans might not be parallelized. - Fix:
ALTER SYSTEM SET min_parallel_table_scan_size = '1MB';andALTER SYSTEM SET min_parallel_index_scan_size = '1MB';(adjust to your needs, smaller values encourage more parallelism but can lead to overhead on very small operations). ThenSELECT pg_reload_conf(); - Why it works: These settings provide a threshold to prevent parallel overhead from outweighing the benefits on smaller scans. Adjusting them can enable parallelism on a wider range of query patterns.
- Diagnosis: Check
One common pitfall is that not all query operations are parallelizable. For instance, certain types of subqueries, CTEs, or specific functions might force a part of the query to run serially, even if other parts are parallel. The planner makes decisions based on the entire query plan, and if any component can’t be parallelized, the entire operation might revert to serial execution or only partially parallelize.
The next hurdle you’ll encounter is understanding how to debug and tune distributed Postgres systems, like those using Citus, where parallel query concepts extend across multiple nodes.