Postgres table bloat isn’t just about wasted disk space; it’s a performance killer that slows down everything from simple SELECT queries to complex UPDATE transactions.

Let’s see it in action. Imagine a table that’s been heavily updated or deleted from. Without proper cleanup, old row versions stick around, making the table larger than it needs to be and forcing the database to scan more data than necessary.

Here’s how you can spot a bloated table. First, you need to get an estimate of the table’s actual size versus its "live" data size.

SELECT
    c.relname,
    pg_size_pretty(pg_table_size(c.oid)) AS table_size,           -- heap + TOAST + FSM/VM, no indexes
    pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,  -- table_size + all indexes
    pg_size_pretty(pg_table_size(c.oid) - pg_relation_size(c.oid)) AS toast_size,  -- mostly TOAST (also counts FSM/VM)
    s.n_live_tup,
    s.n_dead_tup
FROM
    pg_stat_user_tables AS s
JOIN
    pg_class AS c ON c.oid = s.relid
WHERE
    s.relname = 'your_table_name';  -- qualified: both relations have a relname column

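In this output, total_size is the disk space used by the table, including its TOAST data and all of its indexes. table_size is the table without its indexes; it does include TOAST data. toast_size is the difference between table_size and the main data file, which is mostly TOAST data (it also counts the small free space map and visibility map). n_live_tup is the estimated number of live rows, and n_dead_tup is the estimated number of obsolete row versions; both are statistics counters, not exact counts. If n_dead_tup is large relative to n_live_tup, or if total_size is much larger than you’d expect for n_live_tup rows, you likely have bloat. A common rule of thumb is that once dead tuples make up more than about 20% of all tuples, that is n_dead_tup / (n_live_tup + n_dead_tup) > 0.2, it’s time to consider an action.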
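To make the 20% rule of thumb concrete, here is a small Python sketch that takes the n_live_tup and n_dead_tup values returned by the query above and computes the dead-tuple share. The function names and the 0.2 cutoff are illustrative assumptions rather than anything Postgres defines. Remember that both counters are estimates.

```python
def dead_tuple_ratio(n_live_tup: int, n_dead_tup: int) -> float:
    """Share of all tuple versions that are dead (0.0 for an empty table)."""
    total = n_live_tup + n_dead_tup
    if total == 0:
        return 0.0
    return n_dead_tup / total


def looks_bloated(n_live_tup: int, n_dead_tup: int, cutoff: float = 0.2) -> bool:
    """Rule-of-thumb check: True when more than `cutoff` of tuples are dead."""
    return dead_tuple_ratio(n_live_tup, n_dead_tup) > cutoff


# Example counter values, as they might come back from pg_stat_user_tables.
print(looks_bloated(800_000, 200_000))  # False: exactly 20% is not "more than"
print(looks_bloated(600_000, 400_000))  # True: 40% of tuples are dead
```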

The primary culprit behind bloat is Postgres’s Multi-Version Concurrency Control (MVCC). When you UPDATE or DELETE rows, Postgres doesn’t remove the old data immediately. A DELETE only marks the existing row version as deleted, and an UPDATE does the same and also writes a new version of the row. Other transactions can keep reading the old version without being blocked, but once no transaction can see it any more it becomes a dead tuple that still occupies space until VACUUM processes the table. The autovacuum daemon is supposed to do that cleanup, but it may not run aggressively enough, or it may be disabled. Note that ordinary VACUUM makes the space reusable for new rows in the same table; it generally does not shrink the file on disk.
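A toy model can make the MVCC behavior easier to picture. The Python sketch below is a deliberate simplification, not how Postgres stores tuples: a version counts as dead the moment it is replaced or deleted (real Postgres also waits until no transaction can still see it), and vacuum() simply drops dead versions, whereas real VACUUM marks their space as reusable. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RowVersion:
    key: int
    value: str
    dead: bool = False  # set once a newer version replaces it or it is deleted


@dataclass
class ToyTable:
    """Toy append-only heap: UPDATE and DELETE never overwrite in place."""
    versions: list = field(default_factory=list)

    def insert(self, key, value):
        self.versions.append(RowVersion(key, value))

    def _current(self, key):
        # The newest version of a key that is not dead is the visible one.
        for v in reversed(self.versions):
            if v.key == key and not v.dead:
                return v
        raise KeyError(key)

    def update(self, key, value):
        self._current(key).dead = True                # old version stays behind
        self.versions.append(RowVersion(key, value))  # new version is appended

    def delete(self, key):
        self._current(key).dead = True  # marked dead, not physically removed

    def vacuum(self):
        # Simplification: assumes no open transaction still needs dead versions.
        self.versions = [v for v in self.versions if not v.dead]

    def counts(self):
        dead = sum(v.dead for v in self.versions)
        return len(self.versions) - dead, dead


t = ToyTable()
for key in range(3):
    t.insert(key, "v0")
t.update(0, "v1")
t.update(0, "v2")
t.delete(1)
print(t.counts())  # (2, 3): two live rows, three dead versions
t.vacuum()
print(t.counts())  # (2, 0)
```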

Let’s dive into the common causes and their fixes.

Cause 1: Autovacuum is Disabled or Misconfigured

The autovacuum daemon is Postgres’s automatic cleanup utility. If it’s turned off or its thresholds are set too high, it won’t run often enough to keep bloat under control.

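  • Diagnosis: Run SHOW autovacuum; or check postgresql.conf and postgresql.auto.conf for autovacuum = off. Autovacuum also needs track_counts = on (the default), and a single table can have it switched off with the autovacuum_enabled storage parameter. Then check autovacuum_vacuum_threshold, autovacuum_vacuum_scale_factor, autovacuum_analyze_threshold, and autovacuum_analyze_scale_factor.
  • Fix: Ensure autovacuum = on. For most tables the defaults are sufficient. The row thresholds are already low (50 for both vacuum and analyze), so for large or heavily modified tables the setting that matters is the scale factor, which defaults to 0.2 (20% of the table) for vacuum and 0.1 for analyze. You might lower these to 0.1 and 0.05. Autovacuum vacuums a table once its dead tuples exceed autovacuum_vacuum_threshold + (autovacuum_vacuum_scale_factor * reltuples), and analyzes it once the rows inserted, updated, or deleted since the last ANALYZE exceed autovacuum_analyze_threshold + (autovacuum_analyze_scale_factor * reltuples).
    # In postgresql.conf
    autovacuum = on
    autovacuum_vacuum_threshold = 50          # default
    autovacuum_vacuum_scale_factor = 0.1      # default 0.2
    autovacuum_analyze_threshold = 50         # default
    autovacuum_analyze_scale_factor = 0.05    # default 0.1
    
    -- Or from SQL, which writes postgresql.auto.conf:
    ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.1;
    ALTER SYSTEM SET autovacuum_analyze_scale_factor = 0.05;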
    
    Applying these changes requires a reload of the PostgreSQL configuration: SELECT pg_reload_conf(); or pg_ctl reload.
  • Why it works: With autovacuum enabled and a lower scale factor, the daemon vacuums the table while the number of dead tuples is still small. The space they occupied is then reused by new row versions instead of the table growing to make room for them.
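The trigger formulas are easy to check with a few lines of arithmetic. This Python sketch just reproduces the documented formulas with the stock defaults. The function names are hypothetical, and the sketch ignores the separate insert-based vacuum trigger that PostgreSQL 13 added. It shows why the scale factor dominates on big tables: with the defaults, a 10-million-row table accumulates about two million dead tuples before autovacuum starts.

```python
def vacuum_trigger(reltuples: float, threshold: int = 50,
                   scale_factor: float = 0.2) -> float:
    """Dead tuples a table can accumulate before autovacuum VACUUMs it.

    Defaults mirror the stock autovacuum_vacuum_threshold (50) and
    autovacuum_vacuum_scale_factor (0.2).
    """
    return threshold + scale_factor * reltuples


def analyze_trigger(reltuples: float, threshold: int = 50,
                    scale_factor: float = 0.1) -> float:
    """Rows inserted, updated, or deleted since the last ANALYZE before
    autovacuum ANALYZEs the table (stock defaults: 50 and 0.1)."""
    return threshold + scale_factor * reltuples


rows = 10_000_000  # reltuples for a hypothetical large table
print(f"{vacuum_trigger(rows):,.0f}")                      # 2,000,050 with the defaults
print(f"{vacuum_trigger(rows, scale_factor=0.1):,.0f}")    # 1,000,050 with scale_factor 0.1
print(f"{analyze_trigger(rows, scale_factor=0.05):,.0f}")  # 500,050
```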

Cause 2: Frequent UPDATEs/DELETEs on Small Tables

Tables that experience a very high rate of UPDATE or DELETE operations, even if they are small overall, can still bloat if autovacuum isn’t aggressive enough.

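  • Diagnosis: Use the query from the beginning. If n_dead_tup is high relative to n_live_tup even though the global autovacuum settings are reasonable, this might be the case. The last_autovacuum and autovacuum_count columns of pg_stat_user_tables show when and how often autovacuum has processed the table.
  • Fix: Tune autovacuum parameters for this table alone using ALTER TABLE. Both row thresholds default to 50, so choose thresholds below that, together with a scale factor below the global value, if you want autovacuum to trigger sooner:
    ALTER TABLE your_table_name SET (autovacuum_vacuum_threshold = 20, autovacuum_vacuum_scale_factor = 0.05);
    ALTER TABLE your_table_name SET (autovacuum_analyze_threshold = 20, autovacuum_analyze_scale_factor = 0.02);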
    
    Then, trigger a manual vacuum:
    VACUUM (VERBOSE, ANALYZE) your_table_name;
    
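  • Why it works: Lower per-table thresholds and scale factors make autovacuum trigger much sooner for this specific table, after fewer dead tuples, which keeps it clean. The manual VACUUM cleans up the existing dead tuples right away and makes their space reusable for new rows. It returns space to the operating system only when whole pages at the end of the table become empty.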
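Per-table storage parameters replace the global values one setting at a time, and any setting you don’t specify keeps its global value. The Python sketch below models that precedence with plain dictionaries (Postgres itself keeps per-table values in pg_class.reloptions) and computes the vacuum trigger for a hypothetical 2,000-row table. The names are illustrative, not a Postgres API.

```python
# Stock global defaults for the two vacuum settings discussed above.
GLOBAL_DEFAULTS = {
    "autovacuum_vacuum_threshold": 50,
    "autovacuum_vacuum_scale_factor": 0.2,
}


def effective_settings(table_options: dict) -> dict:
    """Per-table values override the global ones key by key."""
    merged = dict(GLOBAL_DEFAULTS)  # copy so the defaults are never mutated
    merged.update(table_options)
    return merged


def dead_tuples_before_vacuum(reltuples: int, table_options: dict) -> float:
    s = effective_settings(table_options)
    return (s["autovacuum_vacuum_threshold"]
            + s["autovacuum_vacuum_scale_factor"] * reltuples)


small_table = 2_000  # rows in a hypothetical hot table
default_trigger = dead_tuples_before_vacuum(small_table, {})
tuned_trigger = dead_tuples_before_vacuum(
    small_table, {"autovacuum_vacuum_scale_factor": 0.05})
print(f"{default_trigger:.0f}")  # 450
print(f"{tuned_trigger:.0f}")    # 150
```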

Cause 3: Large TOASTed Columns

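Storing large TEXT or BYTEA values (roughly, anything that pushes a row past about 2 kB) causes Postgres to "TOAST" (The Oversized-Attribute Storage Technique) the data: it compresses the value and/or moves it out of line into a separate TOAST table. Frequent updates to these TOASTed values can lead to bloat, because each update that changes a value writes a new copy of it and leaves the old chunks behind as dead rows in the TOAST table.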

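  • Diagnosis: Look at the toast_size column in the initial diagnostic query. If it’s a large share of total_size and the large columns are updated often, dead TOAST chunks may account for much of the bloat.
  • Fix: Regular VACUUM and autovacuum already process a table’s TOAST table, which makes dead chunks reusable. To actually return the space to the operating system, run VACUUM FULL, which rewrites the entire table, including its TOAST data, keeping only live row versions.
    VACUUM (FULL, VERBOSE) your_table_name;
    
  • Why it works: VACUUM FULL is a more aggressive form of vacuuming. It takes an ACCESS EXCLUSIVE lock on the table, writes a new copy containing only live rows and their live TOAST data, rebuilds the indexes, and then drops the old files. Because the lock blocks reads as well as writes for the whole run, and the new copy needs its own disk space until the old one is removed, it is resource-intensive and effectively means downtime for the table.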
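A rough calculation shows why frequently rewritten large values bloat the TOAST table so quickly. When an UPDATE changes a TOASTed value, the whole value is stored again as new chunks; an UPDATE that leaves that column alone keeps pointing at the existing chunks. The Python sketch below counts the dead chunks left before the next VACUUM under two loud assumptions: values are stored uncompressed, and each chunk holds about 2,000 bytes (Postgres sizes chunks so that four fit on a default 8 kB page). It is a toy estimate, not a measurement.

```python
import math
from typing import Sequence

CHUNK_BYTES = 2_000  # assumption: approximate TOAST chunk size on an 8 kB page


def chunks_for(value_bytes: int) -> int:
    """TOAST chunks needed by one uncompressed value of this size."""
    return math.ceil(value_bytes / CHUNK_BYTES)


def dead_chunks_after(value_bytes: int, updates: Sequence[bool]) -> int:
    """Dead chunks left in the TOAST table before the next VACUUM.

    updates[i] is True if the i-th UPDATE changed the large value. Only those
    updates store the value again; the others reuse the existing chunks.
    """
    per_rewrite = chunks_for(value_bytes)
    return sum(per_rewrite for changed in updates if changed)


# Hypothetical row: a 1,000,000-byte document, rewritten 10 times in 60 updates.
history = [True] * 10 + [False] * 50
print(chunks_for(1_000_000))                  # 500
print(dead_chunks_after(1_000_000, history))  # 5000
```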

Cause 4: Index Bloat

Indexes can also become bloated, especially after large DELETE operations. Dead index entries take up space and slow down index scans.

  • Diagnosis: Use the pgstattuple extension. If it isn’t installed, run CREATE EXTENSION pgstattuple;. Then, for the B-tree indexes on a table:
    SELECT s.indexrelname AS index_name,
           pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size,
           st.avg_leaf_density,
           st.leaf_fragmentation
    FROM pg_stat_user_indexes AS s
    JOIN pg_class AS c ON c.oid = s.indexrelid
    JOIN pg_am AS am ON am.oid = c.relam
    CROSS JOIN LATERAL pgstatindex(s.indexrelid::regclass) AS st
    WHERE s.relname = 'your_table_name'
      AND am.amname = 'btree'  -- pgstatindex only supports B-tree indexes
    ORDER BY pg_relation_size(s.indexrelid) DESC;
    
    pgstatindex reads every page of each index, so it can be slow on large ones. A freshly built B-tree packs its leaf pages to about its fillfactor (90% by default), so an avg_leaf_density well below that suggests bloat. A rougher check is to compare each index’s size (pg_relation_size(indexrelid) from pg_stat_user_indexes) with the size of the table.
  • Fix: Rebuild the bloated index, or all indexes on the table. For a single index:
    REINDEX INDEX your_index_name;
    
    For all indexes on a table:
    REINDEX TABLE your_table_name;
    
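  • Why it works: REINDEX rebuilds the index from scratch using the current data in the table. That drops dead index entries and packs the leaf pages again, reclaiming space and improving scan performance. Plain REINDEX blocks writes to the table and blocks reads that try to use the index being rebuilt. On PostgreSQL 12 and later, REINDEX INDEX CONCURRENTLY (or REINDEX TABLE CONCURRENTLY) avoids blocking writes, at the cost of a slower rebuild.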
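To turn index statistics into a rough number, you can compare avg_leaf_density, which the pgstattuple extension’s pgstatindex function reports for a B-tree index, with the density a fresh build would give. A rebuilt B-tree packs its leaf pages to about its fillfactor, 90% by default. The Python helper below does only that arithmetic. Its name and the interpretation are assumptions for illustration, and a small index with low density may not be worth rebuilding.

```python
def leaf_space_overhead(avg_leaf_density: float, fillfactor: float = 90.0) -> float:
    """Rough share of current leaf pages a fresh rebuild would not need.

    Live entries fill `avg_leaf_density` percent of today's leaf pages; a
    rebuild packs them to about `fillfactor` percent, so it needs roughly
    density / fillfactor as many pages. Both arguments are percentages.
    """
    if avg_leaf_density <= 0:
        raise ValueError("avg_leaf_density must be positive")
    # Clamp at zero: leaves already denser than the fillfactor gain nothing.
    return max(0.0, 1.0 - avg_leaf_density / fillfactor)


print(leaf_space_overhead(45.0))  # 0.5: roughly half the leaf pages would go away
print(leaf_space_overhead(88.0))  # about 0.02: little to gain from REINDEX
```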

Cause 5: Long-Running Transactions

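Transactions that stay open for a long time, including sessions left idle in transaction, hold back VACUUM. It cannot remove any row version that became dead after the oldest such transaction took its snapshot, because that transaction might still need to read it.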

  • Diagnosis: Query pg_stat_activity for transactions that have been open longer than some limit (5 minutes here). Sessions in the idle in transaction state are frequent culprits:
    SELECT pid, usename, state, age(clock_timestamp(), xact_start) AS xact_age, query
    FROM pg_stat_activity
    WHERE xact_start IS NOT NULL
      AND xact_start < clock_timestamp() - INTERVAL '5 minutes'
      AND pid <> pg_backend_pid()
    ORDER BY xact_start;
    
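  • Fix: Identify long-running transactions that are not critical and end them with pg_terminate_backend(pid), which closes the session and rolls back its open transaction. pg_cancel_backend(pid) only cancels the current query, so it won’t help with a session that is idle in transaction. To keep such sessions from lingering in the first place, you can set idle_in_transaction_session_timeout.
    SELECT pg_terminate_backend(12345); -- replace 12345 with the actual process ID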
    
  • Why it works: Once the long-running transaction ends, no snapshot needs the old row versions any more, so the next VACUUM (manual or autovacuum) can remove them and make their space reusable.
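The rule VACUUM follows can be sketched with plain integers. In this Python model, a dead version is removable only if the transaction that deleted or replaced it is older than the oldest snapshot still in use. It ignores aborted transactions, transaction ID wraparound, and the other details Postgres tracks, and all transaction IDs in it are made up, so treat it as an illustration of the idea only.

```python
from typing import Iterable


def removable_versions(deleted_by_xids: Iterable[int],
                       running_snapshot_xmins: Iterable[int],
                       next_xid: int) -> int:
    """Count dead versions VACUUM could remove in this toy model.

    deleted_by_xids: ID of the committed transaction that deleted or replaced
        each dead version.
    running_snapshot_xmins: xmin of every snapshot still in use.
    next_xid: next transaction ID to be assigned; it becomes the horizon
        when no snapshot is open.
    """
    horizon = min(running_snapshot_xmins, default=next_xid)
    # Removable only if every open snapshot already sees the deleting
    # transaction as finished, i.e. that transaction is below the horizon.
    return sum(1 for xid in deleted_by_xids if xid < horizon)


dead = [101, 150, 220, 310]  # made-up deleting transaction IDs
print(removable_versions(dead, [], 400))     # 4: nothing holds the horizon back
print(removable_versions(dead, [120], 400))  # 1: a snapshot with xmin 120 pins the rest
```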

Cause 6: VACUUM FULL is Too Costly

While VACUUM FULL is effective, it locks the table exclusively and can take a very long time on large tables, potentially causing significant downtime.

  • Diagnosis: This is less a cause of bloat than a limitation of the fix for it. If you’ve found significant bloat and VACUUM FULL seems to be the only way to reclaim the space, but the downtime is unacceptable, you need an alternative.
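  • Fix: Use the pg_repack extension, which rebuilds tables and indexes while holding heavy locks only briefly. Install the pg_repack package on the server, run CREATE EXTENSION pg_repack; in the target database, and then run the pg_repack client from the command line:
    pg_repack -d your_database_name -t your_table_name -j 4 --no-order --no-analyze
    
    Here -j 4 rebuilds indexes using four parallel jobs, --no-order performs an online VACUUM FULL-style rewrite instead of clustering by an index, and --no-analyze skips the ANALYZE that pg_repack otherwise runs at the end, so you may want to analyze the table yourself afterwards. The target table must have a primary key, or at least a unique index on a NOT NULL column.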
  • Why it works: pg_repack builds a new copy of the table and its indexes while the table stays in use, captures concurrent changes with a trigger, and then swaps the copy in. It holds an ACCESS EXCLUSIVE lock only briefly, at the start and during the final swap, which keeps downtime far shorter than with VACUUM FULL.
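pg_repack’s documentation notes that a full-table repack needs free disk space of roughly twice the size of the target table and its indexes. A quick pre-flight check can compare that with the free space on the volume. The Python sketch below is hypothetical: it expects you to pass in the table’s pg_total_relation_size in bytes, and it reads free space with shutil.disk_usage from the standard library, which only reflects the filesystem of the path you give it.

```python
import shutil


def enough_space_for_repack(total_relation_bytes: int, free_bytes: int,
                            factor: float = 2.0) -> bool:
    """True if free space covers `factor` times the table-plus-indexes size."""
    return free_bytes >= factor * total_relation_bytes


def free_bytes_at(path: str) -> int:
    """Free bytes on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free


# Hypothetical sizes: table plus indexes vs. free space on the data volume.
GiB = 1024 ** 3
print(enough_space_for_repack(40 * GiB, 100 * GiB))  # True: 80 GiB needed
print(enough_space_for_repack(60 * GiB, 100 * GiB))  # False: 120 GiB needed
```

In practice you would point free_bytes_at() at the data directory, or at the tablespace location that holds the table.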

After addressing bloat, you’ll often find that queries scanning the table run noticeably faster and that disk I/O drops. The next common issues you might encounter are index fragmentation after extensive data modification, or poor query plans caused by outdated statistics.
