Postgres’s TOAST mechanism isn’t just for "large values"; it’s a fundamental strategy for managing any value that might exceed a certain size, making your tables more efficient even if you don’t think you’re storing massive blobs.
Let’s see it in action. Imagine a simple table:
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
title VARCHAR(255),
content TEXT
);
Now, let’s insert some data. First, a short document:
INSERT INTO documents (title, content) VALUES
('Short Note', 'This is a brief piece of text.');
Next, a much larger one:
INSERT INTO documents (title, content) VALUES
('Long Report', repeat('Lorem ipsum dolor sit amet, consectetur adipiscing elit. ', 1000));
When you SELECT * FROM documents;, Postgres retrieves both rows. For the "Short Note," the content is stored directly within the table’s main row data. For the "Long Report," however, something different happens. Postgres doesn’t store that massive string directly in the documents row. Instead, it creates a TOAST table (a separate, hidden table) and stores the large content there, replacing it in the main documents row with a pointer.
The problem TOAST solves is the fixed-size nature of database pages (typically 8KB). If a single row’s data, including all its columns, exceeds this limit, Postgres can’t store it efficiently. This would lead to rows being split across multiple pages, drastically slowing down reads and writes. TOAST allows Postgres to "chunk" large values into smaller pieces and store them off the main row, keeping your primary table data compact.
Here’s how it works internally:
- Compression: Before TOASTing, Postgres attempts to compress the large value. If compression significantly reduces its size (by at least 20% by default), the compressed version is stored.
- Chunking: If the value is still too large after compression (or if compression isn’t effective), it’s broken into smaller "chunks."
- TOAST Table: These chunks are stored in a separate, system-generated TOAST table associated with the original table. The original row then contains a pointer to the TOAST table and the specific TOAST entry.
- External Storage: Values that are TOASTed are not stored directly within the main table’s row data.
The toast_tuple_target setting in postgresql.conf (defaulting to 2KB) determines when a value starts to be considered for TOASTing. Values larger than this threshold are candidates. The autovacuum_toast_compression setting (defaulting to lz4) controls the compression algorithm used.
The content column in our documents table is of type TEXT. TEXT columns, along with BYTEA and JSON/JSONB, are the primary candidates for TOASTing. VARCHAR columns also have a length limit, and if a value exceeds that limit and the toast_tuple_target, it too can be TOASTed.
The one thing most people don’t realize is that TOAST isn’t just for explicitly massive data. Even if you never insert a gigabyte-sized file, if you have many columns in a table, and each column contains moderately large strings (say, 500 bytes each) for many rows, the cumulative size of a single row could exceed the toast_tuple_target. In such cases, Postgres will still TOAST those individual columns to keep the main row data small and efficient, even if no single piece of data seems "large" on its own.
The next thing you’ll likely encounter is understanding how TOAST affects vacuuming and storage bloat, especially with frequently updated TOASTed columns.