Postgres doesn’t actually write your data to disk immediately; it writes it to a log first, and that log is the key to everything.
Let’s see this in action. Imagine you’re running a small e-commerce site.
-- Client 1: Start a transaction and update an order
BEGIN;
UPDATE orders SET status = 'shipped' WHERE order_id = 12345;
-- At this point, the change hasn't hit the main data files yet.
Here’s what’s happening under the hood. When that UPDATE statement is issued, Postgres doesn’t immediately go find the orders table’s data page on disk and modify it. Instead, it first writes a description of that change to a special, append-only file called a Write-Ahead Log (WAL) record. This WAL record contains enough information to reconstruct the change later.
-- WAL Record Snippet (conceptual, not actual output)
{
transaction_id: 1234567,
operation: UPDATE,
table_oid: 16384, -- oid for 'orders' table
page_lsn: 1/A345678, -- Location within the WAL file
old_data: { ... }, -- Previous state of the row
new_data: { status: 'shipped', ... } -- New state of the row
}
Only after this WAL record is guaranteed to be written to stable storage (like an SSD or HDD) does Postgres consider the transaction "committed" from a durability perspective. The actual modification to the data pages in memory (the buffer cache) happens concurrently or soon after. A background process called the "writer" process will eventually flush these modified data pages from memory to their permanent locations on disk.
This WAL mechanism is what provides Postgres’s ACID guarantees: Atomicity, Consistency, Isolation, and Durability.
- Atomicity: If a transaction fails mid-way, Postgres can look at the WAL records for that transaction. If it’s incomplete, it simply discards the changes. If it’s complete but didn’t make it to the data files, it can replay the WAL records to apply the changes.
- Consistency: WAL ensures that a transaction either fully completes or doesn’t happen at all, maintaining the database’s integrity.
- Isolation: While not directly managed by WAL, the logging mechanism supports MVCC (Multi-Version Concurrency Control), which is how Postgres achieves isolation.
- Durability: This is WAL’s primary strength. Once a WAL record is written to disk, the change is permanent, even if the server crashes immediately afterward.
The WAL files themselves are managed in segments. Postgres writes to a current WAL file until it reaches a certain size (typically 16MB by default), then it "archives" that file and starts a new one. This creates a historical sequence of all changes made to the database.
# Example WAL file structure on disk
/var/lib/postgresql/data/pg_wal/000000010000000000000001 -- Current WAL file
/var/lib/postgresql/data/pg_wal/000000010000000000000000 -- Previous WAL file (might be archived)
This sequence is crucial for recovery. If the database crashes, Postgres will:
- Start up and identify the last checkpoint. A checkpoint is a point in time where all data pages modified before it are guaranteed to be on disk.
- Scan WAL files starting from the end of the checkpoint.
- Replay all WAL records from that point forward to bring the data files up to the state they were in just before the crash.
The configuration parameters wal_level, fsync, synchronous_commit, and wal_buffers are your main levers for controlling WAL behavior. wal_level determines how much information is written to WAL. fsync (if on) ensures WAL records are physically written to disk before the OS reports success. synchronous_commit controls when a transaction commit is reported back to the client relative to WAL writes. wal_buffers is the amount of memory dedicated to buffering WAL records before they are written to disk.
The most surprising thing about WAL is how it enables features like point-in-time recovery (PITR) and streaming replication. By continuously archiving WAL files, you can essentially rewind your database to any specific moment in time or stream live changes to a replica server. This means the WAL isn’t just for crash recovery; it’s the fundamental mechanism for data availability and disaster recovery.
Understanding WAL is key to grasping Postgres’s performance tuning, replication strategies, and recovery procedures. The next logical step is to dive into how pg_basebackup and streaming replication leverage these WAL archives to create and maintain standby servers.