Postgres logical decoding doesn’t just tell you when data changed, it tells you how it changed, making it a powerful tool for real-time data synchronization and auditing.
Let’s see it in action. Imagine you have a users table and you want to capture every insert, update, and delete operation in real-time.
-- Original table
CREATE TABLE users (
id SERIAL PRIMARY KEY,
username VARCHAR(50) UNIQUE NOT NULL,
email VARCHAR(100)
);
INSERT INTO users (username, email) VALUES ('alice', 'alice@example.com');
UPDATE users SET email = 'alice.smith@example.com' WHERE username = 'alice';
DELETE FROM users WHERE username = 'alice';
To capture these changes, we’ll set up a logical replication slot and a publisher.
First, configure postgresql.conf for logical decoding:
wal_level = logical
max_replication_slots = 10
max_wal_senders = 10
Restart your PostgreSQL server after making these changes.
Next, create a publication for the table you want to track. A publication defines which tables’ data changes will be sent.
-- Create a publication that includes all changes for the 'users' table
CREATE PUBLICATION user_changes FOR TABLE users;
Now, you need a consumer to connect to this publication and read the changes. This is typically done using a separate process or application that uses the pgoutput output plugin (or others like wal2json). The consumer will create a logical replication slot.
-- This command is run by the consumer application/tool, not directly in psql
-- Example using psql to show existing slots (you'd use a client library to create one)
SELECT slot_name, plugin, slot_type, active FROM pg_replication_slots;
A consumer application would typically execute a command similar to this (conceptually, actual implementation varies):
-- This is a conceptual representation of what a client would do.
-- In reality, you'd use a library like libpq or a dedicated tool.
-- The consumer connects to the database and requests a slot.
-- Example: A Python script using psycopg2 might do:
-- conn.cursor().execute("SELECT pg_create_logical_replication_slot('my_user_slot', 'pgoutput');")
Once the slot is created and the consumer is connected, the consumer will receive a stream of change records. For the INSERT statement above, the pgoutput plugin might produce a record like this (simplified representation):
{
"change": [
{
"kind": "insert",
"schema": "public",
"table": "users",
"columnnames": ["id", "username", "email"],
"columntypes": ["integer", "character varying", "character varying"],
"columnvalues": [1, "alice", "alice@example.com"]
}
]
}
For the UPDATE, it would look like:
{
"change": [
{
"kind": "update",
"schema": "public",
"table": "users",
"columnnames": ["id", "username", "email"],
"columntypes": ["integer", "character varying", "character varying"],
"columnvalues": [1, "alice", "alice@example.com"]
}
]
}
Note that pgoutput by default sends the new row values. To get both old and new values for updates, you’d need to configure the publication with FOR ALL TABLES or specify addRowToo = true for the pgoutput plugin.
And for the DELETE:
{
"change": [
{
"kind": "delete",
"schema": "public",
"table": "users",
"columnnames": ["id", "username", "email"],
"columntypes": ["integer", "character varying", "character varying"],
"columnvalues": [1, "alice", "alice@example.com"]
}
]
}
The core problem logical decoding solves is providing a structured, row-level view of data modifications without the performance overhead or limitations of triggers or table diffing. It leverages the Write-Ahead Log (WAL) but decodes it into a user-friendly format.
The key levers you control are:
wal_level: Must be set tologicalto enable decoding.- Publications: Define what data changes are streamed. You can publish specific tables or all tables.
- Replication Slots: Act as cursors into the WAL stream. Each slot tracks its own position, ensuring that changes are not lost if the consumer disconnects.
- Output Plugins: Determine the format of the decoded data (e.g.,
pgoutputfor a structured format,wal2jsonfor JSON).
A critical detail often missed is how replication slots manage WAL cleanup. If a logical replication slot is not consumed, the WAL files associated with it will not be removed by PostgreSQL, potentially leading to disk space exhaustion. You must ensure that consumers are actively processing their slots or that stale slots are dropped.
The next step is understanding how to handle different output plugins and their specific configurations for capturing more detailed change information.