Postgres replication is how you keep copies of your database in sync, but the two main methods, Streaming and Logical, are fundamentally different beasts, and you’re probably using the wrong one.
Let’s see Streaming Replication in action. Imagine a primary Postgres server and a replica. As the primary writes Write-Ahead Log (WAL) records, a WAL sender process streams them to the replica continuously, not only when a transaction commits. The replica replays these WAL records against its own data files, which keeps it up to date.
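# Primary server: postgresql.conf
wal_level = replica            # default since PostgreSQL 10
max_wal_senders = 3            # max concurrent WAL sender connections
wal_keep_size = 2048           # MB: keep at least 2GB of WAL for lagging replicas (PostgreSQL 13+; older versions use wal_keep_segments)
# Replica server: postgresql.conf
hot_standby = on               # allow read-only queries on the replica (default since PostgreSQL 10)
# Replica server: postgresql.conf plus an empty standby.signal file in the data directory (PostgreSQL >= 12),
# or recovery.conf with standby_mode = 'on' (PostgreSQL < 12)
primary_conninfo = 'host=primary.example.com port=5432 user=replication_user password=secret'
restore_command = 'cp /path/to/wal/%f %p'  # optional: fetch archived WAL if the replica falls too far behind the stream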
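The primary_conninfo line assumes that a replication_user role already exists on the primary. Below is a minimal sketch of that setup plus a first health check. It assumes the role name and placeholder password from the config above. Two steps happen outside SQL. First, the primary's pg_hba.conf needs a replication entry that admits this user from the replica's address. Second, the replica's data directory is normally seeded with pg_basebackup, whose -R option writes the standby settings for you.

```sql
-- Run on the primary. Role name and password mirror primary_conninfo above;
-- use a real secret (or a ~/.pgpass file) in practice.
CREATE ROLE replication_user WITH REPLICATION LOGIN PASSWORD 'secret';

-- Once the replica is connected, each standby appears as one row here.
SELECT client_addr,
       state,       -- 'streaming' once the replica has caught up
       sync_state,  -- 'async' unless synchronous replication is configured
       sent_lsn,
       replay_lsn
FROM pg_stat_replication;
```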
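The primary benefit here is near real-time data availability on the replica. By default, streaming is asynchronous. The primary does not wait for the replica before acknowledging a commit, so the replica can trail slightly behind, and a failover can lose the last few transactions. This matters most in high availability (HA) setups, where a quick failover to a replica is paramount. The replica is an exact byte-for-byte copy of the entire primary cluster (every database in it). As a result, it must run the same major version of Postgres, and it only serves read queries. You can’t selectively replicate tables or transform data.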
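Because the replica is a read-only physical copy, you can confirm its role and roughly gauge how "near real-time" it is from the replica itself. Here is a sketch using built-in functions:

```sql
-- Run on the replica.
SELECT pg_is_in_recovery();  -- true on a standby, false on the primary

-- Approximate replay delay: time since the last replayed transaction committed.
-- On an idle primary this keeps growing even when the replica is fully caught up.
SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay;

-- Writes are rejected on a hot standby; an INSERT fails with
-- "cannot execute INSERT in a read-only transaction".
```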
Now, let’s look at Logical Replication. This isn’t about shipping WAL records as-is. Instead, Postgres decodes the WAL into row-level logical changes for the tables in a publication: INSERTs, UPDATEs, DELETEs, and (since PostgreSQL 11) TRUNCATEs. Subscribers then receive these changes and apply them to their own databases.
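# Publisher: postgresql.conf (changing these settings requires a restart)
wal_level = logical
max_replication_slots = 10     # one per subscription, plus spare slots used during initial table sync
max_wal_senders = 10           # at least max_replication_slots, plus any physical replicas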
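You can also change wal_level from SQL instead of editing postgresql.conf by hand. The sketch below assumes superuser access on the publisher. It also shows which replication settings are still waiting for a restart.

```sql
-- Run on the publisher as a superuser. ALTER SYSTEM writes postgresql.auto.conf,
-- which overrides postgresql.conf.
ALTER SYSTEM SET wal_level = 'logical';
SELECT pg_reload_conf();

-- After the reload, pending_restart is true for changed settings
-- that only take effect once the server restarts.
SELECT name, setting, pending_restart
FROM pg_settings
WHERE name IN ('wal_level', 'max_replication_slots', 'max_wal_senders');
```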
On the publisher:
-- Create a publication for specific tables
CREATE PUBLICATION my_publication FOR TABLE users, orders;
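A publication alone isn't always enough. The role the subscriber connects as needs the REPLICATION attribute (or superuser) and SELECT on the published tables for the initial copy. Each table also needs a replica identity before its UPDATEs and DELETEs can be published. Here is a sketch of the publisher-side checks, reusing the names above. The REPLICA IDENTITY line applies only if a table has no primary key.

```sql
-- Run on the publisher. SELECT is needed for the initial table copy.
GRANT SELECT ON users, orders TO replication_user;

-- A primary key serves as the replica identity automatically. For a table without one,
-- REPLICA IDENTITY FULL logs the whole old row instead (bigger WAL, slower matching).
ALTER TABLE orders REPLICA IDENTITY FULL;  -- only if orders has no primary key

-- List the tables the publication actually covers.
SELECT schemaname, tablename
FROM pg_publication_tables
WHERE pubname = 'my_publication';
```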
On the subscriber:
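-- The target tables must already exist: logical replication copies rows, not schema (DDL)
-- Create a subscription; it copies existing rows first (copy_data = true by default), then streams changes
CREATE SUBSCRIPTION my_subscription
    CONNECTION 'host=publisher.example.com port=5432 user=replication_user password=secret dbname=publisher_db'
    PUBLICATION my_publication;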
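To check how far the initial copy and the ongoing stream have progressed, you can query the subscriber's catalog and statistics views. Here is a sketch:

```sql
-- Run on the subscriber. Per-table sync state,
-- e.g. 'd' = copying initial data, 'r' = ready (normal replication).
SELECT srrelid::regclass AS table_name, srsubstate
FROM pg_subscription_rel;

-- Apply-worker progress: last WAL position received from the publisher.
SELECT subname, received_lsn, latest_end_lsn, last_msg_receipt_time
FROM pg_stat_subscription;
```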
Logical replication shines when you need more granular control. You can replicate specific tables. You can replicate between different major versions of Postgres (10 or later on both ends). With extra tooling outside core replication, you can even transform data on the fly. That makes it the go-to for selective data sharing, migrating data to a new schema, or building data warehouses that need only certain datasets. The subscriber doesn’t have to be an exact replica, and it isn’t read-only. Tables are matched by schema-qualified name and columns by name, so a subscriber table can have extra columns, a different column order, or its own indexes.
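Since PostgreSQL 15, some of that selectivity is built into publications. A column list limits which columns are sent, and a row filter limits which rows are sent. The sketch below uses hypothetical column names (id, email, region) and publication names, because the article doesn't show the users and orders schemas.

```sql
-- PostgreSQL 15+ only. Columns and publication names are hypothetical.

-- Column list: publish only two columns of users. Because UPDATEs and DELETEs are
-- published by default, the list must include the replica identity (assumed: id).
CREATE PUBLICATION users_contact FOR TABLE users (id, email);

-- Row filter: publish only EU orders. Filtering on a column outside the replica
-- identity works only when UPDATEs and DELETEs are not published, hence insert-only.
CREATE PUBLICATION eu_orders FOR TABLE orders WHERE (region = 'eu')
    WITH (publish = 'insert');
```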
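The most surprising thing about logical replication is where its ordering guarantees stop. Within a single subscription, each publisher transaction is applied as one transaction on the subscriber, in the publisher’s commit order. Across subscriptions there is no such guarantee. Each subscription has its own apply worker, so a publisher transaction that touches tables in two subscriptions lands as two independent transactions on the subscriber, and their order can differ from the publisher’s.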
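The single my_subscription above is already the safe case. The risk appears when tables are split across subscriptions. The sketch below uses hypothetical publications pub_users and pub_orders that each cover one table. Options A and B are alternatives; creating both would subscribe the same tables twice.

```sql
-- On the publisher (hypothetical names):
CREATE PUBLICATION pub_users FOR TABLE users;
CREATE PUBLICATION pub_orders FOR TABLE orders;

-- On the subscriber, option A: two subscriptions, two apply workers.
-- A publisher transaction touching users and orders becomes two unrelated transactions.
CREATE SUBSCRIPTION sub_users
    CONNECTION 'host=publisher.example.com port=5432 user=replication_user password=secret dbname=publisher_db'
    PUBLICATION pub_users;
CREATE SUBSCRIPTION sub_orders
    CONNECTION 'host=publisher.example.com port=5432 user=replication_user password=secret dbname=publisher_db'
    PUBLICATION pub_orders;

-- Option B (instead of A): one subscription, one apply worker, publisher commit order kept.
CREATE SUBSCRIPTION sub_app
    CONNECTION 'host=publisher.example.com port=5432 user=replication_user password=secret dbname=publisher_db'
    PUBLICATION pub_users, pub_orders;
```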
The real power of logical replication comes from its ability to decouple the replication process from the physical storage of the database. It operates at a higher level of abstraction, dealing with the "what" of data changes rather than the "how" they are physically written. This allows for much greater flexibility in how and where data is replicated.
While streaming replication is about creating a hot standby for failover and read scaling with an identical copy, logical replication is about distributing specific data changes to potentially different environments for selective use.
The next logical step is understanding how to manage replication slots and monitor replication lag to ensure data consistency.