PostgreSQL’s replication slot mechanism is failing because the pg_replication_slots system view has no entry for the requested slot, indicating the slot was either never created, was dropped, or its on-disk state was lost or corrupted.
Here’s a breakdown of common causes and how to fix them:
1. Slot Never Created or Dropped Manually
This is the most frequent culprit. The replication slot, essential for logical or physical replication, simply isn’t present in the system.
- Diagnosis: On the primary server, run:
  SELECT slot_name, plugin, slot_type, active FROM pg_replication_slots WHERE slot_name = 'your_slot_name';
  If this returns zero rows, the slot doesn’t exist.
- Fix: Create the slot on the primary server. For logical replication, specify the plugin. For physical replication, omit the plugin.
  - Logical Replication:
    SELECT pg_create_logical_replication_slot('your_slot_name', 'pgoutput');
    (Replace 'pgoutput' with your actual output plugin, e.g., wal2json if you’re using that. Logical slots require wal_level = logical and belong to the database you are connected to when you create them.)
  - Physical Replication:
    SELECT pg_create_physical_replication_slot('your_slot_name');
- Why it works: This command registers the replication slot on the server, in shared memory and on disk, which is the state pg_replication_slots reports. Once registered, the slot is available for replication clients to connect to.
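If the creation step is run from a provisioning script, it can help to make it safe to repeat. The following is a minimal sketch, assuming a physical slot and the placeholder name 'your_slot_name' used above; it creates the slot only when the view does not already list it. It is not protected against two sessions running it at the same moment.

```sql
-- Create the slot only if pg_replication_slots has no entry for it yet.
-- 'your_slot_name' is a placeholder; substitute your own slot name.
SELECT pg_create_physical_replication_slot('your_slot_name')
WHERE NOT EXISTS (
    SELECT 1
    FROM pg_replication_slots
    WHERE slot_name = 'your_slot_name'
);
```

When the slot already exists, the statement returns zero rows instead of raising an "already exists" error. The same pattern should work with pg_create_logical_replication_slot and its plugin argument.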
2. Slot Dropped by an Administrator or Automated Process
Someone or something might have intentionally removed the slot. This can happen during maintenance or if a cleanup script is misconfigured.
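- Diagnosis: Check the PostgreSQL logs on the primary server for DROP_REPLICATION_SLOT commands or pg_drop_replication_slot() calls around the time the error started. These appear only if logging was configured to capture them (log_replication_commands for replication-protocol commands, log_statement for SQL calls), so an empty search does not prove the slot was never dropped. You can also check the pg_replication_slots view as described in Cause 1.
- Fix: If the slot was dropped accidentally and you need it, recreate it using the same pg_create_logical_replication_slot or pg_create_physical_replication_slot commands as in Cause 1.
- Why it works: Recreating the slot re-establishes it on the primary, allowing replication to resume. A recreated slot only retains WAL from the moment it is created, so a consumer that fell behind while the slot was missing may need to be re-synchronized.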
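To make future drops issued over the replication protocol visible in the server log, replication command logging can be switched on. This is a sketch assuming superuser access and PostgreSQL 9.5 or later, where the log_replication_commands setting is available; it does not recover information about drops that have already happened.

```sql
-- Log every replication-protocol command, including DROP_REPLICATION_SLOT.
-- ALTER SYSTEM cannot run inside a transaction block.
ALTER SYSTEM SET log_replication_commands = on;

-- Apply the change without restarting the server.
SELECT pg_reload_conf();

-- Confirm the setting; a session may need a moment to pick up the reload.
SHOW log_replication_commands;
```

Drops made through the SQL function pg_drop_replication_slot() are ordinary statements, so they are covered by log_statement rather than by this setting.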
3. Replication Slot Metadata Corruption
While rarer, the underlying files or catalog entries for a replication slot can become corrupted, leading PostgreSQL to believe it doesn’t exist or cannot be accessed.
- Diagnosis: This is tricky. If pg_replication_slots is empty but you know you created slots, and logs don’t show drops, corruption is a possibility. You might see other related errors in the logs. There isn’t a direct command to diagnose corruption of a specific slot’s metadata, but slots that vanish across a restart with no drop recorded in the logs point towards it.
- Fix: The most reliable fix for suspected corruption is to drop any remnants of the slot (if they appear in the view but are unusable) and recreate it.
  - First, try to drop it explicitly:
    SELECT pg_drop_replication_slot('your_slot_name');
    (This fails while a client is still connected to the slot, so disconnect the consumer first.)
  - If that fails, you might need to manually remove the slot’s directory, pg_replslot/your_slot_name/, from the data directory on the primary server. This is a last resort: stop PostgreSQL first, remove only that slot’s directory, then start the server again. Be extremely careful with this step. After manual removal or a successful drop, recreate the slot as per Cause 1.
- Why it works: Dropping and recreating the slot ensures that fresh, uncorrupted state is written to disk. Manual file removal bypasses the SQL interface entirely to clean up lingering state.
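Before dropping a suspect slot, it is worth looking at what the server still reports for it. The sketch below assumes PostgreSQL 9.5 or later (for the active_pid column) and the placeholder name 'your_slot_name'; the second statement drops the slot only when no client is attached to it.

```sql
-- Inspect the remaining state of the slot, including which backend
-- (if any) currently holds it.
SELECT slot_name, slot_type, active, active_pid, restart_lsn
FROM pg_replication_slots
WHERE slot_name = 'your_slot_name';

-- Drop the slot only if it is not in use; returns zero rows otherwise.
SELECT pg_drop_replication_slot(slot_name)
FROM pg_replication_slots
WHERE slot_name = 'your_slot_name'
  AND NOT active;
```

If the first query shows active = true, the active_pid value identifies the connection that has to be closed before the drop can succeed.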
4. Replication Slot Not Present on the Correct Server
This error can occur if you’re checking for the slot on the standby server instead of the primary. A slot lives on the server that sends the WAL, and slots created on the primary are not copied to its standbys by default.
- Diagnosis: Verify which PostgreSQL instance you are connected to when running the pg_replication_slots query. Ensure you are connected to the primary server, especially after a failover, when the newly promoted server may not have the slot.
- Fix: Connect to the primary server and create the slot if it doesn’t exist there.
- Why it works: Replication slots are state maintained by the sending server (normally the primary) to track what WAL (Write-Ahead Log) data has been consumed by downstream consumers. A standby only has the slots that were created on it, for example for cascading replication.
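A quick way to confirm which kind of server a session is connected to is to ask the server itself. This sketch uses only built-in functions; the column aliases are illustrative.

```sql
-- is_standby is false on a primary and true on a server in recovery.
-- The address and port are NULL for Unix-socket connections.
SELECT pg_is_in_recovery() AS is_standby,
       inet_server_addr()  AS server_addr,
       inet_server_port()  AS server_port;
```

If is_standby comes back true, the pg_replication_slots query from Cause 1 is being run against a standby and should be repeated on the primary.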
5. Incorrect Slot Name Used by Replication Client
The client application or tool attempting to connect might be using a slightly different slot name (e.g., a typo, a hyphen instead of an underscore, or different casing) than the one registered on the primary. Slot names may contain only lowercase letters, digits, and underscores, so a name containing uppercase characters in the client configuration can never match an existing slot.
- Diagnosis: Double-check the slot name in your replication client’s configuration (for a streaming standby, primary_slot_name) against the slot_name returned by SELECT slot_name FROM pg_replication_slots; on the primary. Pay attention to case.
- Fix: Correct the slot name in the replication client’s configuration to exactly match the name of an existing slot on the primary server.
- Why it works: The replication client must use the precise name of the slot that PostgreSQL recognizes to establish a connection and begin consuming WAL records.
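The name comparison can also be done on the server. In this sketch, 'My_Slot' is a hypothetical value standing in for whatever name appears in the client configuration; the first query checks whether it is a legal slot name at all, and the second looks for a slot matching its lowercase form.

```sql
-- Legal slot names use only lowercase letters, digits, and underscores,
-- and are at most 63 characters long.
SELECT 'My_Slot' ~ '^[a-z0-9_]{1,63}$' AS is_legal_slot_name;

-- Look for an existing slot whose name matches the lowercased value.
SELECT slot_name, slot_type, database, active
FROM pg_replication_slots
WHERE slot_name = lower('My_Slot');
```

If the first query returns false and the second returns a row, the slot exists and only the casing in the client configuration needs to be corrected.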
6. PostgreSQL Service Restarted/Crashed Without Slot Persistence
In older PostgreSQL versions or specific configurations, replication slot information might not survive a restart or an unclean shutdown. Supported versions write permanent slots to disk and restore them during crash recovery, so the usual culprits today are temporary slots (dropped automatically when the creating session ends) and servers rebuilt from a base backup, which does not include the source server’s slots.
- Diagnosis: Check the PostgreSQL logs for shutdown messages, crash reports, or recovery messages. Compare the pg_replication_slots output before and after a restart.
- Fix: Ensure you are running a recent, supported PostgreSQL version; if you are on an older version, consider upgrading. If the slot is still missing after the server has recovered, recreate it as per Cause 1.
- Why it works: A clean recovery process or a fresh creation of the slot ensures its presence on the server after a restart.
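For the before-and-after comparison, it helps to capture whether each slot is temporary, since temporary slots are expected to disappear. This is a sketch assuming PostgreSQL 10 or later, where the temporary column exists in the view.

```sql
-- Snapshot of all slots; run once before and once after the restart
-- and compare the two result sets.
SELECT slot_name, slot_type, temporary, active, restart_lsn
FROM pg_replication_slots
ORDER BY slot_name;
```

A slot listed with temporary = true before the restart and absent afterwards was not lost; it was removed by design when its session ended.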
After resolving the "replication slot does not exist" error, the next common issue you might encounter is related to WAL segment management, such as "FATAL: could not receive data from WAL stream: ERROR: requested WAL segment … has already been removed."