Postgres connections are being terminated, and the logs show pg_terminate_backend() being called with an invalid process ID (PID).

This usually happens when an automated process or a manual script tries to kill a PostgreSQL backend process that no longer exists or has already been terminated by another mechanism. The pg_terminate_backend() function expects the PID of a live backend. When it receives any other PID, it terminates nothing: it logs a WARNING and returns false. The warning itself is harmless, but it shows that something is acting on stale PIDs. That is dangerous because the operating system can reuse a PID for a new backend, and a stale PID then terminates an unrelated, healthy session. The dropped connections you see come from that, or from whatever ended the original process in the first place.

Here are the most common reasons this occurs and how to address them:

Stale PID in Automation Scripts

Diagnosis: You’ll find a script (e.g., a shell script, Python script, or cron job) that periodically checks for long-running queries or idle connections and attempts to terminate them using pg_terminate_backend(). The PID it’s trying to terminate is no longer valid.

Common Cause: The script’s logic for fetching PIDs is flawed, or a race condition exists where a backend process finishes its work between the time the PID is fetched and the time pg_terminate_backend() is called.

Fix:

  1. Implement a "check-before-kill" mechanism: Before calling pg_terminate_backend(pid), query pg_stat_activity to confirm that the PID still exists, still belongs to the same session (its backend_start has not changed), and is running the expected query or is in the expected state.

    SELECT pid, usename, datname, backend_start, query, state
    FROM pg_stat_activity
    WHERE pid = <the_pid_you_intend_to_kill>;
    
  2. Add error handling: If the query above returns no rows, or if backend_start, query, or state doesn't match what you expect, do not call pg_terminate_backend(). Log this condition as a warning instead of an error.

    Why it works: The check stops the function from being called with a PID that no longer exists. This avoids the invalid-PID warning and, more importantly, stops the script from terminating an unrelated session that has since been assigned the same PID.
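A separate check followed by a kill still leaves a short window in which the session can end. One way to narrow that window is to check and terminate in a single statement, and to match on backend_start as well as pid, because a reused PID shows a different backend_start. The sketch below assumes you can create a helper function. The name terminate_if_unchanged is hypothetical and is not a PostgreSQL built-in. It also assumes the calling role can see other sessions' backend_start (for example, a superuser or a member of pg_read_all_stats). Run it in its own short transaction, because pg_stat_activity contents are snapshotted once per transaction.

```sql
-- Hypothetical helper (not a built-in): terminate a client backend only if
-- the PID still belongs to the same session the script saw earlier.
CREATE OR REPLACE FUNCTION terminate_if_unchanged(
    target_pid     integer,
    expected_start timestamptz
) RETURNS boolean
LANGUAGE sql
AS $$
    -- pg_terminate_backend() is only reached for a row that still exists,
    -- so a vanished session yields false instead of a WARNING.
    SELECT coalesce(
        (SELECT pg_terminate_backend(pid)
           FROM pg_stat_activity
          WHERE pid = target_pid
            AND backend_start = expected_start     -- same session, not a reused PID
            AND backend_type = 'client backend'),  -- never touch internal processes
        false);
$$;
```

A script would record pid and backend_start together when it selects candidates, then pass both values to the function instead of passing the PID alone.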

External Process Management (e.g., Docker, Kubernetes)

Diagnosis: If your PostgreSQL instance runs in a containerized environment, the orchestrator may stop, kill, or restart the container, taking every PostgreSQL process with it, without PostgreSQL shutting down cleanly. When pg_terminate_backend() is later called on a PID recorded before that happened, the PID no longer belongs to a backend and you get this warning.

Common Cause: The orchestrator’s health checks or scaling events might trigger container restarts or process termination. The PostgreSQL server might not have graceful shutdown procedures configured or triggered correctly.

Fix:

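  1. Configure graceful shutdown: The PostgreSQL postmaster picks its shutdown mode from the signal it receives. SIGTERM requests a smart shutdown (no new connections; wait for existing sessions to end). SIGINT requests a fast shutdown (roll back open transactions and disconnect clients). SIGQUIT requests an immediate shutdown, which requires crash recovery on the next start. Make sure the orchestrator's stop signal reaches the postmaster itself rather than a wrapper shell, and pick SIGTERM or SIGINT to match the mode you want (in a Docker image, via STOPSIGNAL). Also allow a grace period (for example, Kubernetes terminationGracePeriodSeconds) long enough for the shutdown to finish before the orchestrator falls back to SIGKILL.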

  2. Avoid direct PID killing from orchestrator: If possible, let PostgreSQL manage its own processes. If you must intervene, ensure your intervention logic is robust and checks for process existence.

    Why it works: Graceful shutdown allows PostgreSQL to clean up connections and processes properly before the container is stopped, preventing orphaned PIDs that external scripts might try to kill later.
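A script that caches PIDs can detect that the server restarted underneath it by also recording pg_postmaster_start_time(). If that value changes, every PID collected before the change is stale. This matters more in containers: a restarted container typically gets a fresh PID namespace and numbers its processes from 1 again, so old PIDs are likely to collide with new backends. The query below sketches what such a script might store. How the script persists and compares the values is left as an assumption.

```sql
-- Snapshot a script could store with its candidate PIDs. If
-- server_started_at differs on the next run, discard all cached PIDs.
SELECT pg_postmaster_start_time() AS server_started_at,
       pid,
       backend_start
FROM pg_stat_activity
WHERE backend_type = 'client backend'
  AND pid <> pg_backend_pid();
```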

Manual Intervention Errors

Diagnosis: A DBA or developer manually ran a SELECT pg_terminate_backend(<pid>); command, but provided a PID that was already gone or incorrect.

Common Cause: Typo in the PID, or the PID was for a process that finished its work just before the command was executed.

Fix:

  1. Double-check PIDs: Always verify the PID from pg_stat_activity immediately before executing pg_terminate_backend().

  2. Use pg_stat_activity to find PIDs: Instead of guessing or recalling PIDs, always query pg_stat_activity to get the current, active PIDs.

    -- Example: find the longest-running active client query and its PID
    SELECT pid, usename, query, state, now() - query_start AS duration
    FROM pg_stat_activity
    WHERE state = 'active'
      AND backend_type = 'client backend'
      AND pid <> pg_backend_pid()  -- exclude the session running this query
    ORDER BY duration DESC
    LIMIT 1;
    

    Then, use the pid from the result.

    Why it works: Ensures the pg_terminate_backend() command is always executed with a PID that is currently active and managed by PostgreSQL.
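When the goal is simply to stop runaway queries, selecting and terminating in one statement removes the copy-and-paste step where typos and races creep in. The sketch below uses a 30-minute threshold chosen purely for illustration.

```sql
-- Terminate client sessions whose current query has run longer than
-- 30 minutes (illustrative threshold). pg_terminate_backend() is evaluated
-- only for rows that pass the WHERE clause, i.e. PIDs that exist right now.
SELECT pid,
       usename,
       now() - query_start       AS duration,
       pg_terminate_backend(pid) AS terminated
FROM pg_stat_activity
WHERE state = 'active'
  AND backend_type = 'client backend'
  AND pid <> pg_backend_pid()          -- never terminate this session
  AND now() - query_start > interval '30 minutes';
```

Running the same query without the pg_terminate_backend(pid) column first is a cheap way to preview which sessions would be affected.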

Background Worker Processes

Diagnosis: PostgreSQL runs background processes for various tasks (e.g., autovacuum workers, logical replication workers, parallel query workers), and many of them are short-lived. If a script or external tool tries to terminate a background process PID that has already exited or been replaced, this warning can occur.

Common Cause: Scripts designed to manage user-submitted queries might incorrectly target system background processes.

Fix:

  1. Filter background workers: When selecting PIDs to terminate, explicitly exclude background workers. You can identify them by checking the backend_type column in pg_stat_activity.

    SELECT pid, usename, datname, query, state, backend_type
    FROM pg_stat_activity
    WHERE pid = <the_pid_you_intend_to_kill> AND backend_type = 'client backend';
    
  2. Be cautious with system processes: Avoid terminating processes that are not directly associated with user queries unless you have a very specific and well-understood reason.

    Why it works: This prevents attempts to terminate processes that are managed internally by PostgreSQL and might have different lifecycles than regular client backends.
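To see what kinds of processes a PID-selecting script could encounter, group pg_stat_activity by backend_type. Only client backend rows correspond to application connections. The other types (for example autovacuum worker or checkpointer) are managed by PostgreSQL itself.

```sql
-- Count processes by type. Anything other than 'client backend' is a
-- PostgreSQL-internal or replication process, not an application session.
SELECT backend_type, count(*) AS processes
FROM pg_stat_activity
GROUP BY backend_type
ORDER BY processes DESC, backend_type;
```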

System Resource Exhaustion Leading to Process Death

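Diagnosis: The operating system might be killing PostgreSQL backend processes because of memory pressure (the Linux OOM killer) or other resource constraints. PostgreSQL treats a backend killed this way as a crash: the postmaster terminates every other session and runs crash recovery, so all connections drop at once. If a script later tries to terminate one of the killed PIDs, it gets the invalid-PID warning.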

Common Cause: Insufficient RAM on the server, or a specific query consuming excessive memory.

Fix:

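  1. Monitor system resources: Regularly check the kernel log (dmesg, journalctl -k, or /var/log/syslog) for OOM killer messages. Also check the PostgreSQL log for lines such as "server process (PID <pid>) was terminated by signal 9: Killed".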

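  2. Tune PostgreSQL memory settings: Adjust shared_buffers, work_mem, and maintenance_work_mem based on your server's RAM. Keep in mind that work_mem is a per-operation limit. A complex query can run several sorts or hashes at once, and many sessions can do so concurrently, so actual usage can be a large multiple of the setting.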

  3. Optimize queries: Identify and optimize memory-hungry queries.

    Why it works: By preventing processes from being killed by the OS, you ensure that any PIDs PostgreSQL is aware of are actually alive and managed by the database.
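The sketch below reviews the memory settings named in the tuning step. It also shows how to give a single memory-hungry query more work_mem without raising the setting for every session. The 256MB value is an illustrative assumption, not a recommendation.

```sql
-- Current values of the memory settings discussed above.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem')
ORDER BY name;

-- Raise work_mem for one transaction only; SET LOCAL reverts at COMMIT.
BEGIN;
SET LOCAL work_mem = '256MB';
-- ... run the memory-hungry query here ...
COMMIT;
```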

Network Interruption/Client Disconnects

Diagnosis: A client application might have crashed or lost its network connection. PostgreSQL keeps the associated backend process until it notices that the connection is gone, which can take a while. If an automated cleanup script picks up that PID and the backend exits before the script acts on it, the script hits the same invalid-PID warning.

Common Cause: Unstable network, client application bugs, or aggressive client-side connection pooling that doesn’t properly notify the server on disconnect.

Fix:

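  1. Implement timeouts on the server: Use idle_in_transaction_session_timeout and statement_timeout to automatically clean up sessions that are stuck or have been idle for too long (PostgreSQL 14 and later also offer idle_session_timeout). Consider lowering tcp_keepalives_idle, tcp_keepalives_interval, and tcp_keepalives_count so the server notices dead client connections sooner.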

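  2. Ensure proper client disconnect handling: Have applications and connection pools close connections explicitly rather than abandoning sockets (most drivers send the protocol's Terminate message when their close method is called). Reset session state with DISCARD ALL when a connection is returned to a pool, and discard connections that hit network errors instead of reusing them.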

    Why it works: Server-side timeouts actively manage idle or stuck connections, reducing the chance that an external script will attempt to terminate a PID for a client that has already effectively disconnected.
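The sketch below sets server-side timeouts and TCP keepalives with ALTER SYSTEM, which requires superuser. All values are illustrative assumptions to adapt to your workload. The tcp_keepalives_* settings make the server probe idle client sockets, so it notices a vanished client sooner than the operating system defaults would.

```sql
-- End sessions that sit idle inside an open transaction.
ALTER SYSTEM SET idle_in_transaction_session_timeout = '10min';

-- Probe idle client connections so dead peers are detected sooner:
-- first probe after 60 s idle, then every 10 s, give up after 5 misses.
ALTER SYSTEM SET tcp_keepalives_idle = 60;
ALTER SYSTEM SET tcp_keepalives_interval = 10;
ALTER SYSTEM SET tcp_keepalives_count = 5;

-- statement_timeout is usually safer per role than server-wide, e.g.:
-- ALTER ROLE <app_role> SET statement_timeout = '5min';

-- Apply without a restart (the settings above do not need one).
SELECT pg_reload_conf();
```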

After addressing these, the next error you might encounter is related to the actual resource that was causing the long-running query or connection leak in the first place, or potentially issues with replication if background workers were involved.
