The RDS instance is refusing new connections, most likely because a single runaway SQL query is holding locks that other sessions are queued behind, so connections pile up until the instance reaches its connection limit.

Common Causes and Fixes

1. Accidental Infinite Loop or Massive Data Scan

  • Diagnosis: Connect to your RDS instance using a client that can show active queries (like psql for PostgreSQL or MySQL Workbench for MySQL). Look for queries that have been running for an unusually long time (minutes, hours, or days) and are consuming significant CPU or I/O. For PostgreSQL, SELECT * FROM pg_stat_activity WHERE state = 'active' AND query NOT ILIKE '%pg_stat_activity%'; is your friend. For MySQL, SHOW FULL PROCESSLIST; is the command.
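  • Fix: Identify the process ID (MySQL, the Id column of the process list) or pid (PostgreSQL) of the offending query.
    • PostgreSQL: SELECT pg_cancel_backend(<pid>); cancels only the running statement. If that has no effect, SELECT pg_terminate_backend(<pid>); ends the whole session.
    • MySQL: KILL <process_id>; ends the connection, and KILL QUERY <process_id>; stops only the statement. On RDS the master user does not have the SUPER privilege, so ending another user's thread generally requires CALL mysql.rds_kill(<process_id>); or CALL mysql.rds_kill_query(<process_id>); instead.
  • Why it works: These commands signal the database server to stop the specified query or session. The server rolls back the interrupted transaction, which releases any locks it holds.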
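
The diagnosis and fix above can be sketched as a short PostgreSQL session. This is a minimal sketch, assuming PostgreSQL 9.6 or later and a role allowed to signal other backends (on RDS, typically a member of rds_superuser). The pid 12345 is a placeholder, not a real value.

```sql
-- PostgreSQL: list active statements, longest-running first.
SELECT pid,
       usename,
       now() - query_start AS runtime,
       wait_event_type,
       wait_event,
       left(query, 120)    AS query_preview
FROM pg_stat_activity
WHERE state = 'active'
  AND pid <> pg_backend_pid()        -- skip this monitoring session
ORDER BY runtime DESC NULLS LAST;

-- Try a cancel first: it stops the statement but keeps the session open.
SELECT pg_cancel_backend(12345);     -- 12345 is a placeholder pid

-- If the cancel has no effect, end the whole session.
SELECT pg_terminate_backend(12345);  -- same placeholder pid
```

Both functions return true when the signal was sent, which does not guarantee the backend has already stopped. Re-running the first query is a simple way to confirm.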

2. Application Bug: Missing WHERE Clause in UPDATE or DELETE

  • Diagnosis: As above, find the long-running query. If it’s an UPDATE or DELETE statement that appears to be processing an impossibly large number of rows (e.g., UPDATE my_table SET status = 'processed'), it’s likely missing a WHERE clause. The pg_stat_activity or SHOW FULL PROCESSLIST output will show the SQL statement (plain SHOW PROCESSLIST truncates it to the first 100 characters).
  • Fix:
    • PostgreSQL: SELECT pg_terminate_backend(<pid>);
    • MySQL: KILL <process_id>;
    • Then, immediately fix the application code to include the missing WHERE clause. For example, change UPDATE my_table SET status = 'processed'; to UPDATE my_table SET status = 'processed' WHERE id = 123;.
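  • Why it works: Terminating the session stops the runaway operation, and the database rolls back the rows it had already changed. For a very large UPDATE or DELETE, that rollback can itself take a while. Fixing the application code prevents the same issue from recurring.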
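
One way to make a missing WHERE clause less dangerous when running statements by hand is to wrap the write in an explicit transaction and check the affected row count before committing. The sketch below reuses the my_table example from above. The final statement is MySQL-specific and is included as an optional guard, not as something the scenario above requires.

```sql
-- Run the write inside an explicit transaction so it can be undone.
BEGIN;

UPDATE my_table
SET status = 'processed'
WHERE id = 123;   -- the predicate the buggy statement was missing

-- The client reports how many rows were affected. If the count is what
-- you expect, make the change permanent:
COMMIT;

-- If the count is far larger than expected, undo it instead of committing:
-- ROLLBACK;

-- MySQL only: make the session reject UPDATE/DELETE statements that have
-- neither a key-based WHERE clause nor a LIMIT.
SET sql_safe_updates = 1;
```

Keep the window between BEGIN and COMMIT short, since the open transaction holds row locks until it ends.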

3. Deadlock on Critical Tables

  • Diagnosis: Long-running queries can sometimes be the result of lock contention, where two or more transactions are waiting for each other to release locks. Both PostgreSQL and InnoDB detect a true deadlock automatically and abort one of the transactions, so check your database logs for "deadlock detected" messages. A query that stays stuck is more often simply waiting on a lock held by another long-lived session. In PostgreSQL, such queries appear in pg_stat_activity with state active and a wait_event_type of Lock, with wait_event naming the kind of lock. For MySQL, SHOW ENGINE INNODB STATUS; will often reveal deadlock information in the LATEST DETECTED DEADLOCK section.
  • Fix:
    • PostgreSQL: SELECT pg_terminate_backend(<pid>); (use the pid of the session holding the lock that the others are waiting on).
    • MySQL: KILL <process_id>; (use the process ID of the session holding the lock that the others are waiting on).
    • Application Level: Review transaction isolation levels and the order of operations in your application to avoid acquiring locks in conflicting orders.
  • Why it works: Terminating one of the participants in a deadlock allows the other(s) to proceed. Addressing the application logic prevents future deadlocks.
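
To find which session to terminate, it helps to pair each waiting session with the session blocking it. The following is a sketch for PostgreSQL 9.6 or later, using the built-in pg_blocking_pids function. The column aliases are illustrative choices.

```sql
-- PostgreSQL: show each lock-waiting session next to the session(s) blocking it.
SELECT blocked.pid              AS blocked_pid,
       left(blocked.query, 80)  AS blocked_query,
       blocking.pid             AS blocking_pid,
       blocking.state           AS blocking_state,
       left(blocking.query, 80) AS blocking_query
FROM pg_stat_activity AS blocked
JOIN pg_stat_activity AS blocking
  ON blocking.pid = ANY (pg_blocking_pids(blocked.pid))
WHERE blocked.wait_event_type = 'Lock';
```

A blocking_pid that appears on many rows is usually the one to cancel or terminate. If its blocking_state is idle in transaction, the query shown is only the last statement it ran, not necessarily the one that took the lock.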

4. Excessive Index Rebuilding or Table Rewriting

  • Diagnosis: Sometimes, maintenance operations like VACUUM FULL (PostgreSQL) or OPTIMIZE TABLE (MySQL), or even certain types of index rebuilds, can manifest as very long-running queries that lock tables. These operations often rewrite entire tables or indexes, which can take a considerable amount of time and resources. Check pg_stat_activity for queries mentioning VACUUM or REINDEX, or SHOW PROCESSLIST for OPTIMIZE TABLE.
  • Fix:
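    • PostgreSQL: If it’s VACUUM FULL or a plain REINDEX, you can stop it with SELECT pg_cancel_backend(<pid>); or, failing that, SELECT pg_terminate_backend(<pid>);. Both operations are transactional, so the interrupted work is rolled back and the table stays consistent, but the time already spent is lost. The exception is REINDEX CONCURRENTLY or CREATE INDEX CONCURRENTLY: an interrupted run leaves behind an invalid index that you need to drop or rebuild.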
    • MySQL: KILL <process_id>; for OPTIMIZE TABLE.
    • Prevention: Schedule such maintenance during low-traffic periods or use alternative, non-blocking maintenance strategies (e.g., VACUUM without FULL in PostgreSQL, online DDL for some operations in MySQL).
  • Why it works: Stopping the maintenance operation releases the table lock it holds, so queued queries can proceed. Understanding and avoiding long-running, blocking maintenance is key.
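
The checks described above might look like the following in PostgreSQL. This is a sketch: the ILIKE patterns are a loose text match and will also catch autovacuum workers and any query that merely mentions these words.

```sql
-- PostgreSQL: maintenance statements currently running.
SELECT pid,
       now() - query_start AS runtime,
       left(query, 120)    AS query_preview
FROM pg_stat_activity
WHERE state = 'active'
  AND pid <> pg_backend_pid()
  AND (query ILIKE '%vacuum%' OR query ILIKE '%reindex%');

-- After interrupting a concurrent index build or rebuild, list any
-- indexes that were left marked invalid.
SELECT indexrelid::regclass AS index_name,
       indrelid::regclass   AS table_name
FROM pg_index
WHERE NOT indisvalid;
```

Any index returned by the second query is not used by the planner, but it still adds write overhead until it is dropped or rebuilt.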

5. Large Transaction Holding Locks

  • Diagnosis: A transaction that started long ago and has yet to be committed or rolled back can hold locks on many rows or even entire tables. In pg_stat_activity, look for sessions with a state of idle in transaction or idle in transaction (aborted) and an xact_start time from long ago (backend_start only tells you when the connection was opened, which is normally long ago for pooled connections). In MySQL’s SHOW PROCESSLIST, look for connections with a Command of Sleep but a high Time value, and check whether they have active transactions in information_schema.innodb_trx or SHOW ENGINE INNODB STATUS;.
  • Fix:
    • PostgreSQL: SELECT pg_terminate_backend(<pid>); for the idle in transaction session.
    • MySQL: KILL <process_id>; for the sleeping connection.
    • Application Level: Ensure your application code explicitly commits or rolls back transactions, and implement connection pooling with aggressive timeouts to prevent stale transactions from lingering.
  • Why it works: Terminating the session forces the database to roll back any uncommitted transaction, releasing its locks.
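
The lookups above can be sketched as follows. The first and last statements are PostgreSQL (9.6 or later) and the second is MySQL with InnoDB. The role name app_user and the five-minute limit are assumptions for illustration, not values taken from this scenario.

```sql
-- PostgreSQL: sessions sitting inside an open transaction, oldest first.
SELECT pid,
       usename,
       state,
       now() - xact_start   AS transaction_age,
       now() - state_change AS idle_for,
       left(query, 120)     AS last_statement
FROM pg_stat_activity
WHERE state IN ('idle in transaction', 'idle in transaction (aborted)')
ORDER BY xact_start;

-- MySQL (InnoDB): open transactions, oldest first. trx_mysql_thread_id
-- is the Id shown in the process list.
SELECT trx_mysql_thread_id,
       trx_started,
       trx_state,
       trx_rows_locked
FROM information_schema.innodb_trx
ORDER BY trx_started;

-- PostgreSQL: have the server end sessions that idle inside a transaction
-- for too long. app_user is a hypothetical application role.
ALTER ROLE app_user SET idle_in_transaction_session_timeout = '5min';
```

The ALTER ROLE setting applies to sessions opened after the change. On RDS the same parameter can also be set instance-wide through the DB parameter group.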

6. Resource Exhaustion (Less Common as a Direct Cause of a Single Long Query, but Contributory)

  • Diagnosis: While not typically the root cause of a single query running forever, if your RDS instance is under extreme CPU, memory, or I/O pressure, even normal queries can slow to a crawl, appearing "long-running." Monitor CloudWatch metrics for CPUUtilization, FreeableMemory, ReadIOPS, and WriteIOPS.
  • Fix:
    • Scale Up: Increase the instance class (e.g., from db.t3.medium to db.m5.large).
    • Scale Out: For read-heavy workloads, add read replicas.
    • Optimize Queries: Analyze query plans (EXPLAIN) for the long-running query and others to find missing indexes or inefficient operations.
  • Why it works: More resources allow queries to complete faster. Optimized queries use resources more efficiently.
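
As a sketch of the query-optimization step in PostgreSQL, the example below inspects a plan and then adds an index without blocking writes. The status = 'pending' filter and the index name idx_my_table_status are hypothetical, chosen only to reuse the my_table example from earlier.

```sql
-- PostgreSQL: show the actual plan, timings, and buffer usage.
-- Note that EXPLAIN ANALYZE really executes the statement.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM my_table
WHERE status = 'pending';

-- If the plan shows a sequential scan over a large table for a selective
-- filter, an index may help. CONCURRENTLY avoids blocking writes during
-- the build, but it cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY idx_my_table_status
    ON my_table (status);
```

Because EXPLAIN ANALYZE runs the statement, wrap any UPDATE or DELETE you analyze this way in a transaction and roll it back.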

After resolving the immediate long-running query, you’ll likely encounter Connection timed out or Too many connections errors if the initial problem caused a backlog of connection attempts that are now trying to get through simultaneously.
