The RDS database instance failed to accept new connections because the max_connections limit was reached, preventing the application from performing its work.
This usually happens because the application is either opening too many connections and not closing them, or the max_connections setting is simply too low for the workload.
Cause 1: Application Leaking Connections
- Diagnosis: Check your application logs for frequent "too many connections" errors. Use a monitoring tool like Datadog, New Relic, or CloudWatch to visualize the RDS `DatabaseConnections` metric. If the number of active connections consistently hovers near the `max_connections` limit and spikes rapidly, it’s a strong indicator of a leak.

      # Example using the AWS CLI: peak connection count per minute over the last 10 minutes (GNU date syntax)
      aws cloudwatch get-metric-statistics --namespace AWS/RDS --metric-name DatabaseConnections \
        --dimensions Name=DBInstanceIdentifier,Value=your-db-instance-name \
        --start-time "$(date -u -d '10 minutes ago' +%Y-%m-%dT%H:%M:%SZ)" \
        --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        --period 60 --statistics Maximum

- Fix: Review your application’s connection pooling configuration. Ensure connections are properly closed or returned to the pool when no longer needed. For example, in Java with HikariCP, ensure `connection.close()` is called within a `try-finally` block or use try-with-resources. In Python with `psycopg2`, ensure connections are closed (`conn.close()`) and cursors are closed (`cur.close()`).
- Why it works: By explicitly closing or returning connections to the pool, you reduce the number of actively held connections, preventing the `max_connections` limit from being hit due to unreleased resources.
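The close-on-every-path pattern from the fix above can be sketched in Python. This is a minimal illustration under stated assumptions, not the application's actual code: the standard-library `sqlite3` module stands in for a driver such as `psycopg2` so the example runs without a database server, and `run_query` and its `connect` parameter are hypothetical names.

```python
import sqlite3
from contextlib import closing


def run_query(connect, query):
    """Run one query and release the cursor and connection on every code path.

    `connect` is any zero-argument callable returning a DB-API connection
    (hypothetical parameter; sqlite3 stands in for a real RDS driver here).
    """
    # closing() calls .close() when the block exits, even if execute() raises.
    with closing(connect()) as conn:
        with closing(conn.cursor()) as cur:
            cur.execute(query)
            return cur.fetchall()


if __name__ == "__main__":
    # In-memory database, used only so the sketch is runnable anywhere.
    print(run_query(lambda: sqlite3.connect(":memory:"), "SELECT 1"))
```

With `psycopg2`, note that `with conn:` only ends the transaction and does not close the connection, which is why the sketch wraps the connection in `closing()` instead.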
Cause 2: Insufficient max_connections Setting
- Diagnosis: Analyze the RDS `DatabaseConnections` metric in CloudWatch. If the average number of connections is consistently high, approaching the `max_connections` limit, and there aren’t obvious connection leaks, the limit itself is likely too low for your application’s normal operation. You can also check the `max_connections` parameter directly:

      -- Connect to your RDS instance using psql, the mysql client, etc.
      -- PostgreSQL
      SHOW max_connections;
      -- MySQL
      SHOW VARIABLES LIKE 'max_connections';

- Fix: Increase the `max_connections` parameter in your RDS instance’s parameter group. The optimal value depends on your instance class and workload. A common starting point for larger instances (e.g., `db.r5.large` or higher) is often between 200 and 500, but this can go much higher for very busy systems.
  - Go to RDS console -> Parameter groups.
  - Select your DB instance’s parameter group or create a new one (default parameter groups cannot be edited, so an instance using one needs a custom group).
  - Click "Edit parameters".
  - Search for `max_connections`.
  - Change the value (e.g., from 100 to 300).
  - Save changes.
  - Important: When the parameter’s apply type is static (as it is on RDS for PostgreSQL), you must reboot your DB instance for the change to take effect. The "Apply type" column in the parameter group shows whether your engine treats it as static or dynamic.
- Why it works: Increasing the `max_connections` parameter allows more simultaneous connections to be established to the database, accommodating your application’s normal demand.
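The console steps above can also be scripted. The sketch below only builds the request for boto3's RDS `modify_db_parameter_group` call, so it runs without AWS credentials; the helper name `build_max_connections_change` and the group name `my-custom-pg` are hypothetical, and the value 300 is just the example figure from the steps.

```python
def build_max_connections_change(parameter_group, new_value):
    """Build keyword arguments for boto3's RDS modify_db_parameter_group call."""
    if new_value < 1:
        raise ValueError("max_connections must be a positive integer")
    return {
        "DBParameterGroupName": parameter_group,
        "Parameters": [
            {
                "ParameterName": "max_connections",
                # The API expects parameter values as strings.
                "ParameterValue": str(new_value),
                # "pending-reboot" is accepted for both static and dynamic parameters.
                "ApplyMethod": "pending-reboot",
            }
        ],
    }


if __name__ == "__main__":
    kwargs = build_max_connections_change("my-custom-pg", 300)  # hypothetical group name
    print(kwargs)
    # With AWS credentials configured, the change could then be submitted with:
    #   import boto3
    #   boto3.client("rds").modify_db_parameter_group(**kwargs)
```

Keeping the request construction separate from the API call makes the change easy to review before anything is sent to AWS.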
Cause 3: High Number of Idle Connections
- Diagnosis: Even if your application claims to close connections, it might be leaving them in an `idle` state for extended periods. Monitor the RDS `DatabaseConnections` metric and look at the breakdown of active vs. idle connections if your monitoring tool supports it. You can also query the database directly:

      -- For PostgreSQL
      SELECT count(*) FROM pg_stat_activity WHERE state = 'idle';
      -- For MySQL
      SHOW PROCESSLIST; -- Look for 'Sleep' state
      SELECT count(*) FROM information_schema.processlist WHERE command = 'Sleep';

- Fix: Configure your application’s connection pool to have a lower `idle_timeout`. For example, in HikariCP, set `idleTimeout` (e.g., to 30000 ms or 30 seconds). For other pools, consult their documentation. Also, consider setting a `wait_timeout` or `interactive_timeout` on the database side if applicable (these are MySQL settings; `max_connections` remains the primary RDS limit).
- Why it works: By reducing the time connections can remain idle in the pool or on the database, you free up connection slots that are no longer actively being used, making them available for new requests.
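To judge whether a 30-second idle timeout would actually free slots, it can help to count how many sessions have been idle longer than that. The sketch below assumes rows shaped like `(state, seconds_in_state)`, which on PostgreSQL could come from `SELECT state, extract(epoch FROM now() - state_change) FROM pg_stat_activity;`. The function name and the sample rows are hypothetical.

```python
def count_long_idle(sessions, threshold_seconds=30.0):
    """Count sessions that have been in the 'idle' state longer than the threshold.

    `sessions` is an iterable of (state, seconds_in_state) pairs. Background
    processes may report NULL for both columns, so None values are skipped.
    """
    return sum(
        1
        for state, seconds in sessions
        if state == "idle" and seconds is not None and seconds > threshold_seconds
    )


if __name__ == "__main__":
    # Made-up rows for illustration only.
    sample = [
        ("active", 0.2),
        ("idle", 5.0),
        ("idle", 120.0),
        ("idle", 45.5),
        (None, None),
        ("idle in transaction", 300.0),
    ]
    print(count_long_idle(sample))
```

Sessions in `idle in transaction` are deliberately not counted here, since a pool idle timeout does not reclaim them.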
Cause 4: Too Many Application Instances/Threads
- Diagnosis: If you’ve recently scaled up your application tier (e.g., added more EC2 instances, increased the number of containers, or increased thread counts per instance) without adjusting database connection limits or pooling, you might be overwhelming the database. Correlate spikes in the RDS `DatabaseConnections` metric with application scaling events.
- Fix: Adjust your application’s connection pool size. All threads within an application instance should share a single connection pool. Ensure the total number of connections requested by all application instances at peak load does not exceed the `max_connections` setting. You might need to reduce the `maximumPoolSize` in your connection pool configuration or scale down the application tier if the database is the bottleneck.
- Why it works: By controlling the number of connections each application instance can open, and ensuring the sum of these potential connections across all instances is less than `max_connections`, you prevent the database from being swamped by too many clients.
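The sizing rule above is simple arithmetic: instances times pool size, plus any connections held back for admin or monitoring sessions, must stay within `max_connections`. A minimal sketch, where the function names, the `reserved` parameter, and the example numbers are all assumptions for illustration:

```python
def max_pool_size_per_instance(max_connections, app_instances, reserved=0):
    """Largest per-instance pool size that keeps the fleet total within the limit."""
    if app_instances < 1:
        raise ValueError("app_instances must be at least 1")
    usable = max_connections - reserved
    if usable < app_instances:
        raise ValueError("limit too low to give every instance one connection")
    return usable // app_instances


def fleet_fits(max_connections, app_instances, pool_size, reserved=0):
    """True if every instance filling its pool still stays within the limit."""
    return app_instances * pool_size + reserved <= max_connections


if __name__ == "__main__":
    # Example: limit of 300, 12 application instances, 10 slots kept for admin use.
    size = max_pool_size_per_instance(300, 12, reserved=10)
    print(size, fleet_fits(300, 12, size, reserved=10))
```

Rerunning the calculation whenever the application tier's autoscaling maximum changes keeps the pool size in step with the fleet.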
Cause 5: Database Parameter Group Not Applied/Rebooted
- Diagnosis: You’ve modified `max_connections` in a parameter group, but the connections metric hasn’t changed, and the `SHOW max_connections;` command (or `SHOW VARIABLES LIKE 'max_connections';` on MySQL) returns the old value.
- Fix: Ensure the correct parameter group is associated with your RDS instance. Then, if the parameter group status shows `pending-reboot`, reboot the RDS instance for the change to take effect. This is a crucial step that’s often overlooked.
- Why it works: RDS applies dynamic parameters immediately, but where `max_connections` is a static parameter (as on RDS for PostgreSQL), an instance reboot is required to reinitialize the database process with the new limit.
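Whether a reboot is still outstanding can be read from the instance description. The sketch below inspects a response shaped like the output of boto3's `describe_db_instances`, where each entry in `DBParameterGroups` carries a `ParameterApplyStatus`; the helper name, instance identifier, and sample response are hypothetical, and no AWS call is made.

```python
def groups_pending_reboot(describe_response, instance_id):
    """Return names of parameter groups whose changes still wait for a reboot."""
    pending = []
    for instance in describe_response.get("DBInstances", []):
        if instance.get("DBInstanceIdentifier") != instance_id:
            continue
        for group in instance.get("DBParameterGroups", []):
            if group.get("ParameterApplyStatus") == "pending-reboot":
                pending.append(group["DBParameterGroupName"])
    return pending


if __name__ == "__main__":
    # Hand-written sample shaped like a describe_db_instances response.
    sample = {
        "DBInstances": [
            {
                "DBInstanceIdentifier": "your-db-instance-name",
                "DBParameterGroups": [
                    {
                        "DBParameterGroupName": "my-custom-pg",
                        "ParameterApplyStatus": "pending-reboot",
                    }
                ],
            }
        ]
    }
    print(groups_pending_reboot(sample, "your-db-instance-name"))
    # A live response could be fetched with:
    #   import boto3
    #   boto3.client("rds").describe_db_instances(DBInstanceIdentifier="your-db-instance-name")
```

An empty result alongside an old `max_connections` value suggests the instance is attached to a different parameter group than the one that was edited.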
Cause 6: Monitoring Lag or Incorrect Metric Interpretation
- Diagnosis: You’re seeing connection errors, but the RDS `DatabaseConnections` metric in CloudWatch shows a much lower number than your `max_connections` limit. This can happen if there’s a significant delay in metric reporting or if you’re looking at average connections over a long period instead of peak connections.
- Fix: Set the CloudWatch period to `1 minute` for more granular data. Examine the `Maximum` statistic for the `DatabaseConnections` metric over short intervals (e.g., 5-15 minutes) to catch brief spikes. Also, cross-reference with the equivalent Aurora connection metric (if applicable) or directly query `pg_stat_activity` or `information_schema.processlist` for near real-time counts.
- Why it works: By using higher resolution and focusing on peak values, you can accurately identify connection bursts that might be missed by default or averaged metrics, allowing you to pinpoint the exact moment the limit was hit.
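A small worked example shows why an averaged metric can hide the moment the limit was hit. The per-minute samples below are made up for illustration, and `summarize` is a hypothetical helper, not part of any monitoring API.

```python
from statistics import mean


def summarize(samples, limit):
    """Compare the average and the peak of per-minute connection counts."""
    peak = max(samples)
    return {
        "average": mean(samples),
        "maximum": peak,
        # The limit was reached if any single sample touched it.
        "hit_limit": peak >= limit,
    }


if __name__ == "__main__":
    # Ten one-minute samples with a single burst (illustrative values only).
    per_minute = [40, 42, 38, 41, 100, 39, 40, 43, 37, 40]
    print(summarize(per_minute, limit=100))
```

Here the ten-minute average sits well below the limit even though one sample reached it, which is the pattern the `Maximum` statistic is meant to expose.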
The next error you’ll likely encounter after fixing max_connections is a slow query issue if the database is now overloaded with too many active (not just open) queries.