RDS instances can experience performance degradation due to inefficient database queries.

Let’s see how slow queries manifest and how to get a handle on them.

Imagine a user trying to load their profile page, and it takes 30 seconds. On the backend, a database query that should be lightning-fast is taking ages. This is usually due to a poorly written query, missing indexes, or an overloaded database. The RDS Slow Query Log is your first line of defense, pinpointing these performance bottlenecks.

Here’s a simplified view of what happens when a slow query log is active:

  1. Query Execution: A database query runs against your RDS instance.
  2. Timing: The database engine measures how long that query took.
  3. Threshold Check: If the query’s execution time exceeds a predefined threshold (e.g., 1 second), it’s considered "slow."
  4. Logging: The details of this slow query (SQL statement, execution time, user, etc.) are written to a specific log file.
  5. Analysis: You then access and analyze this log file to identify problematic queries.

Let’s get this set up.

Enabling the Slow Query Log in RDS

You enable the slow query log by modifying your RDS instance’s parameter group.

  1. Navigate to Parameter Groups: In the AWS Management Console, go to RDS -> Parameter Groups.
  2. Select or Create a Parameter Group: Choose the parameter group associated with your RDS instance. If you’re using the default, it’s highly recommended to create a custom parameter group to avoid affecting other default settings. Click "Create parameter group." Select the correct database engine family (e.g., aurora-postgresql or mysql8.0) and version.
  3. Modify Parameters:
    • For MySQL/MariaDB:
      • Search for slow_query_log. Set its value to 1 (enable).
      • Search for long_query_time. Set its value to 1 (seconds). This is the threshold for what constitutes a "slow" query. You can adjust this based on your application’s needs, but 1 is a common starting point.
      • Search for log_output. Set its value to FILE. This ensures logs are written to a file within the RDS instance.
      • Search for general_log. Crucially, ensure general_log is set to 0 (disabled). The general log captures all queries, which can be extremely verbose and resource-intensive. We only want slow queries.
    • For PostgreSQL:
      • Search for log_min_duration_statement. Set its value to 1000 (milliseconds). This parameter logs any statement that takes longer than the specified duration. So, 1000 means 1 second.
      • Search for log_statement. Set this to none if you only want slow queries logged. If you need more detailed logging for debugging, you might set it to all or ddl, but for performance analysis, none is best to keep the logs focused.
      • Search for log_output. Set this to file.
  4. Save Changes: Click "Save changes."
  5. Apply to RDS Instance:
    • Go back to your RDS instances.
    • Select your instance and click "Modify."
    • Under "Database options," select your newly modified parameter group.
    • Click "Continue" and then "Apply immediately" (or schedule for a maintenance window if preferred).
    • Note: Modifying parameter groups often requires an instance reboot for changes to take effect. RDS will usually prompt you if a reboot is needed.

Accessing and Analyzing the Slow Query Log

Once enabled, the log file will start accumulating slow query data.

  • For MySQL/MariaDB: The slow query log is typically found at /rdsdbdata/log/slowquery.log. You can access it using the RDS Log Retrieval API or the AWS CLI.

    Example using AWS CLI:

    aws rds download-db-log-file-records --db-instance-identifier your-db-instance-id --log-file-name slowquery.log --output text --query 'LogFileContent' > slowquery.log
    

    Replace your-db-instance-id with your actual RDS instance identifier.

    Analyzing with pt-query-digest (Percona Toolkit): This is a powerful command-line tool for analyzing MySQL slow query logs. You’ll need to install Percona Toolkit on a machine where you can access the log file (e.g., your local machine or an EC2 instance).

    pt-query-digest slowquery.log > slowquery_analysis.txt
    

    The slowquery_analysis.txt file will provide a summary of the slowest queries, their frequency, and aggregated statistics. Look for queries with high Total Latency or Calls that contribute significantly to overall execution time.

  • For PostgreSQL: The slow query log file is named postgresql.log by default, and slow queries are logged within it if log_min_duration_statement is set. You can access it similarly to MySQL logs.

    Example using AWS CLI:

    aws rds download-db-log-file-records --db-instance-identifier your-db-instance-id --log-file-name postgresql.log --output text --query 'LogFileContent' > postgresql.log
    

    Analyzing PostgreSQL logs: You can use standard text processing tools like grep and awk, or more specialized tools if you export the logs to a format that can be parsed by query analysis tools.

    Example using grep to find specific slow queries:

    grep "duration:" postgresql.log | sort -k 10 -n -r | head -n 20
    

    This command looks for lines containing "duration:", sorts them numerically by the duration (the 10th field, assuming a standard log format), and shows the top 20 longest-running statements.

Common Slow Query Patterns and Fixes

  1. Missing Indexes:

    • Diagnosis: pt-query-digest (for MySQL) will often show queries with high EXPLAIN output indicating full table scans. For PostgreSQL, look for Seq Scan in EXPLAIN ANALYZE output.
    • Fix (MySQL): Add an index to the columns used in WHERE, JOIN, or ORDER BY clauses.
      CREATE INDEX idx_column_name ON your_table (column_name);
      
    • Fix (PostgreSQL):
      CREATE INDEX idx_column_name ON your_table (column_name);
      
    • Why it works: Indexes are like a book’s index, allowing the database to quickly locate specific rows without scanning the entire table.
  2. Inefficient JOIN Clauses:

    • Diagnosis: Queries joining multiple large tables without proper conditions or indexes on join columns.
    • Fix: Ensure join conditions use indexed columns. Review the JOIN strategy; sometimes, rewriting the query to join fewer tables or in a different order can help.
    • Why it works: Well-indexed join columns drastically reduce the number of row comparisons the database needs to perform.
  3. SELECT * in Production:

    • Diagnosis: Queries that fetch all columns when only a few are needed. This can be identified by examining the pt-query-digest output or by reviewing the SQL statements themselves.
    • Fix: Explicitly list the columns you need.
      SELECT column1, column2 FROM your_table WHERE ...;
      
    • Why it works: Reduces the amount of data read from disk and transferred over the network.
  4. Subqueries that Execute for Every Row:

    • Diagnosis: Correlated subqueries that run once for each row processed by the outer query. pt-query-digest might highlight these, or manual SQL review.
    • Fix: Rewrite the subquery as a JOIN or use Common Table Expressions (CTEs).
    • Why it works: Joins and CTEs allow the database to optimize the execution plan for the subquery’s logic once, rather than repeatedly.
  5. ORDER BY Without Index:

    • Diagnosis: Queries using ORDER BY on columns that are not indexed. The database might perform a costly sort operation.
    • Fix: Add an index that includes the columns used in the ORDER BY clause.
      CREATE INDEX idx_table_col1_col2 ON your_table (column1, column2);
      
    • Why it works: An index can often provide data in the desired sorted order, eliminating the need for a separate sort step.
  6. Large OFFSET in Pagination:

    • Diagnosis: Queries with LIMIT and a large OFFSET (e.g., LIMIT 10 OFFSET 10000). The database still has to fetch and discard all rows up to the offset.
    • Fix: Implement keyset pagination (also known as "seek method") where you use the values from the last row of the previous page to fetch the next set.
      -- Instead of: SELECT * FROM items ORDER BY id LIMIT 10 OFFSET 10000;
      -- Use:
      SELECT * FROM items WHERE id > 10000 ORDER BY id LIMIT 10;
      
    • Why it works: This method avoids scanning and discarding a large number of rows, making subsequent page fetches much faster.

The next potential issue you might encounter is understanding how RDS handles parameter group reboots and applying changes without downtime.

Want structured learning?

Take the full Rds course →