RDS automated backups are a lifesaver, but they are not a "set it and forget it" feature. They are a critical component of your disaster recovery strategy, and they need active management and verification.

Let’s see what a live backup and restore looks like. Imagine we have a PostgreSQL RDS instance named my-postgres-db.

# First, create a sample table and insert some data.
# The RDS Data API (aws rds-data) is available only for Aurora clusters, so for a standard RDS
# PostgreSQL instance connect with psql. <instance-endpoint> and <master-username> are placeholders.
psql "host=<instance-endpoint> dbname=postgres user=<master-username>" -c "CREATE TABLE test_table (id INT, name VARCHAR(50));"
psql "host=<instance-endpoint> dbname=postgres user=<master-username>" -c "INSERT INTO test_table (id, name) VALUES (1, 'initial data');"

# Now, let's simulate a failure and restore
# (In a real scenario, this would be an accidental DROP TABLE or data corruption)

# A point-in-time restore always creates a new instance, so the primary is not touched
aws rds restore-db-instance-to-point-in-time \
    --source-db-instance-identifier my-postgres-db \
    --target-db-instance-identifier my-postgres-db-restored \
    --restore-time "2023-10-27T10:30:00Z" \
    --db-instance-class db.t3.medium \
    --vpc-security-group-ids sg-0123456789abcdef0 \
    --db-subnet-group-name my-db-subnet-group \
    --tags Key=Purpose,Value=RestoreTest

This command initiates a restore operation. RDS takes your existing automated backups and transaction logs and reconstructs the database as it existed at the specified restore-time. The source-db-instance-identifier points to the original instance, and target-db-instance-identifier is the name for the new instance that will be created. We’re also specifying instance class, security groups, subnet group, and tags for the new instance, mimicking how you’d provision a new RDS instance.
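Before running a restore like this, it can help to confirm that the chosen timestamp actually falls inside the instance's restorable window. The following is a minimal sketch rather than part of the original walkthrough: the function names restore_window and in_restore_window are made up for illustration, and the timestamp comparison assumes GNU date (date -d).

```shell
# Hypothetical helpers; the names are illustrative, not an AWS feature.

# Print the earliest and latest restorable times for instance $1,
# separated by whitespace.
restore_window() {
    aws rds describe-db-instance-automated-backups \
        --db-instance-identifier "$1" \
        --query 'DBInstanceAutomatedBackups[0].RestoreWindow.[EarliestTime,LatestTime]' \
        --output text
}

# Succeed only if restore time $2 lies inside the restorable window of instance $1.
# Assumes GNU date for parsing ISO 8601 timestamps.
in_restore_window() {
    local window target earliest latest
    window="$(restore_window "$1")" || return 1
    # Word-split the two timestamps so that $1=target, $2=earliest, $3=latest.
    set -- "$2" $window
    [ "$#" -eq 3 ] || return 1
    target="$(date -u -d "$1" +%s)" || return 1
    earliest="$(date -u -d "$2" +%s)" || return 1
    latest="$(date -u -d "$3" +%s)" || return 1
    [ "$target" -ge "$earliest" ] && [ "$target" -le "$latest" ]
}
```

A call such as in_restore_window my-postgres-db "2023-10-27T10:30:00Z" returns success only when the timestamp is restorable, so it can guard the restore command in a script.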

The mental model for RDS automated backups hinges on two core components: the daily snapshot and the transaction logs. Once a day, during the backup window, RDS takes a snapshot of your DB instance's storage volume. The first snapshot is a full copy; later ones are incremental and store only the blocks that changed, although each can still be restored on its own. In between these snapshots, RDS continuously captures transaction logs (e.g., Write-Ahead Logs for PostgreSQL, Binary Logs for MySQL) and uploads them to Amazon S3 every five minutes. When you perform a point-in-time restore, RDS first restores the most recent snapshot taken before your desired restore time, and then it applies the transaction logs up to that exact moment, effectively replaying all committed transactions. This mechanism allows for granular restores down to the second within your defined backup retention period, up to the latest restorable time, which is typically within the last five minutes.
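To make the first half of that model concrete, here is a small illustrative sketch of how the base snapshot for a given restore time could be identified from the list of automated snapshots. RDS performs this selection internally; the helpers below (list_automated_snapshots and base_snapshot_for, both hypothetical names) only mirror the idea. The sketch assumes GNU date and input lines of the form "snapshot-id timestamp".

```shell
# List available automated snapshots of instance $1 as "id<TAB>create-time" lines.
list_automated_snapshots() {
    aws rds describe-db-snapshots \
        --db-instance-identifier "$1" \
        --snapshot-type automated \
        --query 'DBSnapshots[?Status==`available`].[DBSnapshotIdentifier,SnapshotCreateTime]' \
        --output text
}

# Read "id timestamp" lines on stdin and print the id of the newest snapshot
# taken at or before restore time $1. Fails if no snapshot qualifies.
# Assumes GNU date for parsing ISO 8601 timestamps.
base_snapshot_for() {
    local target best_id best_epoch id ts epoch
    target="$(date -u -d "$1" +%s)" || return 1
    best_id=""
    best_epoch=0
    while read -r id ts; do
        epoch="$(date -u -d "$ts" +%s)" || continue
        if [ "$epoch" -le "$target" ] && [ "$epoch" -ge "$best_epoch" ]; then
            best_id="$id"
            best_epoch="$epoch"
        fi
    done
    [ -n "$best_id" ] && printf '%s\n' "$best_id"
}
```

For example, list_automated_snapshots my-postgres-db | base_snapshot_for "2023-10-27T10:30:00Z" would print the snapshot that log replay starts from in this simplified picture.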

You control the retention period when you create or modify your RDS instance. This setting dictates how long RDS keeps your automated backups. A longer retention period provides more flexibility for point-in-time restores but incurs higher storage costs.

Here’s how you set or modify the retention period using the AWS CLI:

To set it during creation:

aws rds create-db-instance \
    --db-instance-identifier my-new-db \
    --db-instance-class db.t3.micro \
    --engine postgres \
    --master-username dbadmin \
    --master-user-password YOUR_PASSWORD \
    --allocated-storage 20 \
    --backup-retention-period 14 \
    --db-subnet-group-name my-db-subnet-group \
    --vpc-security-group-ids sg-0123456789abcdef0

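Here, --backup-retention-period 14 means RDS will retain automated backups for 14 days. When you create an instance through the CLI or API, the default is 1 day. Valid values range from 0 to 35 days; setting it to 0 disables automated backups, and with them point-in-time recovery.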

To modify an existing instance:

aws rds modify-db-instance \
    --db-instance-identifier my-postgres-db \
    --backup-retention-period 30 \
    --apply-immediately

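The --apply-immediately flag asks RDS to apply the change without waiting for the next maintenance window. For the retention period it matters mainly when you switch automated backups on or off (changing between 0 and a non-zero value), which can also cause a brief outage. A change from one non-zero value to another is applied asynchronously as soon as possible either way.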
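Because the modification is applied asynchronously, it is worth reading the setting back afterwards. The sketch below shows one way to do that; backup_retention_days and retention_at_least are hypothetical helper names introduced here for illustration.

```shell
# Print the configured backup retention period (in days) for instance $1.
backup_retention_days() {
    aws rds describe-db-instances \
        --db-instance-identifier "$1" \
        --query 'DBInstances[0].BackupRetentionPeriod' \
        --output text
}

# Succeed only if instance $1 retains backups for at least $2 days.
retention_at_least() {
    local days
    days="$(backup_retention_days "$1")" || return 1
    # A non-numeric answer makes the comparison fail, which is the safe outcome.
    [ "$days" -ge "$2" ] 2>/dev/null
}
```

A scheduled job could run retention_at_least my-postgres-db 30 || echo "retention too low" to flag instances whose setting has drifted.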

Verifying your backups isn’t just about checking the retention period. It’s about testing the restore process. Regularly performing a point-in-time restore to a separate, temporary instance is crucial. This validates that your backups are not only being taken but are also restorable and that your applications can connect and function with the restored data. This practice catches issues like corrupted snapshots or incorrect configurations that might prevent a successful recovery when you need it most.
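A restore drill of this kind can be scripted. What follows is a rough sketch under several assumptions, not a production-ready tool: restore_drill is a hypothetical function name, PGUSER and PGPASSWORD are assumed to be exported for psql, the machine running it is assumed to have network access to the restored instance, and the check simply reuses the test_table from the earlier example.

```shell
# Hypothetical restore drill. Arguments:
#   $1 source instance, $2 temporary instance name,
#   $3 DB subnet group, $4 VPC security group ID.
restore_drill() {
    local src="$1" tmp="$2" subnet_group="$3" security_group="$4" host rows

    # Restore the most recent restorable state into a throwaway instance.
    aws rds restore-db-instance-to-point-in-time \
        --source-db-instance-identifier "$src" \
        --target-db-instance-identifier "$tmp" \
        --use-latest-restorable-time \
        --db-subnet-group-name "$subnet_group" \
        --vpc-security-group-ids "$security_group" \
        --tags Key=Purpose,Value=RestoreTest >/dev/null || return 1

    # Block until the new instance is available.
    aws rds wait db-instance-available --db-instance-identifier "$tmp" || return 1

    host="$(aws rds describe-db-instances \
        --db-instance-identifier "$tmp" \
        --query 'DBInstances[0].Endpoint.Address' \
        --output text)" || return 1

    # Sanity check: the sample table should contain at least one row.
    rows="$(psql "host=$host dbname=postgres" -At \
        -c 'SELECT count(*) FROM test_table;')" || rows=""

    # Remove the temporary instance whether or not the check passed.
    aws rds delete-db-instance \
        --db-instance-identifier "$tmp" \
        --skip-final-snapshot >/dev/null

    [ -n "$rows" ] && [ "$rows" -ge 1 ] 2>/dev/null
}
```

Running restore_drill my-postgres-db my-postgres-db-drill my-db-subnet-group sg-0123456789abcdef0 on a schedule gives a pass or fail signal; a real drill would likely replace the single row count with checks that reflect your own data.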

The most surprising thing about RDS automated backups is that point-in-time recovery exists only inside the backup retention window. Once the retention period for a given moment has passed, the snapshots and transaction logs covering it are purged, and the ability to restore to that moment is permanently lost. Deleting an instance is just as unforgiving: by default its automated backups are deleted along with it unless you explicitly choose to retain them. A final snapshot, like any manual snapshot, restores only the state at the moment it was taken, not an arbitrary point in time.
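One way to keep recovery options open when an instance has to be deleted is to take a final snapshot and also ask RDS to retain the automated backups, which remain restorable until their retention period runs out. The sketch below wraps the relevant CLI calls; delete_keeping_backups and list_retained_backups are hypothetical helper names, and the snapshot identifier is whatever you pass in.

```shell
# Delete instance $1, taking final snapshot $2 and retaining its automated backups.
delete_keeping_backups() {
    aws rds delete-db-instance \
        --db-instance-identifier "$1" \
        --final-db-snapshot-identifier "$2" \
        --no-delete-automated-backups
}

# List retained automated backups of deleted instances with their restore windows.
list_retained_backups() {
    aws rds describe-db-instance-automated-backups \
        --query 'DBInstanceAutomatedBackups[?Status==`retained`].[DBInstanceIdentifier,RestoreWindow.EarliestTime,RestoreWindow.LatestTime]' \
        --output text
}
```

For example, delete_keeping_backups my-postgres-db my-postgres-db-final removes the instance while leaving both a manual snapshot and a point-in-time restore window behind.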

After you’ve verified your automated backups and ensured point-in-time recovery is working, the next logical step is to understand how to manage manual snapshots for longer-term archival or specific recovery points outside the automated retention window.
