Migrating to RDS with AWS DMS is, surprisingly, less about the data itself than about managing the transient state of your database connections.
Let’s watch it happen. Imagine a source PostgreSQL database and a target RDS PostgreSQL instance. We’ll use AWS Database Migration Service (DMS) to shuttle data.
First, the setup:
Source Endpoint (PostgreSQL on EC2)
{
"EndpointIdentifier": "my-source-pg",
"EndpointType": "source",
"EngineName": "postgres",
"Username": "pguser",
"Password": "pgpassword",
"ServerName": "ec2-xx-xx-xx-xx.compute-1.amazonaws.com",
"Port": 5432,
"DatabaseName": "mydatabase",
"SslMode": "require",
"ExtraConnectionAttributes": "replication_slot_create_if_not_exists=true"
}
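The endpoint JSON hard-codes the password and packs options into a single attribute string. As a minimal sketch, assuming you create endpoints from Python (for example by passing the resulting dict to boto3's DMS client `create_endpoint` call), a helper can read the password from an environment variable at runtime and join extra connection attributes with semicolons, the separator DMS documents for multiple attributes. The function name and the `DMS_SOURCE_PASSWORD` variable are hypothetical, and the heartbeat attributes in the usage lines are examples from the DMS PostgreSQL attribute list that you should check against your DMS version.

```python
import os


def build_endpoint_params(identifier, endpoint_type, server, database,
                          username, password_env, ssl_mode="require",
                          port=5432, extra_attrs=None):
    """Hypothetical helper: assemble keyword arguments shaped like a DMS
    create_endpoint request for a PostgreSQL endpoint."""
    # Read the secret at runtime instead of hard-coding it in the config.
    password = os.environ[password_env]
    params = {
        "EndpointIdentifier": identifier,
        "EndpointType": endpoint_type,  # "source" or "target"
        "EngineName": "postgres",
        "Username": username,
        "Password": password,
        "ServerName": server,
        "Port": port,
        "DatabaseName": database,
        "SslMode": ssl_mode,
    }
    if extra_attrs:
        # DMS expects multiple extra connection attributes separated by ";".
        params["ExtraConnectionAttributes"] = ";".join(
            f"{key}={value}" for key, value in extra_attrs.items()
        )
    return params


if __name__ == "__main__":
    os.environ.setdefault("DMS_SOURCE_PASSWORD", "example-only")
    source = build_endpoint_params(
        "my-source-pg", "source",
        "ec2-xx-xx-xx-xx.compute-1.amazonaws.com", "mydatabase",
        "pguser", "DMS_SOURCE_PASSWORD",
        extra_attrs={"heartbeatEnable": "true", "heartbeatFrequency": "5"},
    )
    print(source["ExtraConnectionAttributes"])
```

Keeping the secret out of the JSON also means the endpoint definition can live in version control without leaking credentials.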
Target Endpoint (RDS PostgreSQL)
{
"EndpointIdentifier": "my-target-rds",
"EndpointType": "target",
"EngineName": "postgres",
"Username": "rdsuser",
"Password": "rdspassword",
"ServerName": "my-rds-instance.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com",
"Port": 5432,
"DatabaseName": "mydatabase",
"SslMode": "require"
}
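Both endpoints above ask for SSL in require mode, which encrypts the connection without verifying the server certificate. A small pre-flight check can catch a mistyped mode and the stricter modes' extra requirement. The sketch assumes, per the DMS endpoint documentation, that the valid SslMode values are none, require, verify-ca, and verify-full, that none is the default, and that the verify modes need a CA certificate imported into DMS and referenced through CertificateArn. The helper name is hypothetical.

```python
VALID_SSL_MODES = {"none", "require", "verify-ca", "verify-full"}


def ssl_problems(params):
    """Hypothetical pre-flight check for an endpoint parameter dict."""
    problems = []
    mode = params.get("SslMode", "none")  # DMS defaults to no SSL
    if mode not in VALID_SSL_MODES:
        problems.append(f"unknown SslMode {mode!r}")
    elif mode.startswith("verify-") and not params.get("CertificateArn"):
        # verify-ca / verify-full check the server certificate, so DMS
        # needs the CA certificate it should trust.
        problems.append(f"SslMode {mode} needs a CertificateArn")
    elif mode == "none":
        problems.append("connection is unencrypted")
    return problems


if __name__ == "__main__":
    target = {"EndpointIdentifier": "my-target-rds", "SslMode": "require"}
    print(ssl_problems(target))  # []
```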
Replication Instance
We provision a dms.t3.medium instance in the same VPC as our source and target databases. This instance is the workhorse, processing and transferring data.
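Sharing a VPC only helps if security groups and routing actually let the replication instance reach both databases on port 5432. DMS's own TestConnection operation (test-connection in the AWS CLI) is the authoritative check because it runs from the replication instance itself. As a rough sketch, assuming you run it from a host in the same subnet, a plain TCP probe at least shows whether the port is open. The function name is hypothetical.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, `can_reach(server_name, 5432)` with each endpoint's ServerName. A True result proves only that a TCP path exists; credentials, SSL, and pg_hba.conf rules are still untested until DMS connects.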
Replication Task
This is where the magic (and complexity) happens. We define how data moves.
{
"ReplicationTaskIdentifier": "my-pg-migration-task",
"SourceEndpointArn": "arn:aws:dms:us-east-1:123456789012:endpoint:my-source-pg",
"TargetEndpointArn": "arn:aws:dms:us-east-1:123456789012:endpoint:my-target-rds",
"ReplicationInstanceArn": "arn:aws:dms:us-east-1:123456789012:rep:my-replication-instance",
"MigrationType": "full-load-and-cdc",
"ReplicationTaskSettings": {
"Logging": {
"EnableLogging": true
},
"FullLoadSettings": {
"TargetTablePrepMode": "TRUNCATE_BEFORE_LOAD"
},
"ErrorBehavior": {
"DataErrorPolicy": "LOG_ERROR",
"TableErrorPolicy": "SUSPEND_TABLE",
"ApplyErrorInsertPolicy": "LOG_ERROR",
"ApplyErrorUpdatePolicy": "LOG_ERROR",
"ApplyErrorDeletePolicy": "IGNORE_RECORD"
},
"ValidationSettings": {
"EnableValidation": true
}
},
"TableMappings": {
"rules": [
{
"rule-type": "selection",
"rule-id": "1",
"rule-name": "1",
"object-locator": {
"schema-name": "public",
"table-name": "%"
},
"rule-action": "include"
}
]
}
}
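The JSON above shows the settings and mappings as nested objects for readability, but CreateReplicationTask takes ReplicationTaskSettings and TableMappings as JSON-encoded strings. A minimal sketch, assuming the task is defined as Python dicts and later passed to boto3's `create_replication_task`, serializes them in one place. The helper name is hypothetical, and the settings are trimmed to two sections using the key names from the DMS task-settings documentation.

```python
import json


def build_task_request(identifier, source_arn, target_arn, instance_arn,
                       settings, table_mappings,
                       migration_type="full-load-and-cdc"):
    """Hypothetical helper: shape a create_replication_task request.

    ReplicationTaskSettings and TableMappings must be JSON strings, so the
    nested dicts are serialized here rather than passed through as-is.
    """
    return {
        "ReplicationTaskIdentifier": identifier,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        "MigrationType": migration_type,
        "ReplicationTaskSettings": json.dumps(settings),
        "TableMappings": json.dumps(table_mappings),
    }


settings = {
    "FullLoadSettings": {"TargetTablePrepMode": "TRUNCATE_BEFORE_LOAD"},
    "ValidationSettings": {"EnableValidation": True},
}
mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "1",
        "object-locator": {"schema-name": "public", "table-name": "%"},
        "rule-action": "include",
    }]
}
request = build_task_request(
    "my-pg-migration-task",
    "arn:aws:dms:us-east-1:123456789012:endpoint:my-source-pg",
    "arn:aws:dms:us-east-1:123456789012:endpoint:my-target-rds",
    "arn:aws:dms:us-east-1:123456789012:rep:my-replication-instance",
    settings, mappings,
)
# With credentials configured, this could then be sent with
# boto3.client("dms").create_replication_task(**request).
```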
This task will first perform a full load of all data from the public schema of the source to the target, then capture and apply ongoing changes (Change Data Capture - CDC).
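The problem DMS solves is moving data between databases, whether the engines differ (heterogeneous) or match (homogeneous), with minimal downtime. It handles data transformation and continuous replication, but it does only minimal schema work: it creates target tables and primary keys, not secondary indexes, foreign keys, or other non-primary-key constraints. Full schema conversion for heterogeneous migrations is the job of the AWS Schema Conversion Tool (AWS SCT); for PostgreSQL to PostgreSQL, a common approach is to pre-create the schema on the target with pg_dump --schema-only. The core mechanism involves: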
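- Full Load: DMS reads data from the source tables and writes it to the target, loading several tables in parallel (eight by default, set by MaxFullLoadSubTasks) and, if configured, splitting a single large table into segments that load in parallel.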
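- Change Data Capture (CDC): For ongoing replication, DMS reads changes from the source’s transaction log using engine-specific mechanisms; for PostgreSQL, that means decoding the write-ahead log (WAL) through a logical replication slot. Changes that arrive while a table is still being loaded are cached and applied once that table’s full load finishes; after that, DMS applies changes to the target as they are captured.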
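A toy model, not DMS's implementation, makes the two phases concrete: copy a snapshot of the source table into the target, hold the changes that arrive during the load, apply them once the load finishes, then keep applying new changes. Tables as dicts keyed by primary key and the `(op, key, row)` event format are assumptions made for this sketch.

```python
def apply_change(target, change):
    """Apply one (op, key, row) change event to a dict keyed by primary key."""
    op, key, row = change
    if op in ("insert", "update"):
        target[key] = dict(row)
    elif op == "delete":
        target.pop(key, None)
    else:
        raise ValueError(f"unknown operation {op!r}")


def full_load_then_cdc(snapshot, changes_during_load, changes_after_load):
    # Full load: copy the snapshot taken when the load started.
    target = {key: dict(row) for key, row in snapshot.items()}
    # Cached changes: captured while loading, applied once the load is done.
    for change in changes_during_load:
        apply_change(target, change)
    # Ongoing CDC: apply changes in commit order as they are captured.
    for change in changes_after_load:
        apply_change(target, change)
    return target


snapshot = {1: {"name": "ada"}, 2: {"name": "bob"}}
during = [("update", 2, {"name": "bobby"}), ("insert", 3, {"name": "cy"})]
after = [("delete", 1, None)]
print(full_load_then_cdc(snapshot, during, after))
# {2: {'name': 'bobby'}, 3: {'name': 'cy'}}
```

Applying the cached changes strictly after the snapshot is what keeps the target from ending up with a row state older than the source.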
The MigrationType is crucial:
- full-load: a one-time data transfer only.
- cdc: only captures and applies ongoing changes (requires a pre-existing, in-sync target).
- full-load-and-cdc: the most common choice for migrations; performs the full load, then switches to CDC.
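Viewed as a schedule, the three migration types differ only in which phases run. The phase names below are descriptive labels chosen for this sketch, not DMS task status values.

```python
PHASES = {
    "full-load": ["full load"],
    "cdc": ["apply ongoing changes"],
    "full-load-and-cdc": ["full load", "apply cached changes",
                          "apply ongoing changes"],
}


def phases_for(migration_type):
    """Return the ordered phases a task of this migration type runs."""
    try:
        return list(PHASES[migration_type])
    except KeyError:
        raise ValueError(
            f"unsupported MigrationType {migration_type!r}") from None


for mt in ("full-load", "cdc", "full-load-and-cdc"):
    print(mt, "->", " then ".join(phases_for(mt)))
```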
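The TableMappings rules define which schemas and tables are included or excluded. Selection rules filter objects (% is a wildcard, so a table name of % matches every table in the schema), and transformation rules can rename schemas, tables, or columns, or add and remove columns on the way to the target.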
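To see how selection rules pick tables, here is a minimal matcher for the % wildcard. It is a sketch, not DMS's matching code: the rule dicts borrow the object-locator shape of DMS table mappings, and the choice that a matching exclude rule beats a matching include rule is an assumption of this sketch.

```python
import re


def like_to_regex(pattern):
    """Translate a name pattern, where % matches any run of characters,
    into a regular expression meant for re.fullmatch."""
    return ".*".join(re.escape(part) for part in pattern.split("%"))


def matches(locator, schema, table):
    return (re.fullmatch(like_to_regex(locator["schema-name"]), schema) is not None
            and re.fullmatch(like_to_regex(locator["table-name"]), table) is not None)


def selected(rules, schema, table):
    """Return True if the (schema, table) pair is selected by the rules.

    Assumption for this sketch: any matching exclude rule wins over includes.
    """
    hits = [r["rule-action"] for r in rules
            if r["rule-type"] == "selection"
            and matches(r["object-locator"], schema, table)]
    return "include" in hits and "exclude" not in hits


rules = [
    {"rule-type": "selection", "rule-action": "include",
     "object-locator": {"schema-name": "public", "table-name": "%"}},
    {"rule-type": "selection", "rule-action": "exclude",
     "object-locator": {"schema-name": "public", "table-name": "audit_%"}},
]
for table in ("orders", "audit_log"):
    print(table, selected(rules, "public", table))
```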
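The task settings control the behavior during the migration. With TargetTablePrepMode set to TRUNCATE_BEFORE_LOAD, DMS truncates each existing target table before starting the full load for it, keeping the table definition but discarding its rows. The error-handling policies here are set to log failed records and keep going rather than stop the task, so errors must be reviewed afterward (apply errors land in the awsdms_apply_exceptions control table on the target). Enabling validation makes DMS compare source and target rows once each table’s full load finishes and, in a full-load-and-cdc task, keep validating changes as they are applied.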
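Validation boils down to comparing rows by primary key. The toy version below, which does not reflect how DMS batches or schedules its comparisons, reports rows missing from the target, rows the target has that the source lacks, and rows whose values differ. Tables as dicts keyed by primary key are an assumption of the sketch.

```python
def compare_tables(source, target):
    """Compare two {primary_key: row} dicts and summarize the differences."""
    source_keys, target_keys = set(source), set(target)
    return {
        "missing_in_target": sorted(source_keys - target_keys),
        "extra_in_target": sorted(target_keys - source_keys),
        "mismatched": sorted(
            key for key in source_keys & target_keys
            if source[key] != target[key]
        ),
    }


source = {1: ("ada", 10), 2: ("bob", 20), 3: ("cy", 30)}
target = {1: ("ada", 10), 2: ("bob", 21), 4: ("dee", 40)}
print(compare_tables(source, target))
# {'missing_in_target': [3], 'extra_in_target': [4], 'mismatched': [2]}
```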
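One aspect that trips many people up is the source-side setup behind the PostgreSQL ExtraConnectionAttributes. For CDC to work, the source PostgreSQL database must be configured for logical replication: set wal_level = logical in postgresql.conf (this takes a restart), make sure max_replication_slots and max_wal_senders leave room for the DMS connection, and grant the user specified in the source endpoint the REPLICATION privilege. DMS then streams changes through a logical replication slot. Creating that slot is DMS’s default behavior when a task starts (the documented slotName attribute lets you reuse a pre-created slot instead), which is what the replication_slot_create_if_not_exists=true attribute asks for; check that attribute name against the PostgreSQL extra connection attributes documented for your DMS version. Either way, DMS is managing a critical resource: a slot that nobody consumes, for example while a task is stopped, prevents the source from recycling WAL, and the retained WAL can eventually fill the source’s disk.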
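Logical replication prerequisites are easy to check before starting the task, and the replication slot is worth watching afterward because an unconsumed slot keeps WAL on the source. The sketch below takes rows already fetched with any PostgreSQL driver from the two queries kept as constants (both use standard catalog views and functions available in PostgreSQL 10 and later) and turns them into warnings. It checks wal_level plus the max_replication_slots and max_wal_senders limits logical replication depends on. The function names and the 10 GiB threshold are hypothetical choices for illustration.

```python
SETTINGS_SQL = """
SELECT name, setting FROM pg_settings
WHERE name IN ('wal_level', 'max_replication_slots', 'max_wal_senders')
"""

SLOTS_SQL = """
SELECT slot_name, active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots
WHERE slot_type = 'logical'
"""

MAX_RETAINED_BYTES = 10 * 1024 ** 3  # hypothetical alert threshold: 10 GiB


def settings_problems(rows):
    """rows: (name, setting) pairs as returned by SETTINGS_SQL."""
    settings = dict(rows)
    problems = []
    if settings.get("wal_level") != "logical":
        problems.append(
            f"wal_level is {settings.get('wal_level')!r}, need 'logical'")
    for name in ("max_replication_slots", "max_wal_senders"):
        if int(settings.get(name, 0)) < 1:
            problems.append(f"{name} must be at least 1")
    return problems


def slot_warnings(rows, limit=MAX_RETAINED_BYTES):
    """rows: (slot_name, active, retained_bytes) as returned by SLOTS_SQL."""
    warnings = []
    for slot_name, active, retained in rows:
        if not active:
            # An inactive slot still pins WAL on the source.
            warnings.append(f"slot {slot_name} is inactive")
        if retained is not None and retained > limit:
            warnings.append(
                f"slot {slot_name} retains {int(retained)} bytes of WAL")
    return warnings
```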
The next challenge you’ll likely face is optimizing CDC performance for high-transaction workloads.