The most surprising thing about Route 53 Application Recovery Controller (ARC) is that it doesn’t actually do the recovery itself; it’s the orchestrator that enables your recovery mechanisms to work reliably across multiple AWS Regions.
Let’s see it in action. Imagine you have a critical application deployed in both us-east-1 and us-west-2. Your goal is to have a warm standby in us-west-2 that can take over if us-east-1 becomes unavailable.
First, you need to set up your ARC components.
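1. Create a Recovery Group: A recovery group represents your application in ARC’s readiness check feature. It is made up of cells (typically one per Region) whose resources ARC audits for readiness to take traffic. Recovery groups are managed through the route53-recovery-readiness commands, which are separate from the routing control commands used in the steps below.
aws route53-recovery-readiness create-recovery-group \
--region us-west-2 \
--recovery-group-name my-app-recovery-group
The recovery-group-name is how you’ll refer to this group. Unlike the routing control commands, this call takes no client token. The --region us-west-2 flag is there because the ARC configuration APIs are served from US West (Oregon).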
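2. Create an Assertion Rule: An assertion rule is a safety rule that guards routing control changes. It asserts that, after any state change, at least a given number of the routing controls it covers are still On, which stops you from accidentally switching off every Region at once. The rule lists its routing controls by ARN, so run this after creating them in step 3.
aws route53-recovery-control-config create-safety-rule \
--region us-west-2 \
--client-token fedcba654321 \
--assertion-rule '{
  "Name": "my-app-assertion-rule",
  "ControlPanelArn": "arn:aws:route53-recovery-control::111122223333:controlpanel/0123456789abcdef0123456789abcdef",
  "AssertedControls": [
    "arn:aws:route53-recovery-control::111122223333:controlpanel/0123456789abcdef0123456789abcdef/routingcontrol/abcdef1234567890",
    "arn:aws:route53-recovery-control::111122223333:controlpanel/0123456789abcdef0123456789abcdef/routingcontrol/1234567890abcdef"
  ],
  "WaitPeriodMs": 5000,
  "RuleConfig": {
    "Type": "ATLEAST",
    "Threshold": 1,
    "Inverted": false
  }
}'
Here, ControlPanelArn and AssertedControls hold placeholder ARNs for the control panel and the two routing controls the rule covers. Type ATLEAST with Threshold 1 means that at least one asserted routing control must be On after any change. With only two controls, a threshold of 2 would block every failover, because neither control could ever be turned Off. WaitPeriodMs: 5000 sets a 5-second evaluation period after a change during which further changes to the asserted controls are rejected, which prevents flapping. Inverted: false means the rule passes when the threshold is met or exceeded.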
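To make the threshold semantics concrete, here is a minimal sketch in plain shell of how an ATLEAST-style check could be evaluated against a proposed set of routing control states. ARC performs this check on its side; the function name and argument order are hypothetical and exist only for illustration.

```shell
# Hypothetical helper, not part of the AWS CLI.
# usage: assertion_rule_passes THRESHOLD INVERTED STATE...
#   THRESHOLD  minimum number of routing controls that must be On
#   INVERTED   "true" to negate the result, "false" otherwise
#   STATE...   proposed state of each asserted routing control (On or Off)
assertion_rule_passes() {
  threshold=$1
  inverted=$2
  shift 2

  # Count how many of the proposed states are On.
  on_count=0
  for state in "$@"; do
    if [ "$state" = "On" ]; then
      on_count=$((on_count + 1))
    fi
  done

  # ATLEAST: the rule holds when enough controls are On.
  if [ "$on_count" -ge "$threshold" ]; then
    passes=true
  else
    passes=false
  fi

  # Inverted rules flip the outcome.
  if [ "$inverted" = "true" ]; then
    if [ "$passes" = "true" ]; then passes=false; else passes=true; fi
  fi

  [ "$passes" = "true" ]
}
```

For example, `assertion_rule_passes 1 false Off On` succeeds, while `assertion_rule_passes 1 false Off Off` fails, which corresponds to ARC rejecting a change that would leave no Region serving traffic.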
3. Create a Routing Control: Routing controls are the On/Off switches that you flip to signal a failover. Each routing control lives in a control panel, which in turn belongs to a cluster.
aws route53-recovery-control-config create-routing-control \
--region us-west-2 \
--client-token 12345abcde \
--routing-control-name us-east-1-primary \
--cluster-arn arn:aws:route53-recovery-control::111122223333:cluster/fedcba98-7654-3210-fedc-ba9876543210
You’ll create one routing control per Region you want to manage. The cluster-arn is required; it identifies the cluster that hosts the routing control and serves its state. If you omit --control-panel-arn, the routing control is placed in the cluster’s default control panel.
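Creating one routing control per Region is repetitive enough to script. The sketch below assumes the route53-recovery-control-config command set with its --cluster-arn flag; the function name is hypothetical, and it omits the optional client token for brevity.

```shell
# Hypothetical helper: create one routing control per name and print each ARN.
# usage: create_regional_routing_controls CLUSTER_ARN NAME...
create_regional_routing_controls() {
  cluster_arn=$1
  shift

  for control_name in "$@"; do
    # The ARC configuration API is called in us-west-2.
    aws route53-recovery-control-config create-routing-control \
      --region us-west-2 \
      --cluster-arn "$cluster_arn" \
      --routing-control-name "$control_name" \
      --query 'RoutingControl.RoutingControlArn' \
      --output text
  done
}
```

A call such as `create_regional_routing_controls "$CLUSTER_ARN" us-east-1-primary us-west-2-secondary` would print two routing control ARNs, one per line.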
4. Associate Cells with the Recovery Group: Routing controls are not attached to a recovery group; ARC has no call for that. What update-recovery-group accepts is the list of cells that make up the group, so this is where you register one cell per Region to match your routing controls. The cell ARNs below are placeholders.
aws route53-recovery-readiness update-recovery-group \
--region us-west-2 \
--recovery-group-name my-app-recovery-group \
--cells arn:aws:route53-recovery-readiness::111122223333:cell/my-app-us-east-1 arn:aws:route53-recovery-readiness::111122223333:cell/my-app-us-west-2
Pass every cell in a single call, since --cells sets the group’s full cell list rather than appending to it.
5. Update the Assertion Rule: An existing assertion rule can only have its name and wait period changed. The asserted routing controls and the threshold are fixed at creation, so to change them you delete the rule and create a new one.
aws route53-recovery-control-config update-safety-rule \
--region us-west-2 \
--assertion-rule-update '{
  "SafetyRuleArn": "arn:aws:route53-recovery-control::111122223333:controlpanel/0123456789abcdef0123456789abcdef/safetyrule/fedcba0987654321",
  "Name": "my-app-assertion-rule",
  "WaitPeriodMs": 5000
}'
The SafetyRuleArn (a placeholder here) identifies the rule to change. There is no routing-control-ids parameter; the routing controls a rule covers are the AssertedControls ARNs supplied to create-safety-rule.
6. Configure Health Checks and DNS: ARC itself doesn’t monitor your application health; endpoint monitoring remains the job of ordinary Route 53 health checks or your own tooling. For each routing control, you create a Route 53 health check of type RECOVERY_CONTROL, which reports healthy while the routing control is On and unhealthy while it is Off. Then you configure Route 53 DNS records (e.g., Weighted, Latency, or Failover routing policies) that point to your application endpoints in each Region and attach these health checks to them. That attachment is how the DNS records follow the state of your ARC routing controls.
For example, a Route 53 Failover record might look like this:
{
  "ChangeBatch": {
    "Changes": [
      {
        "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "myapp.example.com",
          "Type": "A",
          "SetIdentifier": "us-east-1-primary",
          "Failover": "PRIMARY",
          "TTL": 60,
          "ResourceRecords": [
            {
              "Value": "192.0.2.1"
            }
          ],
          "HealthCheckId": "abcdef1234567890abcdef1234567890"
        }
      },
      {
        "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "myapp.example.com",
          "Type": "A",
          "SetIdentifier": "us-west-2-secondary",
          "Failover": "SECONDARY",
          "TTL": 60,
          "ResourceRecords": [
            {
              "Value": "198.51.100.1"
            }
          ],
          "HealthCheckId": "fedcba0987654321fedcba0987654321"
        }
      }
    ]
  }
}
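Failover is driven by the routing control state as seen through the health checks attached to these records. If the us-east-1-primary routing control is turned Off (signaling a failover), its health check reports unhealthy and Route 53 stops returning the PRIMARY record. As long as the us-west-2-secondary routing control is On, Route 53 answers with the SECONDARY record and traffic shifts to us-west-2.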
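Route 53 reads a routing control’s state through a health check of type RECOVERY_CONTROL, and the resulting ID is what goes into HealthCheckId in the records above. A minimal sketch of creating one with the AWS CLI follows; the function name and the caller reference value are placeholders chosen for this example.

```shell
# Hypothetical helper: create a routing control health check and print its ID.
# usage: create_routing_control_health_check ROUTING_CONTROL_ARN CALLER_REFERENCE
create_routing_control_health_check() {
  routing_control_arn=$1
  caller_reference=$2

  # The health check is healthy while the routing control is On
  # and unhealthy while it is Off.
  aws route53 create-health-check \
    --caller-reference "$caller_reference" \
    --health-check-config "{\"Type\": \"RECOVERY_CONTROL\", \"RoutingControlArn\": \"$routing_control_arn\"}" \
    --query 'HealthCheck.Id' \
    --output text
}
```

You would call it once per routing control, for example `create_routing_control_health_check "$PRIMARY_CONTROL_ARN" us-east-1-primary-hc`, and paste each returned ID into the matching record set.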
ARC exposes routing control state through its cluster, a data plane made up of five redundant Regional endpoints. When you update a routing control, the change is made against one of those endpoints and propagated across the cluster, and the associated RECOVERY_CONTROL health check then reflects the new state. Such a health check tracks only the routing control; it does not probe your application endpoint. Traffic therefore fails over when you (or your automation) signal it through ARC, not automatically when the application becomes unhealthy.
When you use the aws route53-recovery-cluster update-routing-control-state command to flip a routing control, you’re not just changing a value in a database; you’re sending a command to a highly available, distributed system. You address the command to one of the cluster’s Regional endpoints with --endpoint-url and --region, and the cluster propagates the new state to the others. The Threshold in your assertion rule plays a different role: it is checked against the resulting routing control states, and the change is rejected if too few of the asserted controls would remain On.
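Because any single endpoint may be unreachable during a Regional event, a failover script typically tries the cluster endpoints in turn. The following is a minimal sketch under that assumption. It uses the route53-recovery-cluster update-routing-control-state command; the function name and the REGION=URL argument format are hypothetical.

```shell
# Hypothetical helper: try each cluster endpoint until one accepts the change.
# usage: set_routing_control_state ROUTING_CONTROL_ARN On|Off REGION=URL...
set_routing_control_state() {
  control_arn=$1
  new_state=$2
  shift 2

  for endpoint in "$@"; do
    # Split "REGION=URL" into its two parts.
    endpoint_region=${endpoint%%=*}
    endpoint_url=${endpoint#*=}

    if aws route53-recovery-cluster update-routing-control-state \
        --routing-control-arn "$control_arn" \
        --routing-control-state "$new_state" \
        --region "$endpoint_region" \
        --endpoint-url "$endpoint_url"; then
      echo "updated via $endpoint_region"
      return 0
    fi
  done

  echo "all endpoints failed" >&2
  return 1
}
```

The endpoint list can be read from `aws route53-recovery-control-config describe-cluster`, which returns each endpoint’s URL and Region. A failover to us-west-2 would then turn the secondary control On first and the primary control Off second, so that an assertion rule requiring at least one control On is never violated.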
The next challenge is integrating ARC with your CI/CD pipelines to automate failover initiation.