Route 53 failover routing is designed to automatically switch traffic from a primary resource to a secondary resource when the primary becomes unavailable, ensuring high availability for your applications.

Let’s see it in action with a simplified example. Imagine you have a web application running in two different AWS regions, us-east-1 and us-west-2. We want traffic to go to us-east-1 by default, and if that region becomes unhealthy, Route 53 should automatically direct users to us-west-2.

Here’s how you’d set up the Route 53 records:

Primary Record (us-east-1):

  • Type: A
  • Name: app.example.com
  • Alias: Yes
  • Alias Target: Your Application Load Balancer (ALB) in us-east-1 (e.g., dualstack.alb-1234567890.us-east-1.elb.amazonaws.com)
  • Routing Policy: Failover
  • Failover Record Type: Primary
  • Health Check ID: assoc-abcdef1234567890 (a placeholder; real Route 53 health check IDs are UUIDs. This ID points to a health check that monitors the ALB in us-east-1)

Secondary Record (us-west-2):

  • Type: A
  • Name: app.example.com
  • Alias: Yes
  • Alias Target: Your Application Load Balancer (ALB) in us-west-2 (e.g., dualstack.alb-9876543210.us-west-2.elb.amazonaws.com)
  • Routing Policy: Failover
  • Failover Record Type: Secondary
  • Health Check ID: assoc-fedcba0987654321 (a placeholder; real Route 53 health check IDs are UUIDs. This ID points to a health check that monitors the ALB in us-west-2)
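
The two records above can also be expressed as an API change batch. The following is a minimal sketch, not a definitive implementation: the helper name failover_alias_record, the load balancer hosted zone IDs, and the health check IDs are all hypothetical placeholders. The dictionary layout follows the ChangeBatch shape that boto3's change_resource_record_sets call accepts. Each failover record also needs a unique SetIdentifier, which the console calls the record ID.

```python
from typing import Any, Dict


def failover_alias_record(
    name: str,
    role: str,                # "PRIMARY" or "SECONDARY"
    alb_dns_name: str,
    alb_hosted_zone_id: str,  # the load balancer's hosted zone ID, not your own zone's
    health_check_id: str,
) -> Dict[str, Any]:
    """Build one UPSERT change for a failover alias A record."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be 'PRIMARY' or 'SECONDARY'")
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "A",
            # Records sharing a name and type must each have a unique identifier.
            "SetIdentifier": f"{name}-{role.lower()}",
            "Failover": role,
            # Alias records carry no TTL or ResourceRecords of their own.
            "AliasTarget": {
                "HostedZoneId": alb_hosted_zone_id,
                "DNSName": alb_dns_name,
                "EvaluateTargetHealth": True,
            },
            "HealthCheckId": health_check_id,
        },
    }


# All IDs below are made-up placeholders for illustration.
change_batch = {
    "Comment": "Failover pair for app.example.com",
    "Changes": [
        failover_alias_record(
            "app.example.com", "PRIMARY",
            "dualstack.alb-1234567890.us-east-1.elb.amazonaws.com",
            "ZEXAMPLEEAST1", "11111111-1111-1111-1111-111111111111",
        ),
        failover_alias_record(
            "app.example.com", "SECONDARY",
            "dualstack.alb-9876543210.us-west-2.elb.amazonaws.com",
            "ZEXAMPLEWEST2", "22222222-2222-2222-2222-222222222222",
        ),
    ],
}
```

With real IDs filled in, change_batch could be passed as the ChangeBatch argument of boto3.client("route53").change_resource_record_sets, together with the HostedZoneId of the zone that holds app.example.com.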

Health Check Configuration:

For the primary record, you’d create a health check that specifically probes the health of your primary resource. This could be an HTTP/HTTPS check to a specific path on your ALB (e.g., GET /health). Route 53 treats the endpoint as healthy when it responds with a 2xx or 3xx status code (for example, 200) within the timeout. The health check for the secondary record would do the same for the secondary resource.
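
As a rough illustration, the health check settings described here map onto the HealthCheckConfig structure used by boto3's create_health_check call. The helper http_health_check_config and its defaults are assumptions made for this sketch; the field names are the ones the API uses.

```python
from typing import Any, Dict


def http_health_check_config(
    domain_name: str,
    path: str = "/health",
    use_https: bool = True,
    request_interval: int = 30,   # seconds; Route 53 accepts 10 or 30
    failure_threshold: int = 3,   # consecutive results needed to flip status (1-10)
) -> Dict[str, Any]:
    """Build a HealthCheckConfig dict for an HTTP or HTTPS endpoint check."""
    if request_interval not in (10, 30):
        raise ValueError("request_interval must be 10 or 30 seconds")
    if not 1 <= failure_threshold <= 10:
        raise ValueError("failure_threshold must be between 1 and 10")
    return {
        "Type": "HTTPS" if use_https else "HTTP",
        "FullyQualifiedDomainName": domain_name,
        "Port": 443 if use_https else 80,
        "ResourcePath": path,
        "RequestInterval": request_interval,
        "FailureThreshold": failure_threshold,
    }


primary_check = http_health_check_config(
    "dualstack.alb-1234567890.us-east-1.elb.amazonaws.com"
)
```

The resulting dictionary would be passed as HealthCheckConfig to create_health_check along with a unique CallerReference string. The ID returned by that call is what goes into the record's Health Check ID field.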

When a user requests app.example.com, Route 53 looks at the current health status of the primary record. If the associated health check is passing, Route 53 returns the IP address(es) for the primary resource. If the health check for the primary record is failing, Route 53 looks at the health of the secondary record. If the secondary is healthy, Route 53 returns the IP address(es) for the secondary resource. If both are unhealthy, Route 53 still has to answer the query, and it falls back to returning the primary record.
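
The resolution order described above can be sketched as a small decision function. This is a simplified model written for illustration, not Route 53's actual implementation; record_to_serve is a hypothetical name. The both-unhealthy branch reflects the documented behavior that Route 53 answers with the primary when no record is healthy, and is flagged as an assumption in the code.

```python
def record_to_serve(primary_healthy: bool, secondary_healthy: bool) -> str:
    """Return which failover record a query would be answered from."""
    if primary_healthy:
        return "PRIMARY"
    if secondary_healthy:
        return "SECONDARY"
    # Assumption: with nothing healthy, Route 53 still answers, using the primary.
    return "PRIMARY"


for primary_ok in (True, False):
    for secondary_ok in (True, False):
        choice = record_to_serve(primary_ok, secondary_ok)
        print(f"primary={primary_ok!s:5} secondary={secondary_ok!s:5} -> {choice}")
```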

This setup solves the problem of maintaining application availability during regional outages or failures of individual primary resources. It’s not just about having a backup; it’s about an automated, DNS-level shift of traffic.

The core mechanism at play is Route 53’s ability to associate health checks with DNS records. When a health check fails, Route 53 effectively "removes" the associated record from the answers it will return. Crucially, the failover routing policy doesn’t probe the endpoint at the time of the query. Health checks run continuously in the background, and each query is answered using the most recent status. A health check changes status only after a configured number of consecutive results: the failure threshold (default 3) at the request interval (default 30 seconds, or 10 seconds for "fast" checks). So with the defaults, a failure is detected after roughly 90 seconds. When a failing primary health check starts passing again for the same number of consecutive checks, Route 53 will automatically switch traffic back to the primary record without any manual intervention. This is known as automatic failback.
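
To make the failover and failback behavior concrete, here is a minimal sketch of how a consecutive-result threshold turns raw check results into a stable health status. It models a single checker only; real Route 53 aggregates results from many checker locations, so treat track_health (a hypothetical name) as a simplification.

```python
from typing import Iterable, List


def track_health(
    results: Iterable[bool],
    failure_threshold: int = 3,
    initially_healthy: bool = True,
) -> List[bool]:
    """Return the reported status after each raw check result.

    The status flips only after `failure_threshold` consecutive results
    that disagree with the current status.
    """
    healthy = initially_healthy
    streak = 0  # consecutive results that contradict the current status
    statuses: List[bool] = []
    for passed in results:
        if passed == healthy:
            streak = 0
        else:
            streak += 1
            if streak >= failure_threshold:
                healthy = passed  # fail over, or fail back
                streak = 0
        statuses.append(healthy)
    return statuses


# One pass, three failures (failover), then three passes (failback).
raw = [True, False, False, False, True, True, True]
print(track_health(raw))
# [True, True, True, False, False, False, True]
```

A single failed check does not trigger failover, and a single passing check does not trigger failback, which keeps traffic from flapping between regions.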

The most surprising aspect for many is the interaction between health checks and the DNS TTL (Time To Live). While the health check determines whether a record is served, the TTL dictates how long DNS resolvers cache the answer. If your primary record’s answer is cached with a TTL of 60 seconds and its health check fails, users whose resolvers haven’t yet refreshed their DNS cache will continue to receive the old, now-unhealthy IP address for up to that 60-second TTL. This is why it’s common to use short TTLs (e.g., 60 seconds or less) for failover records, so that the DNS change takes effect quickly. Note that you can’t set a TTL on alias records like the ones in this example; Route 53 uses the TTL of the alias target, which is 60 seconds for Elastic Load Balancing.
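
A back-of-the-envelope estimate shows how detection time and TTL add up. This is only a rough sketch under stated assumptions: it ignores the delay in aggregating results across checkers and any resolvers that hold answers longer than the TTL, and the function name is hypothetical.

```python
def estimated_failover_seconds(
    request_interval: int = 30,  # seconds between checks
    failure_threshold: int = 3,  # consecutive failures before marking unhealthy
    ttl: int = 60,               # seconds a resolver may cache the old answer
) -> int:
    """Rough time until clients stop receiving the failed endpoint."""
    detection = request_interval * failure_threshold
    return detection + ttl


print(estimated_failover_seconds())                     # 150
print(estimated_failover_seconds(request_interval=10))  # 90
```

Under these assumptions, switching to the 10-second request interval saves about a minute, and lowering the TTL helps only for records where the TTL is configurable.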

The next concept to explore is how to handle more complex scenarios, such as multi-region active-active setups or weighted routing for A/B testing alongside failover.
