Route 53 health checks are your eyes and ears for the availability of your application endpoints, letting you know before your users do when something breaks.
Let’s watch a health check in action. Imagine we have a simple web server running at example.com. We can configure Route 53 to ping this server:
{
  "CallerReference": "my-health-check-creation-12345",
  "HealthCheckConfig": {
    "IPAddress": "93.184.216.34",
    "Port": 80,
    "Type": "HTTP",
    "RequestInterval": 30,
    "FailureThreshold": 3
  }
}
This is the request body for the CreateHealthCheck API. The API does not accept tags, so the Name tag example-com-webserver is attached afterwards with ChangeTagsForResource.
When you create this, Route 53 starts health checkers in multiple AWS regions. Each checker periodically sends an HTTP GET request to 93.184.216.34 on port 80. A check passes if the checker can open a TCP connection within four seconds and the server then returns a 2xx or 3xx status code within two seconds. A timeout, a refused connection, or a 4xx or 5xx status code counts as a failure.
The RequestInterval (30 seconds) is how often each checker sends a request; the only allowed values are 10 and 30. Because the checkers do not coordinate with each other, the endpoint receives requests considerably more often than once every 30 seconds. The FailureThreshold (3) is the number of consecutive checks that must fail before a checker changes its view of the endpoint from healthy to unhealthy, and likewise the number that must pass before it changes back. HealthCheckConfig also has a HealthThreshold field, but it applies only to calculated health checks (Type CALCULATED), where it sets how many child health checks must be healthy; it plays no part in an endpoint check like this one.
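The consecutive-result rule behind FailureThreshold can be sketched as a small state machine. This is an illustrative model only, not Route 53's implementation: the class name CheckerState, the helper replay, and the assumption that a checker starts out considering the endpoint healthy are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class CheckerState:
    """Simplified model of one health checker's view of an endpoint."""
    failure_threshold: int = 3
    healthy: bool = True   # assumed starting status for this sketch
    streak: int = 0        # consecutive results that contradict `healthy`

    def observe(self, check_passed: bool) -> bool:
        """Record one check result and return the checker's current status."""
        if check_passed == self.healthy:
            # The result agrees with the current status: reset any opposing streak.
            self.streak = 0
        else:
            self.streak += 1
            if self.streak >= self.failure_threshold:
                # Enough consecutive opposing results: flip the status.
                self.healthy = check_passed
                self.streak = 0
        return self.healthy


def replay(results, failure_threshold=3):
    """Feed pass/fail results to a fresh checker; return its status after each one."""
    state = CheckerState(failure_threshold=failure_threshold)
    return [state.observe(r) for r in results]
```

With a threshold of 3, two failures followed by a success leave the status untouched, while three failures in a row flip it to unhealthy, and it then takes three successes in a row to flip it back.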
This system solves the problem of reactive monitoring. Instead of waiting for a user to report an outage, Route 53 health checks proactively detect failures. When a health check fails, Route 53 can automatically reroute traffic away from the unhealthy endpoint to a healthy one if you’ve configured a failover routing policy. It can also trigger alerts via CloudWatch.
Here’s how the internal mechanics work. Route 53 maintains a fleet of health checkers distributed across several AWS regions. Each health checker independently performs the configured checks and applies FailureThreshold to its own results. Route 53 then aggregates the checkers’ verdicts: if more than 18% of the checkers report the endpoint as healthy, the health check is healthy; if 18% or fewer do, it is unhealthy. This is not a majority vote. AWS documents the 18% value as chosen so that checkers in more than one region must see the endpoint as healthy, and notes that the value may change in a future release.
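The aggregation step can be illustrated with a few lines of Python. This is a sketch of the documented "more than 18%" rule, not Route 53's code; the function name aggregate_status and its parameters are hypothetical, and the number of checkers in the usage note is chosen arbitrarily.

```python
def aggregate_status(checker_reports, healthy_percent=18):
    """Return True when MORE than `healthy_percent` percent of checkers report healthy.

    `checker_reports` is a sequence of booleans, one per health checker.
    Integer arithmetic avoids floating-point surprises at the exact boundary.
    """
    total = len(checker_reports)
    if total == 0:
        raise ValueError("need at least one checker report")
    healthy = sum(1 for report in checker_reports if report)
    return healthy * 100 > total * healthy_percent
```

For example, with 16 checkers, 3 healthy reports (18.75%) keep the health check healthy, while 2 healthy reports (12.5%) make it unhealthy.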
You can also health check specific paths on your web server. Instead of just hitting the root /, you could set ResourcePath to "/health" in HealthCheckConfig. This is crucial because your server might be running, but its application logic could be hung. A dedicated health endpoint can expose the actual application status, not just that the web server process is alive.
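A dedicated health endpoint might look like the following minimal sketch, built on Python's standard http.server. The dependencies_ok function is a hypothetical placeholder for whatever your application needs to verify (a database ping, a cache lookup, and so on), and the host and port defaults are assumptions for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def dependencies_ok() -> bool:
    """Hypothetical placeholder: replace with real application checks."""
    return True


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            ok = dependencies_ok()
            body = b"ok" if ok else b"unavailable"
            # 200 is within 2xx/3xx and passes; 503 is outside it and fails the check.
            self.send_response(200 if ok else 503)
        else:
            body = b"not found"
            self.send_response(404)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet; real services should log


def make_server(host: str = "0.0.0.0", port: int = 8080) -> HTTPServer:
    """Build (but do not start) an HTTP server exposing /health."""
    return HTTPServer((host, port), HealthHandler)
```

Calling make_server().serve_forever() starts it. The point of the design is that the status code reflects application state: when dependencies_ok returns False the endpoint answers 503, which the health checkers count as a failed check even though the process is still up.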
Route 53 continuously publishes each health check’s status to CloudWatch as the HealthCheckStatus metric in the AWS/Route53 namespace, reporting 1 while the check is healthy and 0 while it is unhealthy. You can create alarms on this metric. For instance, to be notified if the example-com-webserver health check becomes unhealthy, you would set up a CloudWatch alarm that triggers when HealthCheckStatus for that specific health check ID (the HealthCheckId dimension) stays below 1 for 5 minutes. This alarm could then send a notification to an SNS topic, which in turn could email your operations team. Route 53 metrics are available only in the US East (N. Virginia) region, so the alarm has to be created in us-east-1.
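As a sketch of that alarm, the function below assembles the parameters for CloudWatch's PutMetricAlarm operation. The function name, the default alarm name, and the choice of five one-minute periods with the Minimum statistic are assumptions for illustration; the health check ID and SNS topic ARN you pass in are your own.

```python
def health_check_alarm_params(health_check_id, sns_topic_arn,
                              alarm_name="example-com-webserver-unhealthy"):
    """Build PutMetricAlarm parameters for a Route 53 health check alarm."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/Route53",
        "MetricName": "HealthCheckStatus",
        "Dimensions": [{"Name": "HealthCheckId", "Value": health_check_id}],
        # Minimum drops to 0 if any datapoint in the period was unhealthy.
        "Statistic": "Minimum",
        "Period": 60,               # seconds per evaluation period
        "EvaluationPeriods": 5,     # 5 x 60 s = below 1 for 5 minutes
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }
```

If boto3 is installed, the dictionary can be passed straight through with boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**params).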
One thing that surprises many people is that Route 53 health checks are not limited to HTTP. The behavior is controlled by the Type field, not by whether you supply an IP address or a domain name: set Type to TCP and the checkers only attempt a TCP connection to the specified IPAddress and Port. If the connection is established within ten seconds, the check passes. This means you can health check any TCP-based service without needing a specific application-level protocol, such as a database listener, a custom TCP service, or simply an open port. The endpoint must be reachable from the public internet, because that is where the health checkers connect from.
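To see what a TCP-type check amounts to, here is a rough local equivalent using Python's socket module. It is a sketch for testing an endpoint by hand, not what Route 53 runs; the function name tcp_check is hypothetical, and the 10-second default mirrors the connection time limit described above.

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection performs the TCP handshake; no application data is sent.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A call such as tcp_check("93.184.216.34", 80) only tells you the port accepts connections; it says nothing about whether the service behind it is answering requests correctly, which is the trade-off of a TCP check compared with an HTTP check against a health path.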
Once your health checks are configured and alerting, the next logical step is to integrate them into your DNS routing policies, such as creating a weighted or failover record set that automatically selects healthy endpoints.