Alertmanager’s routing isn’t a simple switchboard. It’s a tree of label matchers that decides not only where each alert goes but also how alerts are grouped and how often notifications are sent.

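Let’s watch Alertmanager in action with a sample alert. Imagine Prometheus is scraping a service whose rate of 5xx responses, derived from the http_requests_total counter, climbs too high. Prometheus fires an alert named HighRequestLatency. (Despite the name, the rule below watches the 5xx error rate, not latency.)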

# prometheus.yml
scrape_configs:
  - job_name: 'my-app'
    static_configs:
      - targets: ['localhost:9090']

rule_files:
  - 'alert_rules.yml'

# alert_rules.yml
groups:
- name: http_alerts
  rules:
  - alert: HighRequestLatency
    expr: rate(http_requests_total{job="my-app", status=~"5.."}[5m]) > 0.1
    for: 5m
    labels:
      severity: critical
      team: backend
    annotations:
      summary: "High request latency detected on {{ $labels.instance }}"
      description: "The {{ $labels.job }} job on {{ $labels.instance }} is experiencing high latency (5xx errors)."

When Prometheus detects this condition for 5 minutes, it sends the HighRequestLatency alert to Alertmanager. Alertmanager receives this alert and immediately consults its routing tree.
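For Prometheus to deliver the alert, it has to know where Alertmanager is listening, and the prometheus.yml above does not say. A minimal sketch of the missing block follows, assuming Alertmanager runs on the same host on its default port 9093; adjust the target to your own deployment.

```yaml
# prometheus.yml (addition; the target address is an assumption)
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']  # Alertmanager's default listen port
```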

Here’s a simplified Alertmanager configuration:

# alertmanager.yml
route:
  group_by: ['alertname', 'job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'default-receiver'

  routes:
  - match:
      severity: critical
    receiver: 'critical-alerts'
    continue: true

  - match_re:
      team: '(backend|frontend)'
    receiver: 'dev-team-notifications'

receivers:
- name: 'default-receiver'
  webhook_configs:
  - url: 'http://localhost:5001/' # Default fallback

- name: 'critical-alerts'
  slack_configs:
  - channel: '#critical-alerts'
    api_url: 'https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX'

- name: 'dev-team-notifications'
  email_configs:
  - to: 'dev-team@example.com'
    from: 'alertmanager@example.com' # placeholder sender; required unless global smtp_from is set
    smarthost: 'smtp.example.com:587'
    auth_username: 'alertmanager@example.com'
    auth_password: 'your_smtp_password'

In this configuration:

  • The top-level route defines the defaults that child routes inherit: group alerts by alertname and job, wait 30 seconds (group_wait) before sending the first notification for a new group, and wait 5 minutes (group_interval) before sending a notification about alerts added to a group that has already been notified. While the alerts keep firing, the notification is re-sent every 4 hours (repeat_interval). Its receiver, default-receiver, only handles alerts that match none of the child routes.
  • The routes section is where the magic happens. Alertmanager evaluates these child routes in order, top to bottom.
    • The first route matches alerts with severity: critical. An alert that matches is sent to the critical-alerts receiver (which posts to the Slack channel #critical-alerts). The continue: true setting means that even after this match, Alertmanager goes on to evaluate the routes that follow.
    • The second route uses match_re (a regex match) to find alerts whose team label is either backend or frontend. If the alert matches (and it will, because our HighRequestLatency alert has team: backend), it’s sent to the dev-team-notifications receiver (which emails dev-team@example.com).

Because our HighRequestLatency alert has severity: critical and team: backend, it will be routed to both the critical-alerts Slack channel and the dev-team@example.com email address due to continue: true. If continue were false (the default), it would only go to the first matching receiver.
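One caveat on syntax: match and match_re have been deprecated since Alertmanager 0.22 in favor of a single matchers list. The sketch below restates the same two routes in that form, assuming you run 0.22 or newer; it is meant as an equivalent rewrite of the routes above, not a change in behavior.

```yaml
# alertmanager.yml (routing section only), assuming Alertmanager 0.22 or newer
route:
  receiver: 'default-receiver'
  routes:
  - matchers:
    - severity="critical"          # equality matcher, replaces match
    receiver: 'critical-alerts'
    continue: true
  - matchers:
    - team=~"backend|frontend"     # regex matcher, replaces match_re
    receiver: 'dev-team-notifications'
```

Regex matchers are anchored, so team=~"backend|frontend" matches only those two exact values, just as the original match_re did.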

The group_by directive is crucial. Alertmanager collects alerts that share the same set of labels specified in group_by. For our HighRequestLatency alert, it will be grouped with other alerts having the same alertname and job. This prevents a flood of individual notifications for a single incident affecting multiple instances of the same service. The group_wait ensures that if multiple instances of the same alert fire within that wait period, they are bundled into a single notification.

The one thing most people miss is how continue: true interacts with group_by and group_wait. An alert that matches several routes is placed in a separate alert group under each matching route, and each route applies its own group_by, group_wait, group_interval, and repeat_interval (inherited from the parent route unless overridden). So the same alert can reach two receivers at slightly different times, or be batched with a different set of alerts on each route. Left unmanaged, this produces notification patterns that are hard to reason about.
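To make that concrete, here is a sketch of the same routing tree with grouping settings overridden on each child route. The override values (10s, 10m, the extra team label) are illustrative assumptions, not recommendations.

```yaml
# alertmanager.yml (routing section only; override values are illustrative)
route:
  group_by: ['alertname', 'job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'default-receiver'

  routes:
  - match:
      severity: critical
    receiver: 'critical-alerts'
    group_wait: 10s                          # notify Slack sooner than the inherited 30s
    continue: true

  - match_re:
      team: '(backend|frontend)'
    receiver: 'dev-team-notifications'
    group_by: ['alertname', 'job', 'team']   # separate email groups per team
    group_interval: 10m                      # batch follow-up emails less often
```

With these settings, our HighRequestLatency alert would reach Slack roughly 10 seconds after it arrives, while the email would wait the inherited 30 seconds and could be grouped differently, because each route keeps its own groups.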

The next thing you’ll run into is managing notification inhibition, where one alert can suppress another.
