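Prometheus, not Alertmanager, evaluates alert rules: it sends the resulting firing alerts to Alertmanager and, separately, can scrape Alertmanager’s metrics endpoint to monitor its health. When alerts are not arriving or Alertmanager shows as down, the usual cause is that the Prometheus server cannot reach Alertmanager.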

Here are the most common reasons this happens and how to fix them:

1. Network Connectivity Issues

The Prometheus server needs to be able to reach the Alertmanager over the network. This is the most frequent culprit.

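Diagnosis: From the Prometheus server’s shell, run curl http://<alertmanager_ip>:<alertmanager_port>/metrics. A "No route to host" error or a timeout points to a network problem. "Connection refused" means the host was reached but nothing accepted the connection on that port, which can be a firewall rejecting it or Alertmanager not listening (see item 3).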

Fix: Ensure that firewalls on the Prometheus server, on the Alertmanager host, and on any intermediate network devices allow traffic from the Prometheus server’s IP address to the Alertmanager’s IP and port (9093 by default).

Why it works: This removes any network-level blocks preventing Prometheus from establishing a connection to Alertmanager.
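
The diagnosis above can be scripted. The following is a minimal sketch, assuming curl is installed on the Prometheus host; the function names explain_curl_exit and probe_alertmanager are hypothetical helpers, and the target address is whatever you pass in.

```shell
# Hypothetical helper: translate a curl exit code into a likely cause.
explain_curl_exit() {
  case "$1" in
    0)  echo "reachable" ;;
    6)  echo "could not resolve host (DNS problem)" ;;
    7)  echo "failed to connect (refused or no route to host)" ;;
    22) echo "connected, but the server returned an HTTP error" ;;
    28) echo "timed out (a firewall may be dropping packets)" ;;
    *)  echo "other curl error: $1" ;;
  esac
}

# Hypothetical helper: probe <host>:<port>/metrics and explain the result.
probe_alertmanager() {
  probe_target="$1"   # e.g. <alertmanager_ip>:9093
  probe_rc=0
  curl --fail --silent --show-error --max-time 5 --output /dev/null \
    "http://${probe_target}/metrics" || probe_rc=$?
  explain_curl_exit "$probe_rc"
  return "$probe_rc"
}
```

Run probe_alertmanager with the address Prometheus is configured to use, from the Prometheus host itself, so the check follows the same network path.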

2. Incorrect alertmanagers Configuration in Prometheus

Prometheus needs to know where to find the Alertmanager. If this configuration is wrong, it simply won’t know where to send its alerts.

Diagnosis: Check your prometheus.yml configuration file. Look for the alerting section and verify the alertmanagers configuration.

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - '192.168.1.100:9093' # Make sure this IP and port are correct

Fix: Update the targets list in prometheus.yml to accurately reflect the IP address and port where your Alertmanager is running. After changing it, reload Prometheus’s configuration by sending a SIGHUP signal to its process or by making a POST request to /-/reload on the Prometheus HTTP API. The /-/reload endpoint is only available when Prometheus was started with the --web.enable-lifecycle flag.

Why it works: This tells Prometheus the correct destination for the alerts that its rules fire.
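
A minimal sketch of a safer reload, assuming promtool (shipped with Prometheus) and curl are installed and that Prometheus runs with --web.enable-lifecycle. The function names, the config path, and the base URL are assumptions for illustration.

```shell
# Hypothetical helper: validate prometheus.yml, then ask Prometheus to reload it.
# $1: path to the config file, e.g. /etc/prometheus/prometheus.yml (assumed path)
# $2: Prometheus base URL, e.g. http://localhost:9090 (assumed address)
validate_and_reload_prometheus() {
  prom_config="$1"
  prom_url="$2"
  # Stop here if the file does not parse, so a broken config is never reloaded.
  promtool check config "$prom_config" || return 1
  curl --fail --silent --show-error --request POST "${prom_url}/-/reload"
}

# Hypothetical helper: list the Alertmanagers Prometheus has discovered.
list_alertmanagers() {
  curl --fail --silent --show-error "${1}/api/v1/alertmanagers"
}
```

After the reload, list_alertmanagers http://localhost:9090 should show your Alertmanager among the active ones if the targets entry is correct.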

3. Alertmanager Not Running or Crashed

The Alertmanager process itself might be down or have crashed. Prometheus can’t scrape metrics from a service that isn’t running.

Diagnosis: On the machine where Alertmanager is supposed to be running, check its status. If you’re using systemd, run systemctl status alertmanager. Look for "active (running)". If it’s not running, check the logs for crash reasons (journalctl -u alertmanager -f).

Fix: If Alertmanager is not running, start it using systemctl start alertmanager or by executing the Alertmanager binary directly. If it crashed, investigate the logs to understand why (e.g., configuration errors, out of memory).

Why it works: This ensures the Alertmanager service is operational and listening for scrape requests from Prometheus.
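
The status check can be wrapped in a small script. This is a sketch that assumes a systemd host and a unit named alertmanager, as in the commands above; the function name is hypothetical.

```shell
# Hypothetical helper: report whether the unit is active and, if not,
# print its most recent log lines to help find the crash reason.
ensure_alertmanager_running() {
  am_unit="${1:-alertmanager}"
  if systemctl is-active --quiet "$am_unit"; then
    echo "$am_unit is running"
  else
    echo "$am_unit is not running; recent log lines:"
    journalctl --unit "$am_unit" --no-pager --lines 50
    return 1
  fi
}
```

A non-zero return makes the function easy to use in a cron job or health-check script.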

4. Incorrect scrape_configs in Prometheus for Alertmanager

Prometheus only scrapes the targets listed under scrape_configs. To monitor Alertmanager’s own health, it needs a scrape job for Alertmanager itself. This job is separate from the alerting section, which only controls where alerts are sent.

Diagnosis: In prometheus.yml, ensure you have a scrape_configs entry for Alertmanager. It should look something like this:

scrape_configs:
  - job_name: 'alertmanager'
    static_configs:
      - targets: ['192.168.1.100:9093'] # Again, verify IP and port
    metrics_path: '/metrics' # Default, but good to confirm

Fix: Add or correct the job_name: 'alertmanager' section in your prometheus.yml to point to the correct Alertmanager address. Reload Prometheus configuration.

Why it works: This explicitly tells Prometheus to scrape the /metrics endpoint of the Alertmanager, so Prometheus can record Alertmanager’s health and alert on it if it goes down.
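
To confirm the scrape job works, you can query the up metric through the Prometheus HTTP API. This is a sketch assuming curl is installed and the job is named alertmanager as in the example above; the function name and base URL are assumptions.

```shell
# Hypothetical helper: ask Prometheus whether its last scrape of the
# alertmanager job succeeded.
# $1: Prometheus base URL, e.g. http://localhost:9090 (assumed address)
query_alertmanager_up() {
  curl --fail --silent --show-error --get \
    --data-urlencode 'query=up{job="alertmanager"}' \
    "${1}/api/v1/query"
}
```

In the JSON response, a sample value of 1 means the last scrape succeeded and 0 means it failed. An empty result list means Prometheus has no target for that job name.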

5. Alertmanager Configuration Errors (Internal)

Even if Prometheus can reach Alertmanager, Alertmanager itself might be misconfigured, preventing it from routing and sending notifications for the alerts it receives.

Diagnosis: Check the Alertmanager logs for errors raised while loading alertmanager.yml, such as YAML syntax errors, routes that reference undefined receivers, or missing template files. Note that alert rule files (for example alert.rules.yml) are loaded and evaluated by Prometheus, not Alertmanager, so rule syntax errors appear in the Prometheus logs instead.

Fix: Review your alertmanager.yml for syntax errors and confirm that every receiver named in a route is defined and that any referenced template files exist. Alertmanager has no directive for loading rule files; they are listed under rule_files in prometheus.yml, so fix rule-file paths there. Then reload the Alertmanager configuration.

Why it works: This resolves internal issues within Alertmanager that prevent it from correctly processing the alerts it receives from Prometheus.
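
A minimal sketch of checking alertmanager.yml before reloading it, assuming amtool (shipped with Alertmanager) and curl are installed. The function name, config path, and base URL are assumptions for illustration.

```shell
# Hypothetical helper: validate alertmanager.yml, then ask Alertmanager to reload it.
# $1: path to the config file, e.g. /etc/alertmanager/alertmanager.yml (assumed path)
# $2: Alertmanager base URL, e.g. http://localhost:9093 (assumed address)
validate_and_reload_alertmanager() {
  am_config="$1"
  am_url="$2"
  # amtool reports syntax errors and missing template files before anything is reloaded.
  amtool check-config "$am_config" || return 1
  curl --fail --silent --show-error --request POST "${am_url}/-/reload"
}
```

Sending SIGHUP to the Alertmanager process is an alternative to the POST request.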

6. DNS Resolution Failure

If you’re using hostnames instead of IP addresses for Alertmanager in your Prometheus configuration, DNS resolution might be failing.

Diagnosis: From the Prometheus server’s shell, try nslookup <alertmanager_hostname>. If it fails to resolve, that’s your problem.

Fix: Ensure the DNS server configured on the Prometheus host is reachable and can resolve the Alertmanager’s hostname. Alternatively, switch to using the Alertmanager’s IP address directly in the Prometheus configuration.

Why it works: This ensures Prometheus can translate the hostname into an IP address, allowing it to establish a network connection.
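
As an alternative to nslookup, the sketch below uses getent, which on most Linux systems consults /etc/hosts as well as DNS. It assumes a Linux host with getent available; the function name is hypothetical.

```shell
# Hypothetical helper: check whether a hostname resolves on this machine.
check_resolves() {
  if getent hosts "$1" >/dev/null; then
    echo "$1 resolves"
  else
    echo "$1 does not resolve"
    return 1
  fi
}
```

Run it on the Prometheus host with the exact hostname that appears in prometheus.yml.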

Once these are resolved, the next problem you’ll likely hit is alert rules that never fire because of incorrect PromQL expressions.
