Fix Prometheus Target Down Alert: Diagnose Failures (2026)

Prometheus is failing to scrape metrics from a target, causing your "Target Down" alert to fire. This usually means Prometheus can’t establish a connection to the target service or the target service is rejecting Prometheus’s requests.
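Target Down alerts are commonly built on Prometheus’s built-in up metric. That metric is 1 when a target’s last scrape succeeded and 0 when it failed. Before working through the causes below, it helps to see every failing target together with its last scrape error. The following is a minimal Python sketch. It assumes the Prometheus server is reachable at http://localhost:9090 and uses its /api/v1/targets HTTP API. The names fetch_targets and down_targets are hypothetical helpers:

```python
import json
import urllib.request


def fetch_targets(prometheus_url: str, timeout: float = 10.0) -> dict:
    """Fetch the JSON response of the /api/v1/targets endpoint."""
    url = prometheus_url.rstrip("/") + "/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def down_targets(targets_response: dict) -> list:
    """Return job, scrape URL and last error for each active target whose health is "down"."""
    failing = []
    for target in targets_response.get("data", {}).get("activeTargets", []):
        if target.get("health") == "down":
            failing.append({
                "job": target.get("labels", {}).get("job", ""),
                "scrapeUrl": target.get("scrapeUrl", ""),
                "lastError": target.get("lastError", ""),
            })
    return failing


# Usage (assumes Prometheus at http://localhost:9090):
# for t in down_targets(fetch_targets("http://localhost:9090")):
#     print(t["job"], t["scrapeUrl"], t["lastError"])
```

The lastError text usually points to one of the sections below. A connection error suggests a network or service problem, while an x509 error suggests TLS.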

Here’s how to break down the problem and fix it:

1. Network Connectivity

Diagnosis: The most common culprit is a network issue preventing Prometheus from reaching the target.

Check: From the Prometheus server (or wherever Prometheus is running), try to curl the target’s metrics endpoint. For example, if your target is 192.168.1.100:9090 and its metrics path is /metrics, run:

```
curl http://192.168.1.100:9090/metrics
```

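If this times out or fails with "No route to host," the problem is in the network path, such as a firewall dropping packets or a missing route. "Connection refused" means the host answered but nothing accepted the connection on that port. That usually points to the service itself (see step 2) or to a firewall that actively rejects the traffic.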

Fix:

Firewall Rules: Ensure your firewall (on the Prometheus server, the target server, or any network devices in between) allows traffic on the target’s port (e.g., 9090). On a Linux target host, you might open the port with ufw allow 9090/tcp or firewall-cmd --zone=public --add-port=9090/tcp --permanent && firewall-cmd --reload.
Network Routing: Verify that the Prometheus server can route traffic to the target’s IP address. ping <target_ip> can help here, though ICMP might be blocked. If ping succeeds but curl fails, routing is fine and the problem is more likely a firewall or the service itself. If ping fails and ICMP is allowed, check the routing table on the Prometheus server (ip route show on Linux).
DNS Resolution: If you’re using hostnames in your Prometheus configuration, ensure DNS resolution is working correctly from the Prometheus server. Use dig <target_hostname> or nslookup <target_hostname>. If it fails, check your /etc/resolv.conf or your DNS server configuration.

Why it works: This step confirms that the Prometheus server can reach the target over the network. If it can’t, no scrape can succeed.
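The connectivity checks above can also be scripted from the Prometheus server. This Python sketch is illustrative only: probe_tcp is a hypothetical helper, not part of Prometheus. It opens a plain TCP connection and classifies the failure. That helps separate DNS problems, silently dropped traffic, missing routes, and refused connections.

```python
import errno
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connection and classify the outcome.

    Returns "open", "refused", "dns-failure", "timeout" or "unreachable".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered but refused: nothing listening, or a firewall REJECT rule.
        return "refused"
    except socket.gaierror:
        # The hostname could not be resolved (see the DNS check above).
        return "dns-failure"
    except (socket.timeout, TimeoutError):
        # No answer at all: often a firewall silently dropping packets.
        return "timeout"
    except OSError as exc:
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            return "unreachable"  # "No route to host" / "Network is unreachable"
        raise


# Usage: probe_tcp("192.168.1.100", 9090)
```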

2. Target Service Not Running or Listening

Diagnosis: The application on the target machine that’s supposed to expose metrics might not be running, or it’s not listening on the expected port.

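Check: On the target machine itself, check that the process is running and listening on the expected port with ss -tulnp | grep 9090. Replace 9090 with your target’s port, and run the command with sudo to see which process owns the socket. You should see a listener on 0.0.0.0:9090, [::]:9090, *:9090, or <specific_ip>:9090. If it only shows 127.0.0.1:9090, the service accepts local connections only. In that case a Prometheus server on another machine will get "Connection refused."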
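When checking several hosts, a short parser can flag the loopback-only case for you. The following is a minimal Python sketch. It assumes you pass it the text output of the ss command above, for example captured with subprocess. The names listen_addresses and is_loopback_only are hypothetical helpers. The parsing is a heuristic that relies on ss’s usual column layout:

```python
import ipaddress


def listen_addresses(ss_output: str, port: int) -> list:
    """Return the local addresses bound to `port` in `ss -tulnp` output (heuristic)."""
    suffix = f":{port}"
    addresses = []
    for line in ss_output.splitlines():
        # The local address is the first column ending in ":<port>"; the peer
        # column of a listening socket ends in ":*", so it never matches.
        for token in line.split():
            if token.endswith(suffix):
                addresses.append(token[: -len(suffix)])
                break
    return addresses


def is_loopback_only(addresses: list) -> bool:
    """True if every listener is on a loopback address (unreachable from other hosts)."""
    def loopback(addr: str) -> bool:
        # Drop a "%interface" suffix first, then IPv6 brackets.
        host = addr.split("%", 1)[0].strip("[]")
        if host == "*":
            return False  # wildcard: listening on all interfaces
        try:
            return ipaddress.ip_address(host).is_loopback
        except ValueError:
            return False  # not a literal IP address; treat as not loopback
    return bool(addresses) and all(loopback(a) for a in addresses)
```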

Fix:

Start the Service: If the process isn’t running, start it using its service manager (e.g., systemctl start my-app.service or docker start my-container).
Configure Listening Address: If the application is running but not listening on the correct IP address or port, adjust its configuration. This varies by application. Many Prometheus exporters (such as node_exporter) take a --web.listen-address flag, while a custom application might read its port from an environment variable or config file.

Why it works: Prometheus can only scrape metrics from a service that is actively running and bound to a network port.

3. Prometheus Configuration Errors

Diagnosis: The Prometheus configuration (prometheus.yml) might have typos, incorrect IP addresses, ports, or scrape paths.

Check: Carefully review the scrape_configs section in your prometheus.yml. Pay close attention to the static_configs (if used) or the service discovery configuration. Ensure the targets list contains the correct IP addresses/hostnames and ports for your services.

Example prometheus.yml snippet:

```
scrape_configs:
  - job_name: 'my-application'
    static_configs:
      - targets: ['192.168.1.100:9090']
        labels:
          env: 'production'
```

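Fix: Correct any typos or wrong IP addresses, ports, or hostnames in prometheus.yml, and validate the file with promtool check config prometheus.yml. Then reload the Prometheus configuration in one of two ways. You can send the Prometheus process a SIGHUP signal, or send an HTTP POST request to the /-/reload endpoint: curl -X POST http://localhost:9090/-/reload. The HTTP endpoint only works if Prometheus was started with --web.enable-lifecycle.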

Why it works: Prometheus uses this configuration file to know where and how to scrape targets. An error here means it’s looking in the wrong place or with the wrong parameters.
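Typos in the targets list are easy to miss by eye. The following is a minimal Python sketch of a pre-flight check for static_configs target strings, assuming the usual host:port form. The name target_problems is a hypothetical helper. It flags a scheme or path pasted into a target, since those belong in the job’s scheme and metrics_path settings. It also flags a missing or invalid port. It complements promtool check config rather than replacing it.

```python
def target_problems(target: str) -> list:
    """Return likely mistakes in a static_configs target string (empty list if none found)."""
    problems = []
    if "://" in target:
        problems.append("contains a scheme; set scheme on the job instead")
        target = target.split("://", 1)[1]
    if "/" in target:
        problems.append("contains a path; set metrics_path on the job instead")
        target = target.split("/", 1)[0]
    # rpartition keeps bracketed IPv6 hosts such as "[::1]:9090" intact.
    host, sep, port = target.rpartition(":")
    if not sep:
        problems.append("no port given; the scheme's default port will be used")
    elif not host:
        problems.append("empty host")
    elif not port.isdigit() or not 1 <= int(port) <= 65535:
        problems.append(f"invalid port {port!r}")
    return problems


# Usage: target_problems("http://192.168.1.100:9090/metrics") reports both the scheme and the path.
```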

4. Target Metrics Path Incorrect

Diagnosis: Prometheus is connecting to the target, but the /metrics path (or whatever path is configured) is wrong or doesn’t exist on the target.

Check: Again, use curl from the Prometheus server: curl http://<target_ip>:<target_port>/metrics. A 404 Not Found means the configured path is probably wrong. So does any response that isn’t plain-text metrics, such as an HTML page.
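To repeat this check across targets, a script can fetch the endpoint and classify the response. The following is a minimal Python sketch using only the standard library. The names diagnose_response and check_metrics_endpoint are hypothetical helpers. The body test is a heuristic: it looks for the # HELP or # TYPE lines most exporters emit, although a valid exposition may omit them.

```python
import urllib.error
import urllib.request


def diagnose_response(status: int, body: str) -> str:
    """Classify a metrics-endpoint response (heuristic)."""
    if status == 404:
        return "404: metrics_path is probably wrong"
    if status != 200:
        return f"unexpected HTTP status {status}"
    if not any(line.startswith(("# HELP", "# TYPE")) for line in body.splitlines()):
        return "200 OK, but the body does not look like Prometheus metrics"
    return "ok"


def check_metrics_endpoint(url: str, timeout: float = 10.0) -> str:
    """Fetch url and classify the response; connection and TLS errors propagate as URLError."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return diagnose_response(resp.status, body)
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for error statuses such as 404.
        return diagnose_response(exc.code, "")


# Usage: check_metrics_endpoint("http://192.168.1.100:9090/metrics")
```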

Fix:

Update prometheus.yml: If the metrics path is different (e.g., /probe_metrics or /metrics/v1), update the metrics_path parameter in your prometheus.yml for that job_name.
```
scrape_configs:
  - job_name: 'my-application'
    metrics_path: /my_custom_metrics
    static_configs:
      - targets: ['192.168.1.100:9090']
```
Configure Target Application: If the target application is not exposing metrics at all, you’ll need to configure it to do so. This is application-specific.

Why it works: Prometheus needs to know the exact URL endpoint on the target where metrics are served.

5. Target Service Overwhelmed or Crashing

Diagnosis: The target application might be running but is so overloaded that it cannot respond to Prometheus’s scrape requests in time, or it’s crashing repeatedly.

Check:

Target Logs: Examine the logs of the target application. Look for errors, out-of-memory (OOM) conditions, or repeated restarts.
Target Resource Usage: Check the CPU, memory, and network utilization on the target machine. High resource usage can indicate it’s struggling to keep up. Use top, htop, or cloud provider monitoring tools.
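Prometheus Scrape Duration: On Prometheus’s Targets page (usually http://<prometheus_ip>:9090/targets), check the last scrape duration and error for the failing target. The same duration is recorded per target as the scrape_duration_seconds metric. If the duration is consistently close to the scrape timeout, or the error reads "context deadline exceeded," the target is too slow to respond in time.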

Fix:

Scale Up/Out: Increase the resources (CPU, RAM) available to the target application or scale out the number of instances if it’s a distributed system.
Optimize Application: Profile and optimize the target application to reduce its resource consumption or improve its ability to handle load.
Adjust Scrape Interval/Timeout: If the target is legitimately slow but functional, increase scrape_timeout for that job in prometheus.yml (the global default is 10s). Prometheus rejects a scrape_timeout larger than the scrape_interval, so you may also need to raise the interval (e.g., from 15s to 30s). Be cautious with these settings, as they can mask underlying issues.

Why it works: If the target can’t even respond to a simple HTTP request within a reasonable time, Prometheus will mark it as down. Addressing the target’s performance issues allows it to respond.
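Prometheus refuses to load a scrape config whose scrape_timeout is larger than its scrape_interval, so it is worth checking the two values together before reloading. The following is a minimal Python sketch. The names parse_duration and timeout_fits are hypothetical helpers. The parser is simplified: it accepts Prometheus-style strings such as 15s, 1m30s or 500ms but does not enforce Prometheus’s unit-ordering rules.

```python
import re

# Milliseconds per unit, following Prometheus duration units (1y = 365d).
_UNIT_MS = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000,
            "d": 86_400_000, "w": 604_800_000, "y": 31_536_000_000}
# "ms" comes first so "500ms" is not read as "500m" followed by a stray "s".
_TOKEN = re.compile(r"(\d+)(ms|s|m|h|d|w|y)")


def parse_duration(text: str) -> float:
    """Convert a duration like "1m30s" to seconds; raise ValueError if malformed."""
    total_ms = 0
    pos = 0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:  # unparsed characters before this token
            raise ValueError(f"invalid duration: {text!r}")
        total_ms += int(match.group(1)) * _UNIT_MS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):  # nothing parsed, or trailing garbage
        raise ValueError(f"invalid duration: {text!r}")
    return total_ms / 1000


def timeout_fits(scrape_interval: str, scrape_timeout: str) -> bool:
    """True if scrape_timeout <= scrape_interval, as Prometheus requires."""
    return parse_duration(scrape_timeout) <= parse_duration(scrape_interval)
```

For example, timeout_fits("15s", "30s") returns False, so raising only the timeout to 30s on a 15s interval would be rejected.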

6. TLS/SSL Configuration Issues

Diagnosis: Your target serves metrics over HTTPS, but Prometheus either doesn’t trust the target’s certificate or is using the wrong TLS settings.

Check:

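Prometheus UI: Navigate to http://<prometheus_ip>:9090/targets and check the "Error" column for the failing target. TLS problems typically show up as one of these errors:
- "x509: certificate signed by unknown authority": Prometheus doesn’t trust the issuing CA.
- An x509 "certificate is valid for ..., not ..." error: the hostname doesn’t match the certificate.
- "http: server gave HTTP response to HTTPS client": scheme: https is set for a plain-HTTP endpoint.
Curl with TLS: From the Prometheus server, run curl -v https://<target_host>:<target_port>/metrics. Use the same hostname Prometheus uses, so the certificate’s name is checked the same way. The -v flag shows the TLS handshake details. Add --cacert <path_to_ca_file> to test with the CA file from your tls_config.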

Fix:

tls_config in prometheus.yml: Ensure the tls_config section for the job is correctly set up.
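- If using self-signed certificates or internal CAs: set ca_file to the CA certificate that signed the target’s certificate. cert_file and key_file are only needed if the target also requires a client certificate (mutual TLS). The insecure_skip_verify: true option can be used for testing but is not recommended for production.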
- If the target’s certificate is valid but not trusted by the system Prometheus is running on, you may need to add the CA certificate to the system’s trust store or use ca_file in tls_config.
```
scrape_configs:
  - job_name: 'my-secure-app'
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
      # Client certificate and key: only needed if the target requires mutual TLS
      cert_file: /etc/prometheus/certs/prometheus.crt
      key_file: /etc/prometheus/certs/prometheus.key
      # insecure_skip_verify: true # Use with caution!
    static_configs:
      - targets: ['secure-target.example.com:8443']
```
Target Certificate Validity: Ensure the target’s certificate is not expired and is valid for the hostname Prometheus is using to connect.

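Why it works: TLS requires a successful handshake in which Prometheus verifies the target’s certificate. With mutual TLS, the target also verifies Prometheus’s client certificate. Incorrect configuration (or skipped verification that you didn’t intend) makes the handshake fail, and the scrape fails with it.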
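To reproduce the certificate checks outside Prometheus, here is a minimal Python sketch using the standard ssl module. It verifies the chain and hostname, optionally trusting only the CA file you set as ca_file. It then reports either the verification error or the days left before the certificate expires. The names check_tls and days_until_expiry are hypothetical helpers. The sketch does not send a client certificate, so it cannot test a target that requires mutual TLS.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(cert: dict, now: Optional[float] = None) -> float:
    """Days between now and the certificate's notAfter date (negative if expired)."""
    expires = ssl.cert_time_to_seconds(cert["notAfter"])
    if now is None:
        now = time.time()
    return (expires - now) / 86400


def check_tls(host: str, port: int, ca_file: Optional[str] = None, timeout: float = 10.0) -> str:
    """Handshake with host:port, verifying chain and hostname; return a short diagnosis.

    Connection errors (refused, timeout, DNS) are not caught and propagate to the caller.
    """
    # With cafile set, only that CA is trusted; otherwise the system trust store is used.
    context = ssl.create_default_context(cafile=ca_file)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
    except ssl.SSLCertVerificationError as exc:
        # e.g. unknown authority, expired certificate, or hostname mismatch
        return f"verification failed: {exc.verify_message}"
    return f"ok, certificate expires in {days_until_expiry(cert):.0f} days"


# Usage: check_tls("secure-target.example.com", 8443, ca_file="/etc/prometheus/certs/ca.crt")
```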

Once your targets are back up, the next problems you may run into are missing metrics or alert rules that don’t evaluate as expected. At that point the system is collecting data again, but it is not necessarily acting on that data the way you intended.