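Prometheus is failing because its time-series database (TSDB) is rejecting samples whose timestamps fall outside the window its in-memory head block will accept. With out-of-order ingestion disabled (the default), a sample is "out of bounds" when it is more than about half a block range behind the newest sample the head has seen. With the default two-hour blocks, that is about one hour. As a result, a single sample stamped too far in the future can cause correctly timed samples that arrive after it to be rejected.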

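The most common culprit is clock skew, either between the Prometheus server and the targets it’s scraping or on the Prometheus server itself. Skew between machines only matters for samples that carry their own timestamps, meaning explicit timestamps in the exposition, which Prometheus honors by default through honor_timestamps. Ordinary scraped samples are stamped with the Prometheus server’s own clock.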
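To make the failure mode concrete, here is a minimal Python sketch of the head block’s acceptance check. It is a simplified model, not Prometheus source code. It assumes the default two-hour block range and out-of-order ingestion disabled, and it ignores per-series ordering checks. HeadModel and its methods are hypothetical names. A sample is rejected when it is more than half a block range older than the newest timestamp seen so far, which is how one future-stamped sample can cause later, correctly timed samples to be reported as out of bounds.

```python
from typing import Optional

# Assumed default: 2h blocks, so the head accepts samples up to 1h behind its newest one.
BLOCK_RANGE_MS = 2 * 60 * 60 * 1000


class HeadModel:
    """Simplified model of the TSDB head's in-order append window (not Prometheus code)."""

    def __init__(self, block_range_ms: int = BLOCK_RANGE_MS) -> None:
        self.block_range_ms = block_range_ms
        self.max_t: Optional[int] = None  # newest timestamp accepted so far, in ms

    def min_valid_time(self) -> Optional[int]:
        """Oldest timestamp still accepted: half a block range behind max_t."""
        if self.max_t is None:
            return None
        return self.max_t - self.block_range_ms // 2

    def append(self, t_ms: int) -> str:
        """Return "ok" or "out of bounds" for a sample stamped t_ms."""
        floor = self.min_valid_time()
        if floor is not None and t_ms < floor:
            return "out of bounds"
        if self.max_t is None or t_ms > self.max_t:
            self.max_t = t_ms  # a future-stamped sample drags the window forward
        return "ok"


if __name__ == "__main__":
    HOUR_MS = 60 * 60 * 1000
    now = 1_700_000_000_000
    head = HeadModel()
    print(head.append(now))                # ok
    print(head.append(now + 3 * HOUR_MS))  # ok: skewed target stamps 3h ahead
    print(head.append(now + 60_000))       # out of bounds: correct clock, 1 min later
```

The third sample has the right time, but it sits two hours behind the future-stamped sample. That puts it below the one-hour floor, so the model rejects it.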

Cause 1: Clock Skew

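  • Diagnosis: Run date -u +%s on the Prometheus server and on a target at roughly the same moment, then compare the outputs (seconds since the epoch). A difference of more than a few seconds indicates clock skew. timedatectl also reports whether the local clock is synchronized, and chronyc tracking shows chrony’s current offset.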
  • Fix: Configure NTP (Network Time Protocol) on all Prometheus servers and targets. On Debian/Ubuntu, install ntp or chrony: sudo apt update && sudo apt install ntp. On RHEL/CentOS, install chrony: sudo yum install chrony and enable it: sudo systemctl enable --now chronyd. Ensure your NTP server is reliable and accessible.
  • Why it works: NTP synchronizes clocks across machines, ensuring Prometheus and its targets agree on the current time, preventing data from being considered "out of bounds."
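You can check skew without logging in to each target by comparing the Date header returned by the target’s HTTP endpoint against the local clock. The Python sketch below assumes the target’s HTTP server sets a Date header. Go’s net/http server, which most exporters are built on, does this by default. The function names and the 2-second threshold are illustrative. The header has one-second resolution and the measurement includes network latency, so this approach only catches skew of a few seconds or more.

```python
import time
import urllib.request
from email.utils import parsedate_to_datetime


def offset_from_date_header(date_header: str, local_epoch: float) -> float:
    """Return remote clock minus local clock in seconds (positive = remote is ahead)."""
    remote_epoch = parsedate_to_datetime(date_header).timestamp()
    return remote_epoch - local_epoch


def measure_offset(url: str, timeout: float = 5.0) -> float:
    """Estimate a target's clock offset from the Date header of one HTTP response."""
    before = time.time()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        date_header = resp.headers.get("Date")
    after = time.time()
    if date_header is None:
        raise ValueError(f"{url} did not return a Date header")
    # Compare against the request midpoint to roughly cancel symmetric network latency.
    return offset_from_date_header(date_header, (before + after) / 2)


def is_skewed(offset_s: float, threshold_s: float = 2.0) -> bool:
    """Flag offsets larger than the threshold in either direction."""
    return abs(offset_s) > threshold_s
```

For example, measure_offset("http://target:9100/metrics") returns a positive number when the target’s clock is ahead. Here "target" is a placeholder hostname and 9100 is node_exporter’s default port.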

Cause 2: Stale Prometheus Binaries/Configuration

  • Diagnosis: If you recently upgraded Prometheus or changed its configuration, an old binary might still be running, or Prometheus might be loading a stale configuration file. Check the executable path and the configuration file path of the running Prometheus process.
  • Fix: Stop the old Prometheus process (sudo systemctl stop prometheus or pkill prometheus), then start the new one with the correct binary and configuration file. Verify the running process with ps aux | grep prometheus.
  • Why it works: With an outdated binary or an old configuration, Prometheus is not running with your recent changes, for example an updated scrape configuration, and may keep producing the errors those changes were meant to fix.
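To confirm what the running process is actually using, you can ask Prometheus itself. Its HTTP API exposes the command-line flags at /api/v1/status/flags and build information at /api/v1/status/buildinfo. The sketch below assumes Prometheus is reachable at a base URL you supply, by default on port 9090. The function names are hypothetical. Compare the reported version with the output of prometheus --version on the binary you intended to run.

```python
import json
import urllib.request
from typing import Any, Dict


def fetch_json(url: str, timeout: float = 5.0) -> Dict[str, Any]:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize(flags: Dict[str, Any], buildinfo: Dict[str, Any]) -> Dict[str, str]:
    """Pull the config file path and version out of the two API responses."""
    return {
        "config_file": flags.get("data", {}).get("config.file", "<unknown>"),
        "version": buildinfo.get("data", {}).get("version", "<unknown>"),
    }


def running_prometheus(base_url: str = "http://localhost:9090") -> Dict[str, str]:
    """Ask a running Prometheus which config file and version it is using."""
    base = base_url.rstrip("/")
    flags = fetch_json(f"{base}/api/v1/status/flags")
    buildinfo = fetch_json(f"{base}/api/v1/status/buildinfo")
    return summarize(flags, buildinfo)
```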

Cause 3: Network Latency and Target Unresponsiveness

  • Diagnosis: Long network latency between Prometheus and its targets, or targets that are slow to respond, can cause scrapes to time out or finish late. Slow scrapes alone rarely produce this error. However, when targets attach their own timestamps, the delay between stamping and ingestion adds to any existing skew and can push samples outside the accepted window. To find slow targets, check the scrape_duration_seconds and scrape_samples_scraped series that Prometheus records for every target. You can also use ping and traceroute to diagnose network issues.
  • Fix: Optimize network connectivity between Prometheus and its targets, and investigate the performance of targets that are consistently slow. For jobs that legitimately need more time, add scrape_timeout: 30s to the job in prometheus.yml. The timeout must not exceed that job’s scrape_interval (the global defaults are a 1m interval and a 10s timeout). If storage is filling up because writes are slow, --storage.tsdb.retention.time is also worth reviewing, although it does not directly affect out-of-bounds errors.
  • Why it works: Faster, more reliable scrapes mean Prometheus ingests samples closer to when they were generated. This narrows the window in which clock drift or delay can push a timestamp outside the accepted range.
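Before raising a timeout, it can help to sanity-check a job’s timing. The hypothetical helper below does three things:
- It parses Prometheus-style durations with a simplified parser covering the ms, s, m, h, d, w, and y units.
- It flags a scrape_timeout larger than the scrape_interval, which Prometheus rejects when loading the configuration.
- It warns when an observed scrape_duration_seconds value is within 20% of the timeout. The 20% margin is an arbitrary choice.

```python
import re
from typing import List

# Milliseconds per unit for Prometheus-style durations such as "30s" or "1m30s".
_UNIT_MS = {
    "ms": 1,
    "s": 1000,
    "m": 60 * 1000,
    "h": 60 * 60 * 1000,
    "d": 24 * 60 * 60 * 1000,
    "w": 7 * 24 * 60 * 60 * 1000,
    "y": 365 * 24 * 60 * 60 * 1000,
}
# "ms" is tried before "m" so "500ms" is not read as "500m" plus a stray "s".
_PART = re.compile(r"(\d+)(ms|s|m|h|d|w|y)")


def parse_duration_ms(text: str) -> int:
    """Convert a duration string to milliseconds (simplified parser)."""
    pos, total = 0, 0
    for match in _PART.finditer(text):
        if match.start() != pos:  # reject gaps such as spaces or unknown units
            raise ValueError(f"invalid duration: {text!r}")
        total += int(match.group(1)) * _UNIT_MS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"invalid duration: {text!r}")
    return total


def check_job(interval: str, timeout: str, observed_s: float) -> List[str]:
    """Return warnings for one scrape job, given its observed scrape duration in seconds."""
    interval_ms = parse_duration_ms(interval)
    timeout_ms = parse_duration_ms(timeout)
    warnings = []
    if timeout_ms > interval_ms:
        warnings.append("scrape_timeout exceeds scrape_interval; Prometheus rejects this")
    if observed_s * 1000 >= 0.8 * timeout_ms:
        warnings.append("observed scrape duration is within 20% of scrape_timeout")
    return warnings
```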

Cause 4: System Time Jumps Backwards

  • Diagnosis: While less common than general clock skew, a system’s clock can sometimes jump backward due to NTP misconfiguration, leap second handling issues, or manual intervention. Check system logs (/var/log/syslog or journalctl) for any messages indicating significant time changes.
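  • Fix: Ensure NTP is correctly configured and that your NTP client handles leap seconds gracefully; with ntpd, point the leapfile directive at a current leap-seconds file. For chrony, add leapsecmode slew to /etc/chrony.conf (/etc/chrony/chrony.conf on Debian/Ubuntu) so leap seconds are slewed rather than stepped. Also limit makestep (for example, makestep 1.0 3) so chrony only steps the clock during its first few updates after startup.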
  • Why it works: Preventing the system clock from jumping backward ensures that new data points are always perceived as arriving after older ones.
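A backward step can be caught from userspace by comparing the wall clock with a monotonic clock, which is never stepped. This Python sketch samples both clocks and reports any interval in which the wall clock advanced noticeably less than the monotonic clock. The 0.5-second tolerance and the function names are assumptions. Gradual slewing by an NTP client is not reported; the sketch only catches steps.

```python
import time
from typing import List, Tuple

Reading = Tuple[float, float]  # (monotonic seconds, wall-clock seconds)


def detect_backward_jumps(readings: List[Reading], tolerance_s: float = 0.5) -> List[float]:
    """Return the size in seconds of each backward wall-clock step between readings."""
    jumps = []
    for (mono_a, wall_a), (mono_b, wall_b) in zip(readings, readings[1:]):
        # The monotonic clock only moves forward, so a wall clock that advanced
        # much less than it between two readings was stepped backwards.
        drift = (wall_b - wall_a) - (mono_b - mono_a)
        if drift < -tolerance_s:
            jumps.append(-drift)
    return jumps


def sample(count: int = 5, interval_s: float = 1.0) -> List[Reading]:
    """Take count paired readings of the monotonic and wall clocks."""
    readings = []
    for i in range(count):
        if i:
            time.sleep(interval_s)
        readings.append((time.monotonic(), time.time()))
    return readings
```

Running detect_backward_jumps(sample(60)) for a minute during a suspected incident returns an empty list when no step occurred.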

Cause 5: Misconfigured timestamp_offset on Targets

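  • Diagnosis: Some exporters or applications might have a timestamp_offset configuration option that incorrectly offsets the timestamps they send. This is rare but can happen with custom collectors or specific application metrics. Inspect the configuration of your monitored targets, and fetch a target’s /metrics output to see whether its samples carry explicit timestamps. In the Prometheus text format, a timestamp is an optional third field after the value, in milliseconds since the epoch.
  • Fix: If such an offset is found, remove or correct it in the target’s configuration. If you cannot change the target, setting honor_timestamps: false on its scrape job makes Prometheus ignore the exposed timestamps and use its own scrape time instead.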
  • Why it works: Ensuring targets send accurate, unadulterated timestamps directly addresses the root cause of data being reported with incorrect time values.
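The sketch below finds targets that expose their own timestamps and checks how far those timestamps are from the local clock. It parses the Prometheus text exposition format (version 0.0.4, where timestamps are integer milliseconds). It is a simplified parser that assumes no exemplars or OpenMetrics-specific syntax. The function names and the one-minute threshold are illustrative.

```python
import time
import urllib.request
from typing import List, Tuple


def explicit_timestamps(exposition: str) -> List[Tuple[str, int]]:
    """Return (series, timestamp_ms) for every sample line that carries a timestamp."""
    found = []
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        brace = line.rfind("}")
        if brace != -1:  # labelled series: split after the closing brace
            series, rest = line[: brace + 1], line[brace + 1 :].split()
        else:
            parts = line.split()
            series, rest = parts[0], parts[1:]
        if len(rest) >= 2:  # rest is [value] or [value, timestamp_ms]
            found.append((series, int(rest[1])))
    return found


def skewed_samples(exposition: str, now_ms: int, max_skew_ms: int = 60_000) -> List[Tuple[str, int]]:
    """Return (series, skew_ms) for explicit timestamps further than max_skew_ms from now_ms."""
    return [
        (series, ts - now_ms)
        for series, ts in explicit_timestamps(exposition)
        if abs(ts - now_ms) > max_skew_ms
    ]


def check_target(url: str, max_skew_ms: int = 60_000) -> List[Tuple[str, int]]:
    """Fetch a /metrics endpoint and report samples with far-off explicit timestamps."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        body = resp.read().decode("utf-8")
    return skewed_samples(body, int(time.time() * 1000), max_skew_ms)
```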

Cause 6: Prometheus Server Resource Starvation

  • Diagnosis: If the Prometheus server is under heavy CPU load or I/O wait, it can fall behind on scraping and ingesting samples, and the resulting delays can look like clock skew. Monitor the Prometheus server’s CPU, memory, and I/O usage.
  • Fix: Scale up the Prometheus server’s resources (CPU, RAM) or reduce its workload. For example, scrape fewer targets, increase scrape intervals so targets are scraped less often, or shard Prometheus.
  • Why it works: A healthy, responsive Prometheus server ingests samples promptly, without delays that mimic clock skew.
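For a quick check on the Prometheus host itself, the sketch below compares the 1-minute load average with the number of CPUs. On Linux, the load average also counts tasks blocked in uninterruptible sleep, usually disk I/O, so it reflects some I/O pressure as well as CPU load. The threshold of 1.0 per CPU is a rough rule of thumb, not a Prometheus recommendation, and the function names are hypothetical. Prometheus also reports its own usage through metrics such as process_cpu_seconds_total.

```python
import os
from typing import Optional


def load_per_cpu(load1: float, cpus: int) -> float:
    """1-minute load average divided by the number of CPUs."""
    return load1 / cpus


def host_overloaded(threshold: float = 1.0) -> Optional[bool]:
    """True if load per CPU exceeds threshold; None where load averages are unavailable."""
    cpus = os.cpu_count()
    if not cpus or not hasattr(os, "getloadavg"):  # e.g. Windows has no getloadavg
        return None
    try:
        load1, _, _ = os.getloadavg()
    except OSError:
        return None
    return load_per_cpu(load1, cpus) > threshold
```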

After resolving the "out of bounds" issue, another error you may run into, especially if there were significant clock skews or network delays, is "too many open files". Prometheus might have accumulated a large number of open file descriptors during repeated failed scrape attempts.
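To see how close Prometheus is to its file-descriptor limit, watch the process_open_fds and process_max_fds metrics it exposes about itself, or read /proc directly. The sketch below reads /proc. It assumes a Linux host and enough privileges to read the Prometheus process’s /proc entries (usually root or the same user). The function names are hypothetical.

```python
import os
from typing import Optional, Tuple


def parse_max_open_files(limits_text: str) -> Optional[int]:
    """Extract the soft "Max open files" limit from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft = line.split()[3]  # columns: Max open files <soft> <hard> files
            return None if soft == "unlimited" else int(soft)
    return None


def fd_usage(pid: int) -> Tuple[int, Optional[int]]:
    """Return (open descriptors, soft limit) for a process on Linux."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as fh:
        limit = parse_max_open_files(fh.read())
    return open_fds, limit
```

If the open count stays close to the limit, raise the limit, for example with LimitNOFILE in the systemd unit, rather than restarting repeatedly.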
