The rate() function in Prometheus handles counter resets implicitly: you never have to tell it that a reset happened. It treats any decrease in a counter’s value as a reset and compensates for it in the calculation.

Let’s see this in action. Imagine a simple counter metric http_requests_total that increments with every incoming HTTP request.

http_requests_total{job="my-app", handler="/health"}

If this metric’s value is 100 at time t1 and 105 at time t2 (a few seconds later), the rate of increase is (105 - 100) / (t2 - t1). Pretty straightforward.

But what if the my-app service restarts between t1 and t2? The counter might reset to 0 and then increment to 5 by time t2. Naively calculating (5 - 100) / (t2 - t1) would give a nonsensical negative rate. This is where rate() shines.

When rate(http_requests_total[5m]) is evaluated, Prometheus looks at the samples of each time series within the 5m window. Wherever the value decreases from one sample to the next, it assumes the counter was reset to zero. The function then adds up the increase observed before the reset and the value reached after it, since the counter is assumed to have restarted from 0. It effectively stitches the counter’s values together across the reset. The one thing it cannot recover is whatever the counter gained between the last scrape before the restart and the restart itself.

The core of rate()’s magic lies in its lookback window and how it interprets the samples inside it. It only works with the samples that fall within the specified range (e.g., [5m]). If it sees a sequence like ... 100 (t1), 105 (t2), 0 (t3 - reset), 5 (t4) ..., and you query rate(http_requests_total[5m]) where t4 is within the 5m window ending now, it will:

  1. See the jump from 105 to 0 at t3 as a reset.
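  2. Take the increase from t2 to t3 to be the post-reset value itself (0), because the counter is assumed to have restarted from zero, and then add the increase from t3 to t4 (5). It never computes 5 - 105, and it does not treat the reset as a wrap-around.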
  3. If t1 is also within the 5m window, it will include the increase from t1 to t2 as well.
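To make the steps concrete, here is a minimal Python sketch of the stitching idea applied to the sequence above. It is not Prometheus code; the function name stitch_counter is made up for illustration, and it assumes every decrease is a reset to zero.

```python
def stitch_counter(values):
    """Return the series as it would look if the counter had never reset.

    Assumption: any decrease between consecutive samples is a reset to zero,
    so the last value seen before the drop is carried forward as an offset.
    """
    stitched = []
    offset = 0.0   # total reached by the counter before each detected reset
    prev = None
    for v in values:
        if prev is not None and v < prev:
            offset += prev          # a drop: remember where the counter was
        stitched.append(v + offset)
        prev = v
    return stitched


values = [100, 105, 0, 5]           # samples at t1, t2, t3 (reset), t4
stitched = stitch_counter(values)
print(stitched)                     # [100.0, 105.0, 105.0, 110.0]
print(stitched[-1] - stitched[0])   # 10.0
```

With all four samples in the window, the total increase comes out as 10: 5 before the reset, 0 across it, and 5 after it.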

The rate() function also automatically scales the result to "per second." So, if the counter increased by 10 in 20 seconds, rate() will report 0.5 (10 requests / 20 seconds). The increase() function is similar but returns the total increase over the period, not per second, and also handles resets.
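The relationship between the two functions can be sketched in a few lines of Python. This is a simplification under stated assumptions: simple_increase and simple_rate are hypothetical names, and the sketch divides by the time between the first and last sample, whereas the real functions also extrapolate toward the window boundaries.

```python
def simple_increase(samples):
    """Total reset-adjusted increase over a list of (timestamp_seconds, value)."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop is taken as a reset to zero, so the new value is the increase.
        total += cur if cur < prev else cur - prev
    return total


def simple_rate(samples):
    """Per-second increase, or None if there are fewer than two samples."""
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return simple_increase(samples) / elapsed


steady = [(0, 40), (10, 45), (20, 50)]
print(simple_increase(steady))   # 10.0
print(simple_rate(steady))       # 0.5

with_reset = [(0, 100), (10, 105), (20, 0), (40, 5)]
print(simple_increase(with_reset))  # 10.0
print(simple_rate(with_reset))      # 0.25
```

The only difference between the two results is the division by elapsed seconds, which is why increase() and rate() react to resets in the same way.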

The rate() function treats only a decrease in value as a counter reset, and there is no option to change that. Any increase is taken at face value, so a counter that resets and climbs back past its previous value before the next scrape goes unnoticed. The calculation is performed on the raw samples within the chosen time window. When a counter resets, Prometheus doesn’t store a negative number; it stores the new, lower value. rate() detects this drop and adds the pre-reset value back in when computing the increase.
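The detection rule itself is small enough to sketch in Python. The helper name count_resets is hypothetical; the rule it applies (any drop between consecutive samples counts as one reset) mirrors what PromQL’s resets() function reports, and the second example shows the blind spot when no drop is visible.

```python
def count_resets(values):
    """Count drops between consecutive samples; each drop is read as a reset."""
    return sum(1 for prev, cur in zip(values, values[1:]) if cur < prev)


# The drop from 105 to 0 is visible, so one reset is detected.
print(count_resets([100, 105, 0, 5]))   # 1

# Suppose the counter was at 3, restarted, and reached 7 before the next scrape.
# The scraped values only ever go up, so nothing is detected.
print(count_resets([3, 7]))             # 0
```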

If you’re seeing unexpected dips in your rate() graphs around a restart, a likely cause is the restart itself: increments that happened after the last scrape and before the reset were never recorded, and the service counted nothing while it was down. Prometheus is working as intended here; it’s the interpretation of the data that matters.

The next conceptual hurdle is understanding how avg_over_time() differs from rate() when dealing with these same counter resets.
