rate() and irate() are both Prometheus functions for calculating the per-second average rate of increase of a counter metric, but they handle data points differently, leading to distinct use cases.

Let’s see them in action. Imagine we have a counter metric http_requests_total that increments with every HTTP request received by a service.

First, we need some sample data. In a real Prometheus setup, this data would be coming from your exporters. For demonstration, let’s simulate it.

# Simulate a counter increasing over time
# In a real scenario, this data would be scraped from targets.
# For example: http_requests_total{method="GET", handler="/api/v1/users"}

# Let's assume we have the following raw data points for a specific time series:
# Time: 10:00:00, Value: 100
# Time: 10:00:01, Value: 101
# Time: 10:00:02, Value: 103
# Time: 10:00:03, Value: 104
# Time: 10:00:04, Value: 106
# Time: 10:00:05, Value: 107
# Time: 10:00:06, Value: 108
# Time: 10:00:07, Value: 110
# Time: 10:00:08, Value: 112
# Time: 10:00:09, Value: 113
# Time: 10:00:10, Value: 115

Now, let’s query these with rate() and irate(). Both functions take a range vector as input. The range vector specifies the time window over which to calculate the rate. The default is 5 minutes if not specified. Let’s use a 1-minute window for this example.

Using rate():

rate() calculates the average rate of increase over the specified time range. It linearly extrapolates to the first and last data points in the time range to account for potential missing data points or scrape failures.

rate(http_requests_total[1m])

Let’s manually calculate this for our sample data over the last minute (from 10:00:01 to 10:00:10, assuming the query is run at 10:00:10).

  • First data point in range: Time 10:00:01, Value 101
  • Last data point in range: Time 10:00:10, Value 115
  • Duration: 10 seconds (from 10:00:01 to 10:00:10)
  • Increase: 115 - 101 = 14
  • Average rate: 14 / 10 seconds = 1.4 requests/second.

rate() would smooth out any intermediate spikes or dips. If a data point was missing at 10:00:05, rate() would still estimate the rate based on the available points, potentially interpolating.

Using irate():

irate() calculates the instantaneous rate of increase based on the last two data points within the specified time range. It does not extrapolate. This makes it more sensitive to sudden changes and less forgiving of missing data.

irate(http_requests_total[1m])

Let’s calculate this for our sample data at 10:00:10, looking at the last two points within the 1-minute window.

  • Second-to-last data point in range: Time 10:00:09, Value 113
  • Last data point in range: Time 10:00:10, Value 115
  • Duration: 1 second (from 10:00:09 to 10:00:10)
  • Increase: 115 - 113 = 2
  • Instantaneous rate: 2 / 1 second = 2 requests/second.

You can see how irate() gives a higher value here because the last second saw a larger increase (2 requests) compared to the overall average over the minute (1.4 requests).

When to Use Each:

  • Use rate() for:

    • Graphing and alerting on overall trends: When you want to see the average load or throughput over a period of time, rate() is your go-to. It provides a smoother, more stable view, making it ideal for dashboards and alerts where you’re interested in sustained levels rather than brief fluctuations.
    • Calculating rates over longer time windows: For periods of 5 minutes or more, rate() is generally more appropriate as it smooths out the noise.
    • When you expect occasional scrape failures: rate()'s extrapolation helps to provide a more continuous rate even if a scrape is missed.

    Example: sum(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) to see the average CPU idle time per instance over the last 5 minutes.

  • Use irate() for:

    • Detecting sudden spikes or drops: irate() is excellent for identifying rapid changes in a metric. If you want to know right now how fast something is happening, and you care about the most recent behavior, irate() is the choice.
    • Alerting on short-lived anomalies: For alerts that should fire immediately on a sharp change, irate() is often preferred.
    • When you need to be very sensitive to the latest data: If you have very frequent scrapes and want to react to the very latest change, irate() is more responsive.

    Example: sum(irate(http_requests_total{code="500"}[1m])) by (job) to alert on a sudden surge of 500 errors in the last minute.

The Crucial Difference: Data Points and Extrapolation

The core of the difference lies in how they handle the data points within the specified range vector. rate() considers all data points within the range and performs linear interpolation between the first and last points to estimate the rate. This means if your scrape interval is 15 seconds and you query rate(metric[1m]), Prometheus will look at approximately 4 data points (since 1 minute / 15 seconds = 4). It will then calculate the rate based on the value at the start of the minute and the value at the end of the minute, assuming a steady increase between them. This smoothing is what makes it good for trends.

irate(), on the other hand, only looks at the last two data points within the range. If you query irate(metric[1m]), and your scrape interval is 15 seconds, it will only consider the two most recent data points that fall within that last minute. If the last two data points are only 15 seconds apart, the calculation will be based on that 15-second interval, not the full minute. This makes it highly reactive to the latest changes but susceptible to noise if data points are unevenly spaced or if there are brief gaps. If a scrape is missed, irate() might produce a very high or a zero rate depending on which two points it ends up using.

A common mistake is using irate() with a very large range, expecting it to be smoothed like rate(). For instance, irate(metric[1h]) will still only use the last two points within that hour, and if those points are, say, 5 minutes apart, the rate will be calculated over that 5-minute interval, potentially giving a misleadingly low instantaneous rate for the current moment. Conversely, using rate() with a very small range like rate(metric[15s]) will give you a much smoother, averaged rate over that short period, potentially masking a very recent spike that irate() would catch.

The next concept you’ll likely encounter is using these functions within aggregations like sum() and avg(), and understanding how the range vector duration interacts with the scrape interval and evaluation interval.

Want structured learning?

Take the full Prometheus course →