Prometheus Metric Types: Counter, Gauge, Histogram (2026)

A Prometheus Counter is a cumulative metric: its value only ever increases, or drops back to zero when the process restarts. What makes it useful in practice is that PromQL can still derive meaningful rates from it across those resets, so you can detect and diagnose problems even when the counter starts over.

Let’s see a Counter in action. Imagine you’re tracking the number of HTTP requests served by a web server. Here’s what a snippet of Prometheus exposition format might look like:

# HELP http_requests_total Total number of HTTP requests received.
# TYPE http_requests_total counter
http_requests_total{method="POST",code="200"} 1523
http_requests_total{method="POST",code="404"} 12
http_requests_total{method="GET",code="200"} 5678

http_requests_total is a Counter. Each time a request comes in, the counter for the matching label combination increases by one. If the web server restarts, the counter starts again from zero, which is why the raw value is rarely graphed directly. This is where Prometheus’s rate() and irate() functions become indispensable. rate(http_requests_total[5m]) calculates the per-second average rate of increase over the last 5 minutes. irate(http_requests_total[5m]) calculates an instantaneous per-second rate from only the last two data points within the 5-minute window, making it more sensitive to sudden spikes but also more volatile.
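The difference between the two functions can be sketched in a few lines of Python. This is a simplified illustration, not Prometheus’s implementation: the helper names simple_rate and simple_irate and the sample data are made up for this example, and the sketch ignores counter resets and the extrapolation to the window boundaries that the real rate() performs.

```python
# Simplified sketch: no reset handling, no window-boundary extrapolation.

def simple_rate(samples):
    """Average per-second increase between the first and last sample."""
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    return (v_last - v_first) / (t_last - t_first)


def simple_irate(samples):
    """Per-second increase between the last two samples only."""
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    return (v_last - v_prev) / (t_last - t_prev)


# Hypothetical (timestamp_seconds, counter_value) pairs scraped every 60 s,
# with a burst of traffic in the final interval.
samples = [(0, 1000), (60, 1060), (120, 1120), (180, 1180), (240, 1240), (300, 1900)]

print(simple_rate(samples))   # 3.0
print(simple_irate(samples))  # 11.0
```

The burst dominates the irate-style result, while the rate-style result spreads it over the whole window. That is why rate() tends to suit alerts and slow-moving graphs, and irate() suits zoomed-in views of fast-moving counters.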

The mental model for Counter is simple: it’s an odometer for your application’s events. Each increment represents a discrete occurrence. What makes it powerful is how Prometheus handles resets. When rate() or irate() see a sample that is lower than the previous one, they treat the drop as a counter reset, not as a negative change: the counter is assumed to have restarted from zero, so the value before the drop is carried forward and added to the samples that follow. The computed increase therefore keeps growing across an application restart. The result is still an approximation, because increments that happened after the last successful scrape but before the restart are never observed, and a reset can go unnoticed if the counter climbs past its old value before the next scrape.
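A minimal sketch of that reset handling, assuming a plain list of scraped counter values: any drop is treated as a restart from zero, so the whole post-reset value counts as new increase. The function name increase_with_resets and the data are illustrative, not part of Prometheus.

```python
def increase_with_resets(values):
    """Total increase across counter samples, treating any drop as a reset to zero."""
    total = 0.0
    prev = values[0]
    for current in values[1:]:
        if current < prev:
            # The counter went down: assume the process restarted from zero,
            # so everything counted since the restart is new increase.
            total += current
        else:
            total += current - prev
        prev = current
    return total


# Hypothetical scrapes; the process restarted between the 3rd and 4th sample.
values = [1500, 1510, 1523, 4, 19]

print(increase_with_resets(values))  # 42.0
```

A naive last-minus-first calculation would report a decrease of 1481 here. The reset-aware version reports 42 events, which misses only whatever happened between the scrape at 1523 and the restart.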

The Gauge metric type, on the other hand, represents a value that can go up and down. Think of memory usage, CPU load, or the number of active connections.

# HELP process_resident_memory_bytes Resident memory in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes{job="my_app"} 1.3e8

Here, process_resident_memory_bytes can increase as your application consumes more memory and decrease as it frees memory. Client libraries let you set a gauge directly; in the Go client, for example, Set() assigns a specific value, Inc() and Dec() move it by one, and Add(value) or Sub(value) shift it by an arbitrary amount. Unlike counters, gauges have no reset handling, because a decrease is a legitimate value and not a sign of a restart. For the same reason, rate() is not meant for gauges; functions such as delta() or deriv() are the usual way to look at how a gauge changes over time.

Histogram metrics are used for observing the distribution of values. They work by dividing the observed values into configurable "buckets" and also providing a count of all observations and the sum of all observed values.

# HELP http_request_duration_seconds Histogram of latencies for HTTP requests.
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.1",method="GET",code="200"} 123
http_request_duration_seconds_bucket{le="0.5",method="GET",code="200"} 456
http_request_duration_seconds_bucket{le="+Inf",method="GET",code="200"} 500
http_request_duration_seconds_sum{method="GET",code="200"} 123.45
http_request_duration_seconds_count{method="GET",code="200"} 500

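Each _bucket series holds a cumulative count: the number of observations less than or equal to the upper bound given by its le (less than or equal to) label. In the example above, 123 requests took at most 0.1 seconds and 456 took at most 0.5 seconds, so 333 fell between the two bounds. The +Inf bucket therefore counts all observations. _sum is the total of all observed values, and _count is the total number of observations (which should equal the count in the +Inf bucket).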
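To make the cumulative nature of the buckets concrete, here is a small sketch that turns a list of raw observations into the three kinds of values a histogram exposes. The function name to_exposition and the observations are invented for illustration; a real client library updates these counts incrementally instead of recomputing them.

```python
import math


def to_exposition(observations, bounds):
    """Return (cumulative bucket counts keyed by le, sum, count)."""
    uppers = list(bounds) + [math.inf]
    # Each bucket counts every observation at or below its upper bound,
    # so the counts never decrease from one bucket to the next.
    buckets = {le: sum(1 for v in observations if v <= le) for le in uppers}
    return buckets, sum(observations), len(observations)


# Hypothetical request durations in seconds.
observations = [0.05, 0.07, 0.2, 0.3, 0.45, 0.8, 2.0]
buckets, total, count = to_exposition(observations, [0.1, 0.5])

print(buckets)  # {0.1: 2, 0.5: 5, inf: 7}
print(count)    # 7
```

The two fastest requests appear in all three buckets, and the +Inf bucket matches the overall count.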

A detail of histograms that is easy to miss is that the _bucket and _count series are themselves Counters (and so is _sum, as long as observations are never negative). This means if your application restarts, they reset to zero. However, when you query a histogram with histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])), rate() applies the usual reset handling to each bucket series and yields the per-second rate of observations at or below each le bound. histogram_quantile() then estimates the quantile from those per-bucket rates by interpolating linearly inside the bucket that contains it. This allows you to calculate approximate quantiles (like the 95th percentile latency) even across application restarts.
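The estimation step can be sketched as follows. This is a simplified take on the linear interpolation that histogram_quantile() performs, applied for brevity to the raw cumulative counts from the example above instead of per-bucket rates. The function name quantile_from_buckets is made up, and the sketch assumes 0 < q < 1 and a lowest bucket that starts at zero.

```python
import math


def quantile_from_buckets(q, buckets):
    """Estimate quantile q from (upper_bound, cumulative_count) pairs.

    The pairs must be sorted by bound and end with the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile lies beyond the last finite bound, so that
                # bound is the best available answer.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Cumulative counts from the exposition example above.
buckets = [(0.1, 123), (0.5, 456), (math.inf, 500)]

print(round(quantile_from_buckets(0.5, buckets), 3))  # 0.253
print(quantile_from_buckets(0.95, buckets))           # 0.5
```

With these buckets the 95th percentile falls into the +Inf bucket, so the estimate is capped at 0.5 seconds no matter how slow the slowest requests really were. That is a direct consequence of where the boundaries were placed.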

The actual levers you control are the bucket boundaries defined in your application’s Prometheus client library. Choosing these boundaries is critical. If buckets are too wide, quantile estimates become coarse, because histogram_quantile() can only interpolate within a bucket. If you define too many narrow buckets, you generate a lot of time series, since every boundary becomes a separate series for each label combination. A common pattern for request durations is a set of boundaries that grows roughly exponentially, for example: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 30, 60, 120, 300, 600, 1200, 3000, 10000].
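The trade-off can be estimated before deploying. The sketch below generates strictly exponential boundaries and counts the series a histogram would produce. Both helper names are hypothetical stand-ins (some client libraries ship a similar bucket generator), and the figure of 15 label combinations is an assumption chosen for the example.

```python
def exponential_buckets(start, factor, count):
    """Return `count` boundaries, each `factor` times the previous one."""
    return [start * factor ** i for i in range(count)]


def series_per_histogram(num_bounds, label_combinations):
    """Series exposed by one histogram metric.

    One _bucket series per boundary, plus the +Inf bucket, plus _sum and
    _count, all repeated for every label combination.
    """
    return (num_bounds + 3) * label_combinations


bounds = exponential_buckets(0.001, 2, 12)
print(len(bounds))  # 12

# The 17 boundaries listed above, with an assumed 3 methods x 5 status codes.
print(series_per_histogram(17, 15))  # 300
```

Every extra label value multiplies the total, so bucket count and label cardinality are worth budgeting together.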

The next concept you’ll likely explore is how to use these metric types to build effective alerts and dashboards.