Histograms and Summaries in Prometheus are both used for observing the distribution of events, but they solve fundamentally different problems and have drastically different performance and accuracy trade-offs.

Let’s see them in action. Imagine we’re tracking the duration of HTTP requests.

First, the Histogram. We’ll define a histogram metric:

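import (
	"strconv"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

var (
	// httpRequestDuration tracks request latency in seconds, partitioned by
	// method, path, and status code.
	httpRequestDuration = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "http_request_duration_seconds",
			Help:    "Duration of HTTP requests.",
			Buckets: []float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10},
		},
		[]string{"method", "path", "status"},
	)
)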

func init() {
	prometheus.MustRegister(httpRequestDuration)
}

func recordRequestDuration(method, path string, status int, duration time.Duration) {
	httpRequestDuration.WithLabelValues(method, path, strconv.Itoa(status)).Observe(duration.Seconds())
}

Now, let’s simulate some requests and record their durations:

// Simulate a GET request to /users that took 150ms
recordRequestDuration("GET", "/users", 200, 150*time.Millisecond)

// Simulate a POST request to /orders that took 750ms
recordRequestDuration("POST", "/orders", 201, 750*time.Millisecond)

// Simulate a GET request to /users that took 30ms
recordRequestDuration("GET", "/users", 200, 30*time.Millisecond)

When we query Prometheus for http_request_duration_seconds_bucket, we’ll get a count for each bucket. The counts are cumulative: an observation is counted in every bucket whose upper bound (le) is greater than or equal to it. The two GET /users requests (30ms and 150ms) therefore show up like this:

http_request_duration_seconds_bucket{method="GET", path="/users", status="200", le="0.005"} 0
http_request_duration_seconds_bucket{method="GET", path="/users", status="200", le="0.01"} 0
http_request_duration_seconds_bucket{method="GET", path="/users", status="200", le="0.025"} 0
http_request_duration_seconds_bucket{method="GET", path="/users", status="200", le="0.05"} 1  // The 30ms request lands here
http_request_duration_seconds_bucket{method="GET", path="/users", status="200", le="0.1"} 1
http_request_duration_seconds_bucket{method="GET", path="/users", status="200", le="0.25"} 2  // The 150ms request lands here
http_request_duration_seconds_bucket{method="GET", path="/users", status="200", le="0.5"} 2
http_request_duration_seconds_bucket{method="GET", path="/users", status="200", le="1"} 2
http_request_duration_seconds_bucket{method="GET", path="/users", status="200", le="2.5"} 2
http_request_duration_seconds_bucket{method="GET", path="/users", status="200", le="5"} 2
http_request_duration_seconds_bucket{method="GET", path="/users", status="200", le="10"} 2
http_request_duration_seconds_bucket{method="GET", path="/users", status="200", le="+Inf"} 2 // Total count for this label combination

http_request_duration_seconds_bucket{method="POST", path="/orders", status="201", le="0.005"} 0
// ... many more ...
http_request_duration_seconds_bucket{method="POST", path="/orders", status="201", le="0.5"} 0
http_request_duration_seconds_bucket{method="POST", path="/orders", status="201", le="1"} 1 // The 750ms request lands here
http_request_duration_seconds_bucket{method="POST", path="/orders", status="201", le="2.5"} 1
// ...
http_request_duration_seconds_bucket{method="POST", path="/orders", status="201", le="+Inf"} 1

We also get http_request_duration_seconds_count (the number of observations for each label set) and http_request_duration_seconds_sum (the sum of all observed durations for each label set). With these, we can calculate the average duration over the last five minutes: sum(rate(http_request_duration_seconds_sum[5m])) / sum(rate(http_request_duration_seconds_count[5m])). Crucially, we can also calculate approximate quantiles (like the 95th percentile) using the histogram_quantile function: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, method, path, status)). The le label must always be kept in the by clause, because histogram_quantile needs the bucket boundaries; dropping method, path, or status from the clause aggregates across those dimensions.

Now, let’s look at the Summary:

var (
	httpResponseSize = prometheus.NewSummaryVec(
		prometheus.SummaryOpts{
			Name:       "http_response_size_bytes",
			Help:       "Size of HTTP responses in bytes.",
			Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
		},
		[]string{"method", "path", "status"},
	)
)

func init() {
	prometheus.MustRegister(httpResponseSize)
}

func recordResponseSize(method, path string, status int, size int64) {
	httpResponseSize.WithLabelValues(method, path, strconv.Itoa(status)).Observe(float64(size))
}

Simulating response sizes:

// Simulate a GET request to /data with a 5KB response
recordResponseSize("GET", "/data", 200, 5*1024)

// Simulate a GET request to /data with a 1MB response
recordResponseSize("GET", "/data", 200, 1024*1024)

// Simulate a GET request to /data with a 10KB response
recordResponseSize("GET", "/data", 200, 10*1024)

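When queried, http_response_size_bytes will give us the quantiles we defined in Objectives, each accurate to within its configured error margin (for example, the value reported for 0.5 is guaranteed to lie between the true 0.45 and 0.55 quantiles). With only the three observations above (5120, 10240, and 1048576 bytes), the output would look something like this:

http_response_size_bytes{method="GET", path="/data", status="200", quantile="0.5"} 10240
http_response_size_bytes{method="GET", path="/data", status="200", quantile="0.9"} 1048576
http_response_size_bytes{method="GET", path="/data", status="200", quantile="0.99"} 1048576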

It also provides _count and _sum like the Histogram, allowing average calculation.

The core difference lies in where and how quantiles are calculated. A Histogram only counts observations into fixed buckets on the client. The bucket counters can be summed across time and instances on the Prometheus server, and histogram_quantile then estimates the quantile from those counts by finding the bucket that contains the requested rank and interpolating linearly inside it. This is highly scalable and works well across distributed systems, but the accuracy of the estimate depends on how well your bucket boundaries match the real distribution. If all your observations fall into a single bucket, the estimate can only assume they are spread evenly across that bucket, whatever the actual values were.

Summaries, on the other hand, calculate quantiles locally on the client (where your application code is running). Each instance of your application computes the specified quantiles (e.g., 0.5, 0.9, 0.99) over a sliding time window of its own recent observations (in the Go client this is controlled by MaxAge, which defaults to 10 minutes), within the error margins given in Objectives. Prometheus simply scrapes and stores these pre-calculated values. This gives accurate quantiles for each instance without any bucket tuning, but the quantiles cannot be meaningfully aggregated: averaging the 99th percentile of ten instances does not give the 99th percentile of the whole fleet. The _count and _sum series of a Summary are ordinary counters, so they can be summed across instances exactly like those of a Histogram, and a global average remains correct. Only the quantile series are tied to a single instance.

Choose a Histogram when you:

  • Need to aggregate observations across many instances (e.g., a global average latency or 95th percentile latency for all your web servers).
  • Want to understand the distribution of events across predefined ranges.
  • Are okay with approximate quantiles and can tune your buckets.
  • Want to choose the quantile and the aggregation level at query time using histogram_quantile in PromQL.

Choose a Summary when you:

  • Need accurate quantiles for a single instance of your application.
  • Want quantiles with a known error bound at specific, predefined levels (e.g., the median or the 99th percentile) without worrying about bucket tuning.
  • Do not need to aggregate quantiles across multiple instances of the same metric.

The most common mistake is using a Summary for a metric that spans multiple instances and expecting global quantiles. Each instance reports quantiles for its own traffic only, and no PromQL operation can combine them correctly: the avg() of ten per-instance 99th percentiles is not the 99th percentile of the combined traffic. The _count and _sum series are not the problem. They are counters, so if you have 10 instances each reporting 100 requests, summing _count correctly gives 1000. Histograms avoid the issue because histogram_quantile is a PromQL function that runs on the server over bucket counts that have already been summed across instances, while Summary quantiles are calculated on the client and then scraped as finished values.

The next thing you’ll likely encounter is how to choose appropriate buckets for your Histograms.
