Exemplars are a low-overhead way to attach high-fidelity trace context to your metrics, letting you jump from an aggregated metric to a specific transaction that contributed to it.
Let’s see it in action. Imagine you have a service that handles user requests, and you’re tracking the latency of these requests with a metric like http.server.request.duration. You notice this metric is showing a spike in latency, but you don’t know why. With OpenTelemetry exemplars, you can configure your SDK to automatically attach a trace ID and span ID to a subset of metric data points.
Here’s a simplified look at what a metric point with an exemplar might look like:
    {
      "name": "http.server.request.duration",
      "value": 0.15, // 150ms
      "attributes": {
        "http.request.method": "GET",
        "http.route": "/users/{id}"
      },
      "exemplar": {
        "trace_id": "a1b2c3d4e5f67890a1b2c3d4e5f67890",
        "span_id": "1234567890abcdef",
        "timestamp": 1678886400000000000, // Nanoseconds since the Unix epoch
        "value": 0.15 // The same value as the metric point
      }
    }
When you see that spike in http.server.request.duration in your metrics dashboard, you can click on the data point, and if an exemplar was captured, you’ll be presented with the trace_id and span_id. You then paste these into your tracing backend (like Jaeger, Tempo, or Honeycomb), and voilà – you’re taken directly to the specific request that was experiencing that high latency. You can see the entire trace, including the exact code path, downstream calls, and any errors that occurred during that particular transaction, all without having to sift through thousands of traces.
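To make that hand-off concrete, here is a minimal Python sketch that checks an exemplar's IDs and turns them into a link you could open in a tracing UI. The `trace_link` helper, the `tracing.example.com` host, and the `/trace/<id>?span=<id>` URL layout are all assumptions for illustration; each backend defines its own URL scheme.

```python
import re

# Trace IDs are 16 bytes (32 hex characters); span IDs are 8 bytes (16 hex characters).
TRACE_ID_RE = re.compile(r"[0-9a-f]{32}")
SPAN_ID_RE = re.compile(r"[0-9a-f]{16}")


def trace_link(exemplar: dict, base_url: str) -> str:
    """Build a link to a single trace from an exemplar.

    The URL layout is hypothetical; adapt it to your tracing backend.
    """
    trace_id = exemplar["trace_id"].lower()
    span_id = exemplar["span_id"].lower()
    # All-zero IDs are invalid, so reject them along with malformed values.
    if not TRACE_ID_RE.fullmatch(trace_id) or set(trace_id) == {"0"}:
        raise ValueError(f"invalid trace_id: {trace_id!r}")
    if not SPAN_ID_RE.fullmatch(span_id) or set(span_id) == {"0"}:
        raise ValueError(f"invalid span_id: {span_id!r}")
    return f"{base_url.rstrip('/')}/trace/{trace_id}?span={span_id}"


exemplar = {
    "trace_id": "a1b2c3d4e5f67890a1b2c3d4e5f67890",
    "span_id": "1234567890abcdef",
}
link = trace_link(exemplar, "https://tracing.example.com/")
print(link)
```

Validating the IDs before building the link avoids sending people to an empty search page when a data point carries no usable exemplar.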
The core problem exemplars solve is the "noisy metric" problem. Metrics give you the what (e.g., latency is high), but not the why (e.g., which specific request caused it, and what happened during that request). Traces give you the why, but can be prohibitively expensive to collect for every single request, especially at high volumes. Exemplars bridge this gap by providing a cost-effective way to get trace context for a sample of metric observations.
Internally, the OpenTelemetry SDK decides when to capture an exemplar in two stages. First, an exemplar filter determines which measurements are eligible: the specification defines always_on, always_off, and trace_based, and the default, trace_based, only considers measurements recorded while a sampled span is active. So if your tracer samples 1% of requests, roughly 1% of measurements are candidates. Second, an exemplar reservoir keeps a small, bounded number of those candidates per collection cycle, typically by random (reservoir) sampling or by keeping one per histogram bucket. When a measurement is kept, the SDK reads the currently active trace and span context and attaches it to the metric data point. This context is then exported along with the metric to your observability backend, which, if it understands the OpenTelemetry exemplar format, can use it to link back to the trace.
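The capture decision can be modelled in a few lines of plain Python. This is a simplified sketch, not SDK code: it assumes a trace-based eligibility check followed by a fixed-size reservoir that samples uniformly, and the names `SpanContext`, `Exemplar`, `FixedSizeReservoir`, and `record` are illustrative rather than taken from any OpenTelemetry package.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpanContext:
    trace_id: str
    span_id: str
    sampled: bool  # whether the trace sampler chose to record this trace


@dataclass
class Exemplar:
    value: float
    trace_id: str
    span_id: str


def trace_based_filter(ctx: Optional[SpanContext]) -> bool:
    """A measurement is eligible only if it was made inside a sampled span."""
    return ctx is not None and ctx.sampled


class FixedSizeReservoir:
    """Keeps at most `size` exemplars per collection cycle (reservoir sampling)."""

    def __init__(self, size: int, rng: Optional[random.Random] = None) -> None:
        self.size = size
        self.rng = rng or random.Random()
        self.slots: List[Exemplar] = []
        self.seen = 0  # eligible measurements offered since the last collect()

    def offer(self, value: float, ctx: SpanContext) -> None:
        self.seen += 1
        exemplar = Exemplar(value, ctx.trace_id, ctx.span_id)
        if len(self.slots) < self.size:
            self.slots.append(exemplar)
            return
        # Keep the newcomer with probability size / seen, evicting a random slot.
        index = self.rng.randrange(self.seen)
        if index < self.size:
            self.slots[index] = exemplar

    def collect(self) -> List[Exemplar]:
        """Return the exemplars for this cycle and reset the reservoir."""
        collected, self.slots, self.seen = self.slots, [], 0
        return collected


def record(reservoir: FixedSizeReservoir, value: float, ctx: Optional[SpanContext]) -> None:
    # The metric itself would always be aggregated here; only the exemplar is optional.
    if trace_based_filter(ctx):
        reservoir.offer(value, ctx)


reservoir = FixedSizeReservoir(size=2, rng=random.Random(0))
for i in range(1000):
    # Pretend the tracer sampled every tenth request.
    ctx = SpanContext(f"{i + 1:032x}", f"{i + 1:016x}", sampled=(i % 10 == 0))
    record(reservoir, 0.001 * i, ctx)
record(reservoir, 9.9, None)  # no active span, so never eligible

offered = reservoir.seen
exemplars = reservoir.collect()
print(offered, len(exemplars))
```

Bounding the reservoir is what keeps the overhead low: however many requests are eligible, each collection cycle exports only a handful of exemplars.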
The primary levers you control are:
- Sampling Rate: How often do you want to capture an exemplar? With the default trace-based filter, this follows your trace sampling rate, so it is a trade-off between cost/overhead and the fidelity of your metric-to-trace linkage. A common starting point is 1% or 0.1%.
- Which Metrics: You can often configure which metrics are eligible to have exemplars captured. For instance, you might only want exemplars on latency or error count metrics, not on high-volume throughput counters.
- Backend Support: Your metrics and tracing backends must be configured to ingest and display exemplar data. Most modern observability platforms have this capability.
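The first two levers can be sketched as a small eligibility check. The `OTEL_METRICS_EXEMPLAR_FILTER` environment variable and its values (`always_on`, `always_off`, `trace_based`, with `trace_based` as the default) come from the OpenTelemetry specification, but check that your SDK version honors it. Everything else here is an assumption for illustration: the `filter_from_env` and `is_eligible` helpers, the fallback on unknown values, and the per-instrument allowlist (real SDKs usually express per-metric choices through their own view or reader configuration).

```python
import os
from typing import Callable, Collection, Mapping, Optional

# A filter receives "is the active span sampled?" (None when there is no span).
ExemplarFilter = Callable[[Optional[bool]], bool]

_FILTERS = {
    "always_on": lambda sampled: True,
    "always_off": lambda sampled: False,
    "trace_based": lambda sampled: bool(sampled),
}


def filter_from_env(environ: Mapping[str, str] = os.environ) -> ExemplarFilter:
    """Pick an exemplar filter from OTEL_METRICS_EXEMPLAR_FILTER."""
    name = environ.get("OTEL_METRICS_EXEMPLAR_FILTER", "trace_based").strip().lower()
    # Falling back to the default on unknown values is a choice made for this sketch.
    return _FILTERS.get(name, _FILTERS["trace_based"])


def is_eligible(
    instrument: str,
    span_sampled: Optional[bool],
    exemplar_filter: ExemplarFilter,
    allowed: Collection[str],
) -> bool:
    """Hypothetical per-metric gate: only allowlisted instruments get exemplars."""
    return instrument in allowed and exemplar_filter(span_sampled)


ALLOWED = {"http.server.request.duration"}  # illustrative allowlist
chosen = filter_from_env({"OTEL_METRICS_EXEMPLAR_FILTER": "trace_based"})

print(is_eligible("http.server.request.duration", True, chosen, ALLOWED))
print(is_eligible("http.server.active_requests", True, chosen, ALLOWED))
```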
The most surprising part for many is how exemplar sampling interacts with metric aggregation. If you have a histogram metric and an exemplar is attached to a specific bucket (e.g., a request that took 150ms falls into the 100ms-200ms bucket), that exemplar is associated with that specific observation, not with the aggregated count for the bucket. This means when you query your backend for traces related to that bucket, you’ll get a trace whose measurement actually fell into that bucket, not just any random trace.
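A bucket-aligned reservoir makes this relationship easy to see. The sketch below is a simplified model, not SDK code: it assumes explicit bucket boundaries with inclusive upper bounds and keeps the most recent measurement per bucket (a uniformly sampled choice per bucket is another common option). The `BucketAlignedReservoir` class, the boundaries, and the short `trace-a`-style IDs are placeholders.

```python
from bisect import bisect_left
from typing import Dict, List, Sequence, Tuple

# Bucket upper bounds in seconds; values above the last bound go to an overflow bucket.
BOUNDS: List[float] = [0.1, 0.2, 0.5, 1.0]


class BucketAlignedReservoir:
    """Aggregates counts per bucket and keeps one exemplar per bucket."""

    def __init__(self, bounds: Sequence[float]) -> None:
        self.bounds = list(bounds)
        self.counts = [0] * (len(self.bounds) + 1)
        self.exemplars: Dict[int, Tuple[float, str]] = {}

    def record(self, value: float, trace_id: str) -> None:
        # bisect_left places a value equal to a bound into that bound's bucket,
        # i.e. bucket i covers (bounds[i - 1], bounds[i]].
        index = bisect_left(self.bounds, value)
        self.counts[index] += 1
        # The exemplar is an individual observation from this bucket,
        # here simply the most recent one.
        self.exemplars[index] = (value, trace_id)


histogram = BucketAlignedReservoir(BOUNDS)
histogram.record(0.15, "trace-a")  # falls into (0.1, 0.2]
histogram.record(0.03, "trace-b")  # falls into (-inf, 0.1]
histogram.record(0.18, "trace-c")  # also (0.1, 0.2], replaces trace-a

print(histogram.counts)
print(histogram.exemplars)
```

The count for the 100ms-200ms bucket reflects both requests, while its exemplar points at exactly one of them, so following the exemplar always lands on a trace whose latency matches the bucket you clicked.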
The next concept you’ll likely explore is how to configure trace context propagation across service boundaries to ensure exemplars can link requests that span multiple microservices.