Head sampling is often misunderstood as a simple way to reduce telemetry volume, but its true power lies in its ability to influence the shape of your observability data.
Let’s see it in action. Imagine we’re tracing requests through a simple microservice architecture: a frontend that calls a users service and a products service. We’re using OpenTelemetry and sending traces to Jaeger.
Here’s a snippet of our OpenTelemetry Collector configuration, focusing on the sampling aspect:
receivers:
  otlp:
    protocols:
      grpc:
      http:
processors:
  # Head sampling configuration.
  # probabilistic_sampler (from the Collector's contrib distribution) implements
  # the trace-ID-ratio strategy: the decision depends only on the trace ID.
  probabilistic_sampler:
    # Keep 10% of all traces. The decision is made up front, before the
    # trace's outcome is known, so it is fast but can miss rare errors.
    sampling_percentage: 10.0
exporters:
  # For demonstration, we just log spans to the console.
  # In a real scenario, this would be an OTLP exporter pointing at Jaeger or another backend.
  debug:
    verbosity: detailed
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [probabilistic_sampler] # Apply head sampling *before* exporting
      exporters: [debug]
When a request comes in, the sampling decision is made as soon as the trace starts, at its first (root) span. With the trace-ID-ratio strategy, the sampler derives a number from the trace ID (which is itself randomly generated) and compares it to a threshold set by the configured percentage. If the number falls below the threshold, the trace is marked as sampled and every later span carrying that trace ID is kept as well. Otherwise, the entire trace is discarded immediately.
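To make the propagation concrete, here is a minimal Python sketch of the idea, not the SDK's actual code. It assumes the decision is taken once at the root span by comparing the low 64 bits of a random trace ID to a threshold, and that child spans simply inherit their parent's flag. `SpanContext`, `start_span`, and `simulate_traces` are hypothetical names invented for this illustration.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple

LOW_64_BITS = (1 << 64) - 1


@dataclass(frozen=True)
class SpanContext:
    """Hypothetical stand-in for the context propagated between services."""
    trace_id: int
    sampled: bool


def start_span(parent: Optional[SpanContext], ratio: float, rng: random.Random) -> SpanContext:
    """Start a span; only a root span (no parent) makes a sampling decision."""
    if parent is not None:
        # Child spans inherit the trace ID and the decision made at the root.
        return SpanContext(parent.trace_id, parent.sampled)
    trace_id = rng.getrandbits(128)
    # Assumed rule: keep the trace if its low 64 bits fall below ratio * 2^64.
    sampled = (trace_id & LOW_64_BITS) < round(ratio * (1 << 64))
    return SpanContext(trace_id, sampled)


def simulate_traces(n_traces: int, ratio: float, seed: int = 0) -> Tuple[float, bool]:
    """Return (fraction of traces kept, whether all child spans matched their root)."""
    rng = random.Random(seed)
    kept = 0
    consistent = True
    for _ in range(n_traces):
        frontend = start_span(None, ratio, rng)       # root span
        users = start_span(frontend, ratio, rng)      # call to the users service
        products = start_span(frontend, ratio, rng)   # call to the products service
        if not (users.sampled == products.sampled == frontend.sampled):
            consistent = False
        kept += frontend.sampled
    return kept / n_traces, consistent
```

Calling `simulate_traces(20_000, 0.10)` should return a fraction close to 0.10, with the second value confirming that the users and products spans always share the frontend's decision.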
This means that at a 10% sampling rate, only about 10% of your traces ever reach your backend (such as Jaeger), and the ones that do arrive complete. Because dropped traces are never exported or stored, this is a very cheap way to cut raw data volume.
The problem this solves is the overwhelming cost and complexity of storing and analyzing every single trace in a high-throughput system. Imagine a popular e-commerce site generating millions of traces a day. Storing all of that is infeasible. Head sampling acts as a gatekeeper, ensuring you get a representative subset of your traffic without drowning in data.
Internally, a trace-ID-ratio sampler takes the trace ID (a 128-bit value, usually written as 32 hexadecimal characters), interprets some of its bits (commonly 64 of them) as an integer, and checks whether that integer falls below a threshold proportional to the configured percentage. Implementations differ in which bits they use, and some hash the trace ID first, but the result is deterministic for a given trace ID and rate: if a trace ID is sampled once, it will be sampled every time the same algorithm and rate are applied.
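As a rough illustration of a deterministic, trace-ID-based check, here is a small Python sketch. It assumes one common variant: compare the low 64 bits of the trace ID against `ratio * 2^64`. Real samplers may pick different bits, hash the ID first, or use a different comparison, and `should_sample` is a hypothetical helper rather than a library function.

```python
LOW_64_BITS = (1 << 64) - 1


def should_sample(trace_id_hex: str, ratio: float) -> bool:
    """Decide from the trace ID alone whether to keep a trace.

    trace_id_hex: the 128-bit trace ID as 32 hexadecimal characters.
    ratio: fraction of traces to keep, between 0.0 and 1.0.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    trace_id = int(trace_id_hex, 16)
    # Assumed rule: keep the trace if its low 64 bits fall below ratio * 2^64.
    threshold = round(ratio * (1 << 64))
    return (trace_id & LOW_64_BITS) < threshold


# The same trace ID always produces the same answer for the same ratio.
example_id = "4bf92f3577b34da6a3ce929d0e0e4736"
first = should_sample(example_id, 0.10)
second = should_sample(example_id, 0.10)
```

A useful side effect of the threshold form is that decisions nest: any trace ID kept at 10% is also kept at every higher ratio, so raising the rate only adds traces and never swaps them out.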
The main lever you control is the sampling percentage. Other head-based strategies exist as well: probabilistic samplers (essentially the same idea as trace-ID-ratio sampling, though the internal implementation may differ) and rate-limiting samplers, which keep up to a fixed number of traces per second regardless of trace ID.
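For contrast with ratio-based sampling, a rate-limiting head sampler can be sketched as a token bucket. This is an illustrative Python sketch under our own assumptions (a bucket that refills continuously and holds at most one second's worth of tokens), not the code of any particular tracing library. `RateLimitingSampler` and its parameters are hypothetical names.

```python
import time
from typing import Callable


class RateLimitingSampler:
    """Hypothetical token-bucket sampler: keeps at most `traces_per_second` new traces."""

    def __init__(self, traces_per_second: float, clock: Callable[[], float] = time.monotonic):
        self.rate = traces_per_second
        # Allow a burst of up to one second's worth of traces (at least one).
        self.capacity = max(traces_per_second, 1.0)
        self.tokens = self.capacity
        self.clock = clock
        self.last = clock()

    def should_sample(self) -> bool:
        """Call once per new trace; the trace ID plays no part in the decision."""
        now = self.clock()
        # Refill in proportion to the elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Unlike the ratio approach, the kept volume here stays flat as traffic grows, so the effective sampling percentage falls during traffic spikes.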
What most people don’t grasp is that head sampling, while great for controlling volume, is a blunt instrument for capturing rare errors. The decision is made before any error information is known, so error traces are sampled at the same rate as everything else: a 10% head sampler drops about 90% of them on average. If only 0.1% of your requests fail, that leaves roughly one retained error trace per 10,000 requests. You’re essentially betting that a random sample will contain the errors you care about, which isn’t always a safe bet.
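The arithmetic is easy to check with a short Python sketch. It assumes, for simplicity, that the head decision and the error outcome are independent random events. The function names and the traffic figures are made up for illustration.

```python
import random
from typing import Tuple


def expected_error_traces_kept(total_traces: int, error_rate: float, sample_ratio: float) -> float:
    """Expected number of error traces that survive head sampling."""
    return total_traces * error_rate * sample_ratio


def simulate_error_capture(total_traces: int, error_rate: float,
                           sample_ratio: float, seed: int = 42) -> Tuple[int, int]:
    """Return (error traces that occurred, error traces that were kept)."""
    rng = random.Random(seed)
    errors = 0
    kept = 0
    for _ in range(total_traces):
        sampled = rng.random() < sample_ratio   # head decision, made first
        is_error = rng.random() < error_rate    # outcome, known only afterwards
        errors += is_error
        kept += sampled and is_error
    return errors, kept
```

With 1,000,000 traces, a 0.1% error rate, and 10% sampling, the expectation works out to about 100 retained error traces out of roughly 1,000 that occurred. A simulated run should land near the same proportion.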
The next step is understanding how tail sampling can address the limitations of head sampling, especially for error detection.