Trace IDs are the glue that holds distributed systems together, letting you follow a single request as it hops between microservices.

Here’s a conceptual demonstration of how OpenTelemetry traces work by following a hypothetical e-commerce order. Imagine a user placing an order.

  1. Frontend Service (Order-UI):

    • A request comes in to create an order.
    • The Order-UI service generates a unique trace_id (e.g., a1b2c3d4e5f67890abcdef1234567890).
    • It also generates a span_id for its own operation (e.g., 0000000000000001, abbreviated here as 0001; real span IDs are 16 hex characters).
    • It starts a "create_order" span, setting the trace_id and its span_id.
    • Before calling the next service, it starts a child span for the outgoing call with a new span_id (e.g., 0002). It propagates the original trace_id together with this span_id, which the receiver will record as its parent_span_id, to the Order-Service. This propagation is typically done via HTTP headers (e.g., traceparent: 00-a1b2c3d4e5f67890abcdef1234567890-0000000000000002-01).
  2. Order Service (Order-Service):

    • Receives the request from Order-UI, including the traceparent header.
    • Parses the header: extracts trace_id (a1b2c3d4e5f67890abcdef1234567890) and parent_span_id (0000000000000002).
    • Generates its own span_id (e.g., 0003) for its operation within this trace.
    • Starts a "process_order" span, setting the extracted trace_id, its own span_id (0003), and the extracted parent_span_id (0000000000000002).
    • It might then call the Payment-Service. To do so, it starts a child span for the outgoing call with a new span_id (e.g., 0004) and propagates the same trace_id together with this span_id, which the Payment-Service will record as its parent_span_id.
  3. Payment Service (Payment-Service):

    • Receives the request and parses the traceparent header.
    • Extracts trace_id (a1b2c3d4e5f67890abcdef1234567890) and parent_span_id (0000000000000004).
    • Generates its span_id (e.g., 0005).
    • Starts a "charge_customer" span, using the extracted trace_id, its own span_id (0005), and the extracted parent_span_id (0000000000000004).
    • Completes its operation and sends the response back.
  4. Back to Order Service:

    • Receives the response from Payment-Service.
    • Completes its "process_order" span.
  5. Back to Frontend Service:

    • Receives the response from Order-Service.
    • Completes its "create_order" span.
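
The walkthrough above can be sketched in plain Python without the OpenTelemetry SDK. This is only an illustration under several assumptions: the three functions stand in for the services, the headers dict stands in for HTTP headers, SPANS stands in for an exporter, and the ID generated before each outgoing call is modeled as a client span (named "call ..." here, a hypothetical naming). A counter produces the small span IDs used in the example; a real SDK generates random 64-bit IDs.

```python
import itertools

TRACE_ID = "a1b2c3d4e5f67890abcdef1234567890"  # example trace_id from the walkthrough
_counter = itertools.count(1)                  # deterministic IDs, for illustration only
SPANS = []                                     # stand-in for an exporter


def new_span_id():
    # A span_id is 64 bits, rendered as 16 hex characters.
    return f"{next(_counter):016x}"


def record(name, trace_id, span_id, parent_span_id):
    # Spans are recorded when they start; timing is omitted in this sketch.
    SPANS.append({"name": name, "trace_id": trace_id,
                  "span_id": span_id, "parent_span_id": parent_span_id})


def extract(headers):
    # traceparent layout: version-trace_id-parent_id-flags
    _version, trace_id, parent_span_id, _flags = headers["traceparent"].split("-")
    return trace_id, parent_span_id


def call_downstream(trace_id, parent_span_id, target, handler):
    # Client span for the outgoing call; its ID travels in traceparent.
    client_span_id = new_span_id()
    record(f"call {target}", trace_id, client_span_id, parent_span_id)
    handler({"traceparent": f"00-{trace_id}-{client_span_id}-01"})


def payment_service(headers):
    trace_id, parent = extract(headers)
    record("charge_customer", trace_id, new_span_id(), parent)


def order_service(headers):
    trace_id, parent = extract(headers)
    span_id = new_span_id()
    record("process_order", trace_id, span_id, parent)
    call_downstream(trace_id, span_id, "Payment-Service", payment_service)


def order_ui():
    span_id = new_span_id()  # root span: no parent
    record("create_order", TRACE_ID, span_id, None)
    call_downstream(TRACE_ID, span_id, "Order-Service", order_service)


order_ui()
for span in SPANS:
    print(span["name"], span["span_id"], "parent:", span["parent_span_id"])
```

Running it yields five spans that share one trace_id, with span IDs ending in 1 through 5 as in the walkthrough.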

All these spans, each with the same trace_id, are sent to an OpenTelemetry Collector or directly to a tracing backend (like Jaeger, Zipkin, or Datadog). The backend groups the spans by trace_id and uses each span's parent_span_id to reconstruct the entire request flow, visually showing you the Order-UI calling Order-Service, which in turn calls Payment-Service.
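
How a backend might rebuild the flow can be sketched as follows. This is a simplified illustration, not how any particular backend is implemented: the span dicts, the abbreviated IDs, and the "call ..." client spans are assumptions made for the example, and all spans are taken to belong to one trace.

```python
from collections import defaultdict

# Hypothetical exported spans for a single trace, arriving in arbitrary order.
spans = [
    {"name": "charge_customer", "span_id": "0005", "parent_span_id": "0004"},
    {"name": "create_order", "span_id": "0001", "parent_span_id": None},
    {"name": "call Payment-Service", "span_id": "0004", "parent_span_id": "0003"},
    {"name": "process_order", "span_id": "0003", "parent_span_id": "0002"},
    {"name": "call Order-Service", "span_id": "0002", "parent_span_id": "0001"},
]


def build_tree(spans):
    """Group spans by parent_span_id so the trace can be walked from the root."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent_span_id"]].append(span)
    return children


def render(children, parent_id=None, depth=0):
    """Return one indented line per span, parents before their children."""
    lines = []
    for span in children.get(parent_id, []):
        lines.append("  " * depth + span["name"])
        lines.extend(render(children, span["span_id"], depth + 1))
    return lines


tree_lines = render(build_tree(spans))
print("\n".join(tree_lines))
```

The root is the span whose parent_span_id is None; every other span hangs off the span whose span_id matches its parent_span_id.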

The core problem this solves is observability in a distributed system. Without trace IDs, if a request fails in one of many microservices, you’d have no easy way to know which service caused the failure or what sequence of events led to it. You’d be staring at logs across dozens of machines, trying to manually correlate timestamps and request identifiers. Trace IDs provide a structured, automated way to see the entire journey of a request.

OpenTelemetry propagates trace context in a standard format. Its default is W3C Trace Context, which defines two HTTP headers, traceparent and tracestate.

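  • traceparent: Contains four dash-separated fields: the version, the trace_id, the parent_id (the span_id of the caller's span, which becomes the parent of the receiver's span), and the trace flags (such as the sampled flag).
  • tracestate: An optional header that carries vendor-specific key-value pairs.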
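
To make the traceparent layout concrete, here is a minimal parsing sketch. It assumes the version 00 layout of exactly four lowercase-hex fields; the TraceParent type and the parse_traceparent function are hypothetical names for this example, not part of any OpenTelemetry API.

```python
import re
from typing import NamedTuple, Optional

# Four lowercase-hex fields: version (2), trace_id (32), parent_id (16), flags (2).
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is malformed."""
    match = _TRACEPARENT_RE.match(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version "ff" and all-zero IDs are invalid in W3C Trace Context.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    # The lowest bit of the flags byte is the sampled flag.
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))


parsed = parse_traceparent(
    "00-a1b2c3d4e5f67890abcdef1234567890-0000000000000002-01"
)
print(parsed)
```

Returning None for a malformed header lets the receiver fall back to starting a fresh trace rather than failing the request.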

The trace_id is a 128-bit identifier that remains constant for the entire trace, no matter how many services are involved. Each operation within a service, or each call to another service, creates a new span. A span has its own unique 64-bit span_id and, unless it is the root span, a parent_span_id that links it back to the span that initiated it. This parent-child relationship forms the tree structure of the trace.
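
The relationship between these identifiers can be sketched with a small data structure. The Span class and its root and child helpers are hypothetical names for illustration, not the OpenTelemetry SDK's types; the sketch assumes random IDs, with the all-zero value treated as invalid.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


def _random_hex(num_bytes: int) -> str:
    # Retry on the all-zero value, which is reserved as invalid.
    while True:
        value = secrets.token_hex(num_bytes)
        if int(value, 16) != 0:
            return value


@dataclass(frozen=True)
class Span:
    name: str
    trace_id: str                  # 128 bits, 32 hex characters
    span_id: str                   # 64 bits, 16 hex characters
    parent_span_id: Optional[str]  # None for the root span

    @classmethod
    def root(cls, name: str) -> "Span":
        # A root span starts a new trace and has no parent.
        return cls(name, _random_hex(16), _random_hex(8), None)

    def child(self, name: str) -> "Span":
        # Same trace_id, fresh span_id, parent points back at this span.
        return Span(name, self.trace_id, _random_hex(8), self.span_id)


root = Span.root("create_order")
child = root.child("process_order")
print(root)
print(child)
```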

The key to successful tracing is ensuring that the trace context (the trace_id, the span_id of the calling span, and the trace flags) is correctly propagated across service boundaries. This usually means configuring your HTTP clients, message queue producers, gRPC clients, etc., to inject the trace context into outgoing requests, and configuring your HTTP servers, message queue consumers, gRPC servers, etc., to extract it from incoming requests. OpenTelemetry SDKs and instrumentation libraries often handle this automatically if configured correctly, but it’s crucial to understand the mechanism.
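
The inject and extract halves of propagation can be sketched as a pair of functions over a header dict. This is a simplified stand-in for what an SDK's propagator does, not the OpenTelemetry API: the Context shape and both function bodies are assumptions made for this example, and validation of the field contents is omitted.

```python
from typing import Dict, Optional

Context = Dict[str, str]  # hypothetical shape: trace_id, span_id, flags


def inject(context: Context, carrier: Dict[str, str]) -> None:
    """Client side: write the calling span's context into outgoing headers."""
    carrier["traceparent"] = "00-{trace_id}-{span_id}-{flags}".format(**context)


def extract(carrier: Dict[str, str]) -> Optional[Context]:
    """Server side: read the caller's context from incoming headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value for key, value in carrier.items()}
    header = lowered.get("traceparent")
    if header is None:
        return None  # no upstream context: the server would start a new trace
    parts = header.split("-")
    if len(parts) != 4:
        return None
    _version, trace_id, span_id, flags = parts
    return {"trace_id": trace_id, "span_id": span_id, "flags": flags}


outgoing = {"Content-Type": "application/json"}
inject(
    {"trace_id": "a1b2c3d4e5f67890abcdef1234567890",
     "span_id": "0000000000000002", "flags": "01"},
    outgoing,
)
incoming = extract({"TraceParent": outgoing["traceparent"]})
print(outgoing["traceparent"])
print(incoming)
```

The span_id that the client injects is what the server stores as the parent_span_id of the span it starts.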

A common pitfall is assuming that all instrumentation automatically propagates context. For instance, if you have a custom RPC framework or a specific messaging pattern not covered by standard integrations, you might need to manually inject and extract the traceparent header. Many tracing backends also allow you to define custom rules for span naming or attribute extraction, which can be powerful for refining your observability.
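
For a transport without standard instrumentation, manual propagation might look like the sketch below. The in-memory queue, the envelope layout with a metadata field, and the publish and consume functions are all hypothetical; the point is only that the traceparent value has to travel with the message somehow.

```python
import json
import queue

# Stand-in for a broker or custom RPC channel with no built-in instrumentation.
message_queue = queue.Queue()


def publish(payload: dict, trace_id: str, span_id: str) -> None:
    # Inject: there are no HTTP headers here, so carry the context in metadata.
    envelope = {
        "metadata": {"traceparent": f"00-{trace_id}-{span_id}-01"},
        "payload": payload,
    }
    message_queue.put(json.dumps(envelope))


def consume():
    # Extract: recover the producer's context before starting the consumer span.
    envelope = json.loads(message_queue.get_nowait())
    traceparent = envelope["metadata"].get("traceparent")
    parent = None
    if traceparent is not None:
        _version, trace_id, parent_span_id, _flags = traceparent.split("-")
        parent = (trace_id, parent_span_id)
    return envelope["payload"], parent


publish({"order_id": 42}, "a1b2c3d4e5f67890abcdef1234567890", "0000000000000004")
payload, parent = consume()
print(payload, parent)
```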

The next concept you’ll likely encounter is span attributes and events, which provide rich, contextual data about what happened within a specific span.
