Datadog APM can follow a single request across many services, even though each service runs its own tracer in its own process. It does this by passing a shared trace ID from one service to the next.

Let’s see what that looks like. Imagine a web request comes into your frontend service, frontend-web. This service then calls an API, user-api, to fetch user data. The user-api might then call another service, auth-service, to verify a token.

Here’s a simplified timeline of that request, as each service might log it:

10:30:01.123 frontend-web: Received GET /users/123
10:30:01.125 frontend-web: Calling user-api GET /users/123
10:30:01.150 user-api: Received GET /users/123
10:30:01.155 user-api: Calling auth-service POST /verify-token
10:30:01.200 auth-service: Received POST /verify-token
10:30:01.210 auth-service: Token verified
10:30:01.220 user-api: Auth token verified
10:30:01.250 user-api: User data fetched
10:30:01.260 frontend-web: Received response from user-api (200 OK)
10:30:01.270 frontend-web: Responded to client (200 OK)

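Datadog APM, when properly instrumented, doesn’t just log these events; it stitches them together into a single trace. The key is context propagation. When frontend-web calls user-api, its tracer injects trace context headers into the outgoing request: x-datadog-trace-id carries the trace ID, and x-datadog-parent-id carries the ID of the calling span. (Depending on the tracer version and configuration, the W3C traceparent header may be sent as well.) user-api’s tracer extracts that context, so its spans join the same trace, and when it calls auth-service it injects the same trace ID along with its own span ID as the new parent. This is what lets Datadog link these separate operations into one coherent view of the request’s journey.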
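To make the inject/extract step concrete, here is a minimal, stdlib-only sketch of header-based propagation. It is not ddtrace’s code: the SpanContext class, the helper functions, and the random ID scheme are hypothetical, and only the two x-datadog-* header names come from Datadog’s propagation format. For brevity it skips the separate client span a real HTTP integration creates, so the caller’s server span acts as the parent directly.

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Header names from Datadog's propagation style. Real tracers may also
# send W3C trace context headers; this sketch ignores them.
TRACE_ID_HEADER = "x-datadog-trace-id"
PARENT_ID_HEADER = "x-datadog-parent-id"


@dataclass(frozen=True)
class SpanContext:
    trace_id: int
    span_id: int


def new_id() -> int:
    # Hypothetical ID scheme: random non-zero 64-bit integers.
    return random.randint(1, 2**64 - 1)


def inject(ctx: SpanContext, headers: Dict[str, str]) -> None:
    """Caller side: write the current span's context into outgoing headers."""
    headers[TRACE_ID_HEADER] = str(ctx.trace_id)
    headers[PARENT_ID_HEADER] = str(ctx.span_id)


def extract(headers: Dict[str, str]) -> Optional[SpanContext]:
    """Callee side: read the caller's context, if the request carried one."""
    if TRACE_ID_HEADER not in headers or PARENT_ID_HEADER not in headers:
        return None
    return SpanContext(int(headers[TRACE_ID_HEADER]), int(headers[PARENT_ID_HEADER]))


def start_server_span(headers: Dict[str, str]) -> Tuple[SpanContext, Optional[int]]:
    """Continue the incoming trace, or start a new one if no context arrived.

    Returns the new span's context and the ID of its parent span (or None).
    """
    parent = extract(headers)
    trace_id = parent.trace_id if parent else new_id()
    parent_id = parent.span_id if parent else None
    return SpanContext(trace_id, new_id()), parent_id


# frontend-web receives the client request: no incoming context, new trace.
frontend_span, _ = start_server_span({})
outgoing: Dict[str, str] = {}
inject(frontend_span, outgoing)

# user-api receives the call: same trace ID, new span ID, parent = caller's span.
user_api_span, user_api_parent = start_server_span(outgoing)
print(user_api_span.trace_id == frontend_span.trace_id)  # True
print(user_api_parent == frontend_span.span_id)          # True
```

The same two calls repeat at every hop: user-api would call inject with its own span before calling auth-service, so the trace ID stays fixed while the parent ID moves one step down the chain.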

The problem Datadog APM solves is the "distributed tracing" challenge. In a microservices architecture, a single user-facing request can touch dozens, even hundreds, of individual services. Without a way to correlate these calls, debugging becomes a nightmare. You’d be sifting through logs from each service independently, trying to manually piece together the sequence of events, identify which service introduced latency, or which one failed.

Internally, Datadog’s tracing library (the "tracer") creates a root span when the initial request arrives. As the request flows through other services, each one creates new spans for its units of work (an HTTP call, a database query, a function execution), and every span records the ID of its parent, so together they form a tree. The propagated headers (x-datadog-trace-id and x-datadog-parent-id) carry this context across process boundaries: all spans belonging to the same logical request share one trace ID, and the first span in each downstream service points back at the caller’s span as its parent.
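The parent/child structure can be sketched with the timeline from earlier. The span list below is hypothetical: the IDs are made up, and the start and end times are read off the log lines (in milliseconds after 10:30:01.000), with each outgoing call modeled as its own client span. Grouping spans by parent ID and walking from the root is, in simplified form, how a backend can rebuild the tree once every span carries the same trace ID.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    span_id: int
    parent_id: Optional[int]  # None marks the root span
    service: str
    name: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


# Hypothetical spans for the request above. All share one trace ID,
# so only span and parent IDs are shown.
SPANS = [
    Span(1, None, "frontend-web", "GET /users/123", 123, 270),
    Span(2, 1, "frontend-web", "http.request user-api", 125, 260),
    Span(3, 2, "user-api", "GET /users/123", 150, 250),
    Span(4, 3, "user-api", "http.request auth-service", 155, 220),
    Span(5, 4, "auth-service", "POST /verify-token", 200, 210),
]


def render_tree(spans: List[Span]) -> List[str]:
    """Index spans by parent ID, then walk depth-first from the root."""
    children: Dict[Optional[int], List[Span]] = {}
    for s in spans:
        children.setdefault(s.parent_id, []).append(s)

    lines: List[str] = []

    def walk(parent_id: Optional[int], depth: int) -> None:
        for s in sorted(children.get(parent_id, []), key=lambda c: c.start_ms):
            lines.append(f"{'  ' * depth}{s.service}: {s.name} ({s.duration_ms} ms)")
            walk(s.span_id, depth + 1)

    walk(None, 0)
    return lines


for line in render_tree(SPANS):
    print(line)
```

Running it prints a five-level tree, starting with `frontend-web: GET /users/123 (147 ms)` and ending with `auth-service: POST /verify-token (10 ms)`. Read this way, user-api waited 65 ms on its call to auth-service while auth-service’s handler took 10 ms, which, assuming the hosts’ clocks agree, suggests most of that wait happened outside the handler (network, queuing, or connection setup). That kind of gap is hard to spot in per-service logs.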

You control this by configuring your application’s instrumentation. For example, in Python with the ddtrace library, you might have code like this:

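# Enable ddtrace's Flask and requests integrations before importing those libraries.
# (Running the service under ddtrace-run is an alternative to calling patch().)
from ddtrace import patch, tracer
patch(flask=True, requests=True)

from flask import Flask
import requests

app = Flask(__name__)

@app.route('/users/<user_id>')
@tracer.wrap() # Adds a custom span around this handler, under the Flask request span
def get_user(user_id):
    # The patched requests library injects the trace context headers automatically
    response = requests.get(f"http://user-api:5001/users/{user_id}")
    return response.json()

if __name__ == '__main__':
    # Set DD_SERVICE=frontend-web in the environment so spans get this service name
    app.run(port=5000)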

And in the user-api service:

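# Enable the integrations before importing Flask and requests
from ddtrace import patch, tracer
patch(flask=True, requests=True)

from flask import Flask, request
import requests

app = Flask(__name__)

@app.route('/users/<user_id>')
@tracer.wrap() # Adds a custom span around this handler
def get_user_data(user_id):
    # The Flask integration extracts the incoming x-datadog-* headers, and the
    # patched requests library forwards the trace context to auth-service
    response = requests.post("http://auth-service/verify-token", json={"token": request.headers.get("Authorization")})
    response.raise_for_status()  # Fail the request if the token is rejected
    # ... fetch user data ...
    return {"user_id": user_id, "data": "some_user_data"}

if __name__ == '__main__':
    # Set DD_SERVICE=user-api in the environment
    app.run(port=5001)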

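The @tracer.wrap() decorator adds a custom span around the decorated function: the span starts when the function is entered and finishes when it returns or raises. It is not what instruments the route or the outgoing call, though. The request span for each Flask route, and the trace context headers on calls made with requests, come from ddtrace’s library integrations, which you enable by running the service under ddtrace-run or by calling ddtrace’s patch() (for example patch(flask=True, requests=True)) before the libraries are imported. Once the requests integration is active, every outgoing call carries the current trace context, which is why the examples never set the headers by hand. Doing it manually would look roughly like HTTPPropagator.inject(span.context, headers), with HTTPPropagator imported from ddtrace.propagation.http.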
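Conceptually, a decorator like tracer.wrap() only has to do three things: remember which span is currently active, open a child of it on entry, and close it on exit. The sketch below models that with the standard library’s contextvars module. wrap, Span, and FINISHED are hypothetical names, not ddtrace internals, and unlike the real tracer this version sends nothing anywhere; finished spans are simply collected in a list.

```python
import contextvars
import functools
import itertools
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Span:
    name: str
    span_id: int
    parent_id: Optional[int]
    start: float = field(default_factory=time.monotonic)
    end: Optional[float] = None


_ids = itertools.count(1)
# The span currently active in this thread or asyncio task, if any.
_active: contextvars.ContextVar = contextvars.ContextVar("active_span", default=None)
FINISHED: List[Span] = []  # stands in for "send to the Agent"


def wrap(name: Optional[str] = None) -> Callable:
    """Hypothetical stand-in for a tracer.wrap()-style decorator."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            parent = _active.get()
            span = Span(name or fn.__name__, next(_ids),
                        parent.span_id if parent else None)
            token = _active.set(span)        # this span is now the parent
            try:
                return fn(*args, **kwargs)
            finally:
                span.end = time.monotonic()  # finish even if fn raised
                _active.reset(token)         # restore the previous parent
                FINISHED.append(span)
        return wrapper
    return decorator


@wrap()
def verify_token():
    return True


@wrap("get_user_data")
def handler():
    return verify_token()


handler()
for s in FINISHED:
    print(s.name, s.span_id, s.parent_id)
# verify_token 2 1
# get_user_data 1 None
```

Because the child finishes before its parent, verify_token is recorded first, with get_user_data’s span ID as its parent. Using a ContextVar rather than a plain global keeps the active span separate per thread and per asyncio task.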

A less obvious design choice is how the tracer ships this data. It does not send spans synchronously on the request path; instead it buffers finished spans in memory and flushes them to the Datadog Agent in the background, so a slow or briefly unreachable Agent does not add latency to your requests. That resilience has limits. The buffer is bounded, so if the Agent stays unreachable long enough, spans are dropped. Spans still in the buffer are lost if the process crashes before a flush. And if a service fails before forwarding the trace context, its downstream calls start a new trace that Datadog cannot join to the original. In practice you usually get most of each trace, but gaps are possible.
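A simplified model of that buffering behavior follows. SpanBuffer, send_to_agent, and the keep-until-next-flush policy are assumptions for illustration; ddtrace’s real writer has its own buffer limits, retry policy, and background thread. Here flush() is called directly so the outcome is deterministic.

```python
from collections import deque
from typing import Callable, Deque, List


class SpanBuffer:
    """Hypothetical bounded buffer that flushes spans off the request path."""

    def __init__(self, max_spans: int, send: Callable[[List[str]], None]) -> None:
        self._queue: Deque[str] = deque()
        self._max = max_spans
        self._send = send
        self.dropped = 0  # spans lost because the buffer was full

    def add(self, span: str) -> None:
        # Called when a span finishes; never blocks on the network.
        if len(self._queue) >= self._max:
            self.dropped += 1
            return
        self._queue.append(span)

    def flush(self) -> bool:
        """Send all buffered spans; on failure, keep them for the next flush."""
        if not self._queue:
            return True
        batch = list(self._queue)
        try:
            self._send(batch)
        except ConnectionError:
            return False
        self._queue.clear()
        return True


agent_up = False
received: List[str] = []


def send_to_agent(batch: List[str]) -> None:
    # Stand-in for the HTTP call to a local Datadog Agent.
    if not agent_up:
        raise ConnectionError("agent unreachable")
    received.extend(batch)


buf = SpanBuffer(max_spans=3, send=send_to_agent)
for span in ["a", "b", "c", "d"]:
    buf.add(span)      # "d" is dropped: the buffer holds at most 3
buf.flush()            # Agent down: nothing sent, spans stay buffered
agent_up = True
buf.flush()            # Agent back: buffered spans are delivered
print(received, buf.dropped)  # ['a', 'b', 'c'] 1
```

The limits described above fall out directly: a span that arrives while the buffer is full is counted in dropped and never reaches the Agent, and anything still queued when the process dies is lost. A threaded version would also need a lock around the queue, since spans can finish while a flush is in progress.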

Once instrumented, you’ll see these traces visualized in the Datadog UI, allowing you to explore request flows, identify bottlenecks, and pinpoint errors across your entire application stack. The next step is to start correlating these traces with logs and metrics for a truly unified observability picture.
