OpenTelemetry Lambda is surprisingly good at capturing cold start traces, but not in the way you might expect.

Let’s see it in action. Imagine a Node.js Lambda function that’s been idle for a while. When the first request hits, Lambda provisions a new execution environment, loads your code, and runs your initialization logic. For a cold start, OpenTelemetry Lambda will generate a trace that looks something like this:

{
  "traceId": "a1b2c3d4e5f67890a1b2c3d4e5f67890",
  "parentId": "0000000000000000",
  "id": "1234567890abcdef",
  "name": "Initialization",
  "kind": 1, // SPAN_KIND_INTERNAL
  "timestamp": 1678886400000000000, // Start of initialization
  "duration": 5000000000, // 5 seconds
  "attributes": {
    "faas.coldstart": true,
    "faas.execution": "some-execution-id",
    "faas.name": "my-lambda-function",
    "faas.runtime": "nodejs18.x"
  }
}

This Initialization span with faas.coldstart: true is the key. It’s not a span generated by your application code, but by the OpenTelemetry Lambda layer itself, wrapping the entire initialization phase. Subsequent invocations, which hit a warm execution environment, won’t have this specific span.
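To make this concrete, here is a minimal sketch of how you might pick such a span out of exported trace data. It assumes spans shaped like the JSON above, with a nanosecond duration field; isColdStartSpan and durationMs are hypothetical helper names, not part of any OpenTelemetry API.

```javascript
// Hypothetical helpers; the span shape mirrors the example JSON above.

// True only when the span explicitly carries faas.coldstart: true.
function isColdStartSpan(span) {
  return Boolean(span.attributes) && span.attributes["faas.coldstart"] === true;
}

// Converts the nanosecond duration to milliseconds.
function durationMs(span) {
  return span.duration / 1e6;
}

const span = {
  name: "Initialization",
  duration: 5000000000, // nanoseconds
  attributes: { "faas.coldstart": true, "faas.name": "my-lambda-function" },
};

if (isColdStartSpan(span)) {
  console.log(`${span.name}: cold start took ${durationMs(span)} ms`);
}
```

The strict comparison against true matters here: a warm span may carry the attribute as false or omit it entirely, and both cases should be treated as "not a cold start".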

OpenTelemetry Lambda addresses a visibility problem: the overhead of serverless execution, particularly the unpredictable latency introduced by cold starts. Without it, you’d just see the total Lambda duration, making it hard to distinguish between slow application logic and the time spent by the platform setting up the environment.

Internally, the OpenTelemetry Lambda layer hooks into the Lambda execution lifecycle. It uses the Lambda Runtime Interface Client (RIC) to intercept events. Before your handler code is invoked for the first time, it starts a root span. This span records the time from when the execution environment is ready until your handler begins its first execution, and on a cold start the layer explicitly sets the faas.coldstart attribute to true on it. The span is then recorded and exported along with the rest of the trace. On a warm start, no initialization work happens, and the faas.coldstart attribute is absent or false.
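The cold-start bookkeeping described above can be approximated with a flag held in module scope, because module scope is initialized once per execution environment and survives across warm invocations. The sketch below is a simplified illustration, not the layer’s actual source; createColdStartTracker and the init.duration_ms attribute are hypothetical names, while faas.coldstart is the attribute from the trace above.

```javascript
// Simplified illustration only; not the layer's actual implementation.
function createColdStartTracker(initStartMs) {
  let coldStart = true; // stays true until the first invocation is observed

  // Call once at the start of every invocation.
  return function onInvocation(nowMs) {
    const attributes = { "faas.coldstart": coldStart };
    if (coldStart) {
      // Hypothetical attribute: time between init start and first invocation.
      attributes["init.duration_ms"] = nowMs - initStartMs;
      coldStart = false;
    }
    return attributes;
  };
}

// Created at module load, so it runs once per execution environment.
const onInvocation = createColdStartTracker(Date.now());

function handler(event) {
  const spanAttributes = onInvocation(Date.now());
  console.log(JSON.stringify(spanAttributes));
  return { statusCode: 200 };
}

handler({}); // first call in this environment: faas.coldstart is true
handler({}); // later calls: faas.coldstart is false
```

Keeping the flag in a closure rather than in the handler means it is flipped exactly once per environment, no matter how many invocations follow.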

The levers you control are primarily environment variables on the Lambda function itself, plus your OpenTelemetry configuration.

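  • OTEL_EXPORTER_OTLP_ENDPOINT: This is crucial. It is the standard OpenTelemetry SDK variable that tells the exporter where to send your traces. For example, http://localhost:4318 if a collector is running alongside the function (OTLP over HTTP; the exporter appends /v1/traces), or https://your-otel-collector.example.com:4317 for a remote OTLP/gRPC collector. If it is unset, the exporter falls back to its localhost default, so traces only arrive if a collector is actually listening there.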
  • OPENTELEMETRY_LAMBDA_LOG_FORMAT: Setting this to JSON ensures that the Lambda layer outputs its internal logs (including span information) in a structured format that can be parsed.
  • AWS_LAMBDA_EXEC_WRAPPER: This environment variable points to the wrapper script shipped in the OpenTelemetry Lambda layer (for Node.js, /opt/otel-handler), which runs before the Lambda runtime starts. Adding the layer does not set it for you; you set it in the function configuration to switch the instrumentation on.
  • NODE_OPTIONS: For Node.js, you might see this set to something like --require /opt/wrapper.js (the exact file depends on the layer version). The wrapper script sets it so that Node.js loads the OpenTelemetry instrumentation automatically before your handler, so you typically don’t need to set it manually.
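A quick pre-flight check can catch a misconfigured function before you go looking for missing traces. The sketch below is only an illustration: findOtelConfigProblems is a hypothetical helper, and the variable names it checks are assumptions, namely AWS_LAMBDA_EXEC_WRAPPER from the list above and OTEL_EXPORTER_OTLP_ENDPOINT, the OpenTelemetry SDK’s standard exporter endpoint variable. Adjust them to whatever your layer version actually reads.

```javascript
// Hypothetical pre-flight check; the variable names are assumptions.
function findOtelConfigProblems(env) {
  const problems = [];

  if (!env.AWS_LAMBDA_EXEC_WRAPPER) {
    problems.push("AWS_LAMBDA_EXEC_WRAPPER is not set, so the wrapper script will not run");
  }

  const endpoint = env.OTEL_EXPORTER_OTLP_ENDPOINT;
  if (!endpoint) {
    problems.push("OTEL_EXPORTER_OTLP_ENDPOINT is not set");
  } else {
    try {
      const { protocol } = new URL(endpoint);
      if (protocol !== "http:" && protocol !== "https:") {
        problems.push(`unexpected endpoint scheme: ${protocol}`);
      }
    } catch {
      problems.push(`endpoint is not a valid URL: ${endpoint}`);
    }
  }

  return problems;
}

// An empty array means nothing obviously wrong was found.
console.log(findOtelConfigProblems(process.env));
```

Passing the environment in as an argument, rather than reading process.env inside the function, keeps the check easy to exercise with different configurations.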

The faas.coldstart attribute is a standardized OpenTelemetry semantic convention: a boolean that is true when the function is executing for the first time in a new execution environment. On the Initialization span, it signifies that the span covers the initialization of that environment, including the loading of your function code and any dependencies, up to the point where your actual function handler begins execution. The duration of this span quantifies the cold start overhead that is observable from inside the environment; work the platform does before any of your code or the layer is loaded, such as provisioning the underlying infrastructure, is unlikely to be fully reflected in it.
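Because the attribute is a plain boolean, aggregating it over exported spans is straightforward. The following sketch, assuming one root span per invocation in the shape of the example JSON, computes how often cold starts occur and how long they take on average; summarizeColdStarts and the sample data are hypothetical.

```javascript
// Hypothetical summary helper; assumes one root span per invocation.
function summarizeColdStarts(spans) {
  const cold = spans.filter(
    (s) => s.attributes && s.attributes["faas.coldstart"] === true
  );
  const totalNs = cold.reduce((sum, s) => sum + s.duration, 0);
  return {
    invocations: spans.length,
    coldStarts: cold.length,
    coldStartRate: spans.length === 0 ? 0 : cold.length / spans.length,
    meanColdStartMs: cold.length === 0 ? 0 : totalNs / cold.length / 1e6,
  };
}

// Made-up durations in nanoseconds, for illustration only.
const sample = [
  { duration: 5000000000, attributes: { "faas.coldstart": true } },
  { duration: 120000000, attributes: { "faas.coldstart": false } },
  { duration: 95000000, attributes: {} },
  { duration: 3000000000, attributes: { "faas.coldstart": true } },
];

console.log(summarizeColdStarts(sample));
```

In practice you would usually run this kind of aggregation in your tracing backend’s query language; the sketch only shows which fields the calculation depends on.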

What most people miss is that the Initialization span is the cold start. It’s not a separate event; it’s the root span of the trace for that particular cold invocation, and it’s automatically generated by the Lambda layer. You don’t instrument it yourself. Your application code instrumentation then nests within this Initialization span if it executes during that phase, or it becomes the root span of a new trace if the environment is warm.
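The nesting described above can be reconstructed from the id and parentId fields alone. Here is a minimal sketch that renders a trace as an indented tree; renderTrace is a hypothetical helper, the load-config and fetch-secret span names are invented for illustration, and a root span is assumed to have an empty or all-zero parentId as in the example JSON.

```javascript
// Hypothetical helper: renders a trace as indented lines, roots first.
const ROOT_PARENT = /^0*$/; // empty or all-zero parentId marks a root span

function renderTrace(spans) {
  const childrenOf = new Map();
  const roots = [];

  for (const span of spans) {
    if (!span.parentId || ROOT_PARENT.test(span.parentId)) {
      roots.push(span);
    } else {
      if (!childrenOf.has(span.parentId)) childrenOf.set(span.parentId, []);
      childrenOf.get(span.parentId).push(span);
    }
  }

  const lines = [];
  const visit = (span, depth) => {
    lines.push(`${"  ".repeat(depth)}${span.name}`);
    for (const child of childrenOf.get(span.id) || []) visit(child, depth + 1);
  };
  roots.forEach((root) => visit(root, 0));
  return lines;
}

// Invented spans: application work nested under the cold-start root.
const coldTrace = [
  { id: "aaaaaaaaaaaaaaaa", parentId: "0000000000000000", name: "Initialization" },
  { id: "bbbbbbbbbbbbbbbb", parentId: "aaaaaaaaaaaaaaaa", name: "load-config" },
  { id: "cccccccccccccccc", parentId: "bbbbbbbbbbbbbbbb", name: "fetch-secret" },
];

console.log(renderTrace(coldTrace).join("\n"));
```

Run against a warm invocation’s spans, the same function would show your application’s span at depth zero, since there is no Initialization span above it.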

The next concept to grasp is how to correlate these cold start traces with actual application-level performance metrics and how to use this data to optimize your function’s initialization logic.
