The OpenTelemetry SDK and API are not interchangeable; the SDK is a concrete implementation of the abstract API, and understanding this distinction is key to instrumenting your applications effectively.
Imagine you’re building a distributed system. You want to see how requests flow through your services, identify bottlenecks, and debug issues. OpenTelemetry provides a standardized way to do this, but it’s not a single, monolithic library. It’s a two-part system: the API and the SDK.
The OpenTelemetry API is the contract. It defines the what – what methods you can call to create spans, record attributes, and emit events. It’s the set of interfaces and abstract classes that your code interacts with. Think of it as the blueprint for observability.

    // Example of using the OpenTelemetry API (Java)
    import io.opentelemetry.api.GlobalOpenTelemetry;
    import io.opentelemetry.api.OpenTelemetry;
    import io.opentelemetry.api.trace.Span;
    import io.opentelemetry.api.trace.Tracer;

    public class MyService {
        // Get the global instance (the entry point is GlobalOpenTelemetry, not OpenTelemetry)
        private static final OpenTelemetry openTelemetry = GlobalOpenTelemetry.get();
        // Get a tracer for this service
        private static final Tracer tracer = openTelemetry.getTracer("my-service-tracer");

        public void doWork() {
            Span span = tracer.spanBuilder("my-operation").startSpan(); // Start a new span
            try {
                // Your application logic here...
                System.out.println("Doing work...");
                Thread.sleep(100); // Simulate work
            } catch (InterruptedException e) {
                span.recordException(e); // Record any exception
                Thread.currentThread().interrupt(); // Restore the interrupt flag
            } finally {
                span.end(); // End the span
            }
        }

        public static void main(String[] args) {
            new MyService().doWork();
        }
    }

This code snippet shows how you use the API. You ask for a Tracer and then use it to create Spans. Notice there’s no mention of where these spans are going or how they are formatted. That’s by design: the API provides a consistent way to generate observability data, regardless of the underlying implementation. If no SDK has been configured, the same calls resolve to no-op implementations and the spans are simply discarded.
The OpenTelemetry SDK is the how. It’s the concrete implementation that takes the data generated via the API and processes it. This includes:
- Span Processors: These components are registered on the TracerProvider (which is part of the SDK) and are notified as spans start and end. They can batch spans for efficiency, filter them, and enrich them.
- Exporters: Once a span is processed, an exporter takes it and sends it to a backend system like Jaeger, Zipkin, or a cloud-native observability platform.
- Sampler: This determines which traces are recorded. A common sampler is TraceIdRatioBased (created in Java with Sampler.traceIdRatioBased(ratio)), which samples a fixed fraction of traces.
- Resource: This attaches identifying information about the entity producing telemetry, such as service name, version, and cloud provider details.
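To illustrate the Sampler bullet above, here is a minimal sketch of ratio-based sampling in plain Java. It is a simplified model, not the SDK’s code: RatioSampler and shouldSample are hypothetical names, and the sketch assumes a trace ID is a 32-character hex string whose low 64 bits can be treated as random. The point it tries to show is that the decision is derived from the trace ID itself, so the same trace gets the same answer every time it is evaluated.

```java
// Hypothetical, simplified model of ratio-based trace sampling.
// This is an illustration, not the OpenTelemetry SDK's implementation.
final class RatioSampler {
    private final long upperBound;

    RatioSampler(double ratio) {
        if (ratio < 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException("ratio must be in [0.0, 1.0]");
        }
        // Map the ratio onto the non-negative long range [0, Long.MAX_VALUE].
        this.upperBound = (long) (ratio * Long.MAX_VALUE);
    }

    // Assumption: traceId is 32 hex characters (128 bits).
    boolean shouldSample(String traceId) {
        if (upperBound == Long.MAX_VALUE) {
            return true; // a ratio of 1.0 keeps every trace
        }
        // Treat the low 64 bits of the trace ID as the random part.
        long low = Long.parseUnsignedLong(traceId.substring(16), 16);
        // Clear the sign bit so the value lies in [0, Long.MAX_VALUE].
        long value = low & Long.MAX_VALUE;
        return value < upperBound;
    }

    public static void main(String[] args) {
        RatioSampler sampler = new RatioSampler(0.25);
        String traceId = "4bf92f3577b34da6a3ce929d0e0e4736"; // example ID, made up
        System.out.println("sampled: " + sampler.shouldSample(traceId));
    }
}
```

Because the decision depends only on the trace ID and the ratio, no per-trace state has to be stored to keep it consistent.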
Here’s how you might configure the SDK, using a logging exporter for demonstration:

    // Example of configuring the OpenTelemetry SDK with a logging exporter (Java)
    import io.opentelemetry.api.OpenTelemetry;
    import io.opentelemetry.exporter.logging.LoggingSpanExporter; // Or use the OTLP exporter
    import io.opentelemetry.sdk.OpenTelemetrySdk;
    import io.opentelemetry.sdk.resources.Resource;
    import io.opentelemetry.sdk.trace.SdkTracerProvider;
    import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;

    public class OpenTelemetryConfig {
        public static OpenTelemetry initializeOpenTelemetry() {
            // 1. Define Resource attributes
            Resource serviceResource = Resource.getDefault()
                .toBuilder()
                .put("service.name", "my-awesome-service")
                .put("service.version", "1.0.0")
                .build();

            // 2. Configure TracerProvider
            SdkTracerProvider tracerProvider = SdkTracerProvider.builder()
                .setResource(serviceResource)
                // Simple logging exporter for demo
                .addSpanProcessor(BatchSpanProcessor.builder(LoggingSpanExporter.create()).build())
                // For the OTLP exporter (import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter):
                // .addSpanProcessor(BatchSpanProcessor.builder(OtlpGrpcSpanExporter.builder().build()).build())
                .build();

            // 3. Initialize OpenTelemetry SDK
            OpenTelemetry openTelemetry = OpenTelemetrySdk.builder()
                .setTracerProvider(tracerProvider)
                .build();

            // Optional but common: register the instance globally so that
            // GlobalOpenTelemetry.get() returns it. Either uncomment the line below
            // (import io.opentelemetry.api.GlobalOpenTelemetry) or call
            // buildAndRegisterGlobal() instead of build() above.
            // GlobalOpenTelemetry.set(openTelemetry);
            return openTelemetry;
        }

        public static void main(String[] args) {
            OpenTelemetry otel = initializeOpenTelemetry();
            // Now use the API (as shown in the previous example) with the configured SDK
            // ...
        }
    }

In this configuration, SdkTracerProvider is the SDK component that manages SpanProcessors. We’ve added a BatchSpanProcessor that uses a LoggingSpanExporter to print spans to the console. In a real-world scenario, you’d replace LoggingSpanExporter with an exporter for your chosen backend, like OtlpGrpcSpanExporter which sends data via the OpenTelemetry Protocol (OTLP) over gRPC.
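To make the processor-plus-exporter arrangement more concrete, here is a minimal sketch of the batching idea in plain Java. Everything in it is an assumption made for illustration: MiniBatchProcessor, onEnd, and flush are hypothetical names, spans are reduced to their names, and the exporter is modeled as a Consumer. The real BatchSpanProcessor also exports on a timer from a background thread, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, simplified model of a batching span processor.
// It flushes only when the batch is full or when flush() is called explicitly.
final class MiniBatchProcessor {
    private final int maxBatchSize;
    private final Consumer<List<String>> exporter; // stands in for a span exporter
    private final List<String> buffer = new ArrayList<>();

    MiniBatchProcessor(int maxBatchSize, Consumer<List<String>> exporter) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be at least 1");
        }
        this.maxBatchSize = maxBatchSize;
        this.exporter = exporter;
    }

    // Called when a span ends: the span is buffered rather than exported at once.
    synchronized void onEnd(String spanName) {
        buffer.add(spanName);
        if (buffer.size() >= maxBatchSize) {
            flush();
        }
    }

    // Hands all buffered spans to the exporter as a single batch.
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<String> batch = new ArrayList<>(buffer);
        buffer.clear();
        exporter.accept(batch);
    }

    public static void main(String[] args) {
        // A console "exporter"; any other Consumer could be passed instead.
        MiniBatchProcessor processor =
            new MiniBatchProcessor(2, batch -> System.out.println("exporting " + batch));
        processor.onEnd("load-cart");
        processor.onEnd("charge-card");
        processor.onEnd("send-receipt");
        processor.flush(); // export whatever is still buffered before exiting
    }
}
```

Swapping the exporter in this model means passing a different Consumer while the buffering logic stays the same, which mirrors how replacing LoggingSpanExporter with another exporter leaves the processor setup untouched.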
The critical point is that your application code should only depend on the API. The SDK is configured and initialized separately, often at application startup. This separation of concerns allows you to swap out the SDK implementation or its configuration without changing your application’s instrumentation code. For instance, you could switch from exporting to Jaeger to exporting to Datadog by simply changing the SDK configuration, not your application’s doWork() method.
When you instrument your code using the OpenTelemetry API, you are essentially writing against a set of interfaces. The SDK provides the concrete implementations for these interfaces. The SDK is responsible for the entire lifecycle of telemetry data after it’s created via the API: collecting it, processing it (sampling, batching), and exporting it to a backend. The API is the stable, language-specific library that application developers interact with to generate telemetry.
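The interface-versus-implementation relationship can be shown in miniature with plain Java. The sketch below does not use OpenTelemetry’s classes: MiniTracer, MiniSpan, NoopTracer, RecordingTracer, and CheckoutService are all hypothetical names invented for this example. It assumes only what the text states, namely that application code is written against interfaces and that an implementation is supplied separately.

```java
import java.util.ArrayList;
import java.util.List;

// "API" side: the only types the application code refers to.
interface MiniSpan {
    void end();
}

interface MiniTracer {
    MiniSpan startSpan(String name);
}

// Default used when no real implementation is installed: does nothing.
final class NoopTracer implements MiniTracer {
    @Override
    public MiniSpan startSpan(String name) {
        return () -> { }; // ending a no-op span has no effect
    }
}

// "SDK" side: a concrete implementation that records the names of ended spans.
final class RecordingTracer implements MiniTracer {
    final List<String> finished = new ArrayList<>();

    @Override
    public MiniSpan startSpan(String name) {
        return () -> finished.add(name);
    }
}

// Application code: depends on MiniTracer only, never on a concrete tracer.
final class CheckoutService {
    private final MiniTracer tracer;

    CheckoutService(MiniTracer tracer) {
        this.tracer = tracer;
    }

    String checkout() {
        MiniSpan span = tracer.startSpan("checkout");
        try {
            return "ok"; // business logic would go here
        } finally {
            span.end();
        }
    }

    public static void main(String[] args) {
        RecordingTracer recording = new RecordingTracer();
        new CheckoutService(new NoopTracer()).checkout(); // telemetry discarded
        new CheckoutService(recording).checkout();        // telemetry captured
        System.out.println("recorded spans: " + recording.finished);
    }
}
```

CheckoutService compiles and behaves identically with either tracer; only the code that constructs it decides whether telemetry is captured.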
The practical consequence is that your instrumentation code should not directly depend on SDK classes like SdkTracerProvider or BatchSpanProcessor. It should only depend on the io.opentelemetry.api package. Imports starting with io.opentelemetry.sdk belong in one place: the startup code that builds and registers the SDK, such as OpenTelemetryConfig above. If they appear in your business logic or in a shared library, you are tightly coupling that code to a specific implementation, which defeats much of the purpose of OpenTelemetry’s design.
The next concept you’ll run into is context propagation, which is how trace IDs and span IDs are passed between services.