Adding OpenTelemetry instrumentation to an application isn’t just about seeing more logs; it’s about fundamentally changing how you understand your system’s emergent behavior.
Let’s see this in action. Imagine a simple Python Flask app that makes an external API call.
```python
from flask import Flask
import requests

from opentelemetry import trace, metrics
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor

# Configure Tracer: batch finished spans and export them over OTLP/gRPC
tracer_provider = TracerProvider()
span_processor = BatchSpanProcessor(OTLPSpanExporter())
tracer_provider.add_span_processor(span_processor)
trace.set_tracer_provider(tracer_provider)

# Configure Meter: readers are passed as a list via metric_readers
meter_provider = MeterProvider(
    metric_readers=[PeriodicExportingMetricReader(OTLPMetricExporter())]
)
metrics.set_meter_provider(meter_provider)

# Instrument Flask and Requests
app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)
RequestsInstrumentor().instrument()


@app.route("/")
def hello():
    requests.get("https://www.google.com", timeout=5)  # Simulate an external call
    return "Hello, World!"


if __name__ == "__main__":
    app.run(port=5000)
```
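When this app runs and you hit http://localhost:5000/, the SDK records a server span for the incoming HTTP request to Flask and a child client span for the outgoing requests.get call to Google. Nothing is printed to the console: the exporters send the data over OTLP/gRPC, so you need an OTLP-compatible collector (such as the OpenTelemetry Collector) listening on localhost:4317, the exporters' default endpoint, to receive these traces and metrics. For a quick local check without a collector, you can swap OTLPSpanExporter for the SDK's ConsoleSpanExporter, which prints finished spans to stdout.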
The core problem OpenTelemetry solves is the lack of visibility in distributed systems: applications that are split across services but give you no way to see how requests flow, or where latency and errors originate, across service boundaries. Without this kind of telemetry, you'd be stitching together logs from different services, trying to correlate timestamps, and often guessing about the root cause of issues. OpenTelemetry provides a standardized way to emit telemetry data (traces, metrics, logs) from your application, which can then be collected, processed, and analyzed by backend systems.
At its heart, OpenTelemetry works through "instrumentation libraries." Each one is specific to a language (Python, Java, Go, etc.) and to a framework or library within it (Flask, Django, Spring, requests, http.client, etc.). When you import and initialize an instrumentation library, it wraps the relevant functions or methods at runtime, so your own application code does not change. For example, FlaskInstrumentor wraps the Flask app's WSGI entry point and registers request hooks, and RequestsInstrumentor wraps the method the requests library uses to send each HTTP request (Session.send).
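The wrapping idea can be sketched without any OpenTelemetry code. The following is a simplified model, not how FlaskInstrumentor or RequestsInstrumentor are actually implemented: a hypothetical `instrument` helper replaces a method on a class with a wrapper that records the call's name, duration, and any error, and `uninstrument` restores the original. `FakeClient`, `RECORDED`, `instrument`, and `uninstrument` are illustrative names, not part of any library.

```python
import functools
import time

RECORDED = []  # stand-in for an exporter: (operation name, duration in seconds, error)


class FakeClient:
    """Hypothetical library class we want to observe without editing its source."""

    def send(self, url):
        return f"response from {url}"


def instrument(cls, method_name):
    """Replace cls.method_name with a wrapper that records every call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        error = None
        try:
            return original(self, *args, **kwargs)
        except Exception as exc:
            error = type(exc).__name__
            raise
        finally:
            duration = time.perf_counter() - start
            RECORDED.append((f"{cls.__name__}.{method_name}", duration, error))

    wrapper._original = original  # kept so the patch can be undone
    setattr(cls, method_name, wrapper)


def uninstrument(cls, method_name):
    """Restore the original method if it is currently wrapped."""
    original = getattr(getattr(cls, method_name), "_original", None)
    if original is not None:
        setattr(cls, method_name, original)


instrument(FakeClient, "send")
result = FakeClient().send("https://example.com")  # calling code is unchanged
uninstrument(FakeClient, "send")
```

After this runs, `RECORDED` holds one entry for `FakeClient.send`, even though the code that called `send` knows nothing about the wrapper. Real instrumentation libraries follow the same pattern but create spans and metrics instead of appending to a list.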
Each time a wrapped function is called, the instrumentation library generates a "span." A span represents a single operation, like an incoming HTTP request, a database query, or an outgoing HTTP call. Spans have a start time, an end time, a name, attributes (key-value pairs describing the operation, like HTTP method, URL, status code), and potentially events (timestamped records such as exceptions or log messages that occur within the span). Crucially, each span carries a trace ID and a reference to its parent span, so the spans produced while handling one request link together into a "trace" that represents the entire journey of that request through your system.
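Metrics are also generated. For the Flask example, the instrumentation libraries record measurements such as the duration of incoming requests and of outgoing HTTP calls, typically broken down by attributes like HTTP method and status code, from which a backend can derive request counts and latency distributions. The SDK aggregates these measurements in memory, and the PeriodicExportingMetricReader attached to the MeterProvider exports the aggregated values at a regular interval.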
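The aggregation step is easier to picture with a small model. The sketch below is a simplified stand-in written with the standard library, not the SDK's implementation: `ToyHistogram` is a hypothetical class, and the metric name, bucket boundaries, and attribute keys are chosen for illustration. It keeps one running data point per attribute set, and `collect` returns a snapshot of the kind a periodic reader would hand to an exporter.

```python
from bisect import bisect_left


class ToyHistogram:
    """Simplified stand-in for a histogram instrument with explicit buckets."""

    def __init__(self, name, boundaries):
        self.name = name
        self.boundaries = sorted(boundaries)
        self._points = {}  # one aggregated data point per attribute set

    def record(self, value, attributes):
        key = tuple(sorted(attributes.items()))
        point = self._points.setdefault(
            key,
            {"count": 0, "sum": 0.0, "buckets": [0] * (len(self.boundaries) + 1)},
        )
        point["count"] += 1
        point["sum"] += value
        # Bucket i holds values up to and including boundaries[i];
        # the last bucket holds everything above the highest boundary.
        point["buckets"][bisect_left(self.boundaries, value)] += 1

    def collect(self):
        """Return a copy of the running totals for export."""
        return {
            key: {**point, "buckets": list(point["buckets"])}
            for key, point in self._points.items()
        }


duration_ms = ToyHistogram("http.server.duration", boundaries=[10, 100, 1000])
for value in (4, 10, 250):
    duration_ms.record(value, {"http.route": "/"})
duration_ms.record(30, {"http.route": "/process"})

snapshot = duration_ms.collect()
```

Individual measurements are never exported in this model, only the per-attribute totals and bucket counts. That is why metrics stay cheap at high request volumes while traces may need sampling.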
The actual "levers" you control are primarily in the configuration of the SDK. You choose which exporters to use (e.g., OTLP, Jaeger, Zipkin), how to process spans (e.g., batching, filtering), and how to configure the TracerProvider and MeterProvider. You also decide what to instrument. While automatic instrumentation is convenient, you can also add "manual instrumentation" for custom business logic:
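```python
from opentelemetry import trace

# `app` is the Flask application created in the earlier example.
tracer = trace.get_tracer(__name__)


@app.route("/process")
def process_data():
    # Everything inside the with-block is timed and recorded as one span.
    with tracer.start_as_current_span("process_data_operation"):
        # Your custom logic here (perform_complex_calculation is a placeholder)
        result = perform_complex_calculation()
    return str(result)
```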
This allows you to create spans for internal operations that aren’t covered by automatic instrumentation, giving you even finer-grained visibility.
A common point of confusion is that OpenTelemetry itself only generates and exports telemetry. It doesn't store or visualize your data. You need a backend system (like Jaeger, Prometheus+Grafana, Datadog, Honeycomb, etc.) to receive, store, and query the data that OpenTelemetry sends. Without a backend, your instrumentation is effectively sending data into a black hole.
The next step after getting basic instrumentation working is often to explore sampling strategies to manage the volume of traces generated, especially in high-traffic applications.
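As a starting point, the idea behind ratio-based sampling can be sketched in a few lines. This is a simplified illustration with a hypothetical `should_sample` function, not the SDK's code: the decision is derived from the trace ID itself, so every service that sees the same trace ID makes the same keep-or-drop choice without coordination. The Python SDK ships real samplers such as TraceIdRatioBased and ParentBased in opentelemetry.sdk.trace.sampling, which you pass to TracerProvider via its sampler argument.

```python
import secrets

TRACE_ID_LIMIT = 1 << 64  # the decision uses the low 64 bits of the trace ID


def should_sample(trace_id: int, ratio: float) -> bool:
    """Keep roughly `ratio` of all traces, deterministically per trace ID."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    bound = round(ratio * TRACE_ID_LIMIT)
    return (trace_id & (TRACE_ID_LIMIT - 1)) < bound


# Simulate 10,000 traces with random 128-bit IDs and keep about 10% of them.
trace_ids = [secrets.randbits(128) for _ in range(10_000)]
kept = sum(should_sample(trace_id, 0.1) for trace_id in trace_ids)
```

With a ratio of 0.1, `kept` lands close to 1,000, and calling `should_sample` again with the same trace ID always returns the same answer.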