OpenTelemetry’s true power isn’t just in automatic instrumentation; it’s in the precision you gain by manually crafting spans, turning your code into a narrative of execution.

Let’s say you’ve got a critical business process that involves calling three different microservices, fetching data from a cache, and then performing some complex in-memory computation. Automatic instrumentation might show you the individual network calls, but it won’t tell you how long the entire business process took, nor will it highlight the duration of that specific in-memory computation, which might be your real bottleneck.

Here’s how you’d instrument that with OpenTelemetry in Python:

import time

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Configure the tracer provider and register it globally
tracer_provider = TracerProvider()
tracer_provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))  # For demonstration, output to console
trace.set_tracer_provider(tracer_provider)

# Get a tracer from the registered provider
tracer = trace.get_tracer(__name__)

# Simulate a business process
def process_order(order_id):
    # Start a new span for the entire order processing
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", order_id)  # Add context

        # First microservice call
        with tracer.start_as_current_span("call_inventory_service") as sub_span:
            sub_span.set_attribute("service.name", "inventory-service")
            # Simulate network call
            time.sleep(0.1)

        # Cache lookup
        with tracer.start_as_current_span("lookup_user_cache") as sub_span:
            sub_span.set_attribute("cache.type", "redis")
            # Simulate cache lookup
            time.sleep(0.05)

        # Second microservice call
        with tracer.start_as_current_span("call_payment_service") as sub_span:
            sub_span.set_attribute("service.name", "payment-service")
            # Simulate network call
            time.sleep(0.15)

        # In-memory computation
        with tracer.start_as_current_span("calculate_discount") as sub_span:
            sub_span.set_attribute("computation.type", "discount_logic")
            # Simulate heavy computation
            for i in range(100000):
                _ = i * i
            sub_span.set_attribute("discount.percentage", 10)

        # Final microservice call
        with tracer.start_as_current_span("send_confirmation_email") as sub_span:
            sub_span.set_attribute("service.name", "email-service")
            # Simulate network call
            time.sleep(0.08)

    return f"Order {order_id} processed successfully."

# Example usage
if __name__ == "__main__":
    result = process_order("ORD12345")
    print(result)

When you run this, the console exporter prints each span as a multi-line JSON object once the span has finished, so the child spans appear before their parent. Condensed to one line per span, with fields such as context, parent_id, and resource omitted, the output looks like this:

{"name": "call_inventory_service", "kind": "SpanKind.INTERNAL", "start_time": "...", "end_time": "...", "attributes": {"service.name": "inventory-service"}}
{"name": "lookup_user_cache", "kind": "SpanKind.INTERNAL", "start_time": "...", "end_time": "...", "attributes": {"cache.type": "redis"}}
{"name": "call_payment_service", "kind": "SpanKind.INTERNAL", "start_time": "...", "end_time": "...", "attributes": {"service.name": "payment-service"}}
{"name": "calculate_discount", "kind": "SpanKind.INTERNAL", "start_time": "...", "end_time": "...", "attributes": {"computation.type": "discount_logic", "discount.percentage": 10}}
{"name": "send_confirmation_email", "kind": "SpanKind.INTERNAL", "start_time": "...", "end_time": "...", "attributes": {"service.name": "email-service"}}
{"name": "process_order", "kind": "SpanKind.INTERNAL", "start_time": "...", "end_time": "...", "attributes": {"order.id": "ORD12345"}}

This structure allows you to see not only the latency of each individual network hop but also the duration of your custom logic (calculate_discount) and the overall process_order duration. You can then dive into any of these spans to see their attributes, helping you pinpoint exactly where time is being spent.

The tracer.start_as_current_span("...") context manager is the core of manual instrumentation. It starts a span when entering the with block, makes it the current span, and ends it when exiting, recording the start and end timestamps along with any attributes you set. Attributes are key-value pairs that provide context to your spans, such as order.id, service.name, or cache.type. These attributes are crucial for filtering and analyzing your traces later.

Spans nest automatically. Because the context manager makes its span the current one, any span started inside the with block becomes its child. In the example above, call_inventory_service, lookup_user_cache, etc., are all children of process_order. This creates a hierarchical view of your application’s execution, making it much easier to understand complex workflows.

The TracerProvider manages tracers and span processors. A Tracer is what you use to create spans. BatchSpanProcessor and ConsoleSpanExporter are examples of a processor and an exporter: the processor decides when to send spans (e.g., in batches), and the exporter decides where to send them (e.g., to the console, or to a tracing backend such as Jaeger or Zipkin). For production, you’d typically use OTLPSpanExporter to send data to your OpenTelemetry Collector.

When you set attributes on a span, you’re essentially adding metadata. This metadata is what allows you to ask powerful questions of your trace data. For instance, you could find the trace whose process_order span has order.id set to ORD12345 and check how long its calculate_discount child took, or compare calculate_discount durations across orders to find the slowest ones. You can also add events to spans to mark specific occurrences within a span’s lifetime, like a critical error or a cache miss.

The most subtle, yet powerful, aspect of manual span creation is the ability to define the boundaries of your traced operations. Automatic instrumentation often works at the wrong granularity for this; it might trace every single HTTP request to a downstream service, creating a lot of noise. By manually creating a span around the entire interaction with that service (e.g., call_inventory_service), you give those requests a single, meaningful parent that represents the unit of work. You can still add attributes to that span to record details like the specific endpoint called or the request payload size, but you control the top-level representation of that operation.

The next step in mastering OpenTelemetry is understanding how to propagate trace context across process boundaries, ensuring your custom spans are linked correctly in distributed systems.
