OpenTelemetry profiling is not just about finding performance bottlenecks; it’s a continuous, low-overhead observation of your running application’s execution, giving you a real-time, dynamic view of what your code is actually doing.

Imagine you have a web service written in Go. You’ve deployed it and it’s handling requests, but you suspect it’s not as efficient as it could be. Instead of just looking at CPU or memory metrics, you want to see which functions are consuming the most CPU time, and how they’re being called.

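Here’s how you might set that up. First, you need to enable profiling in your application. For Go, a common starting point is the standard library: importing the net/http/pprof package exposes the runtime’s built-in profilers over HTTP, so that a tool or collection agent can fetch profiles from the running process.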

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // Import for side effects: registers HTTP handlers
	"time"
)

func main() {
	// Start a background goroutine to simulate work
	go func() {
		for {
			doWork()
			time.Sleep(100 * time.Millisecond)
		}
	}()

	// Start an HTTP server that also exposes pprof endpoints
	// http://localhost:8080/debug/pprof/
	log.Println("Starting server on :8080")
	log.Fatal(http.ListenAndServe(":8080", nil))
}

func doWork() {
	// Simulate some CPU-intensive work
	for i := 0; i < 1000000; i++ {
		_ = i * i
	}
	// Simulate some memory allocation
	_ = make([]byte, 1024)
}

When this application runs, the net/http/pprof package automatically registers handlers under /debug/pprof/ on http.DefaultServeMux, which the server above uses because it passes nil as the handler. You can then fetch profiles using tools like go tool pprof. For example, to get a CPU profile:

go tool pprof "http://localhost:8080/debug/pprof/profile?seconds=30"

This command fetches a 30-second CPU profile from your running application and drops you into an interactive pprof session. You can then use commands like top to see the functions consuming the most CPU, list <function_name> to see the source code annotated with profiling data, and web to generate a visual call graph.

The core problem OpenTelemetry profiling solves is the gap between static metrics (CPU utilization, memory usage) and the dynamic reality of execution. Metrics tell you that something is slow or using resources; profiling tells you why and where in your code. It moves from "my service is slow" to "function calculate_fee in payment_processor.go is taking 70% of CPU because of repeated complex calculations in its inner loop."

Internally, OpenTelemetry profiling leverages various sampling mechanisms. For CPU profiling, it’s typically a timer-based interrupt that periodically inspects the call stack of active goroutines. For memory profiling, it can be either allocation-based (sampling every N bytes allocated) or heap-based (sampling the heap at intervals). These samples are then aggregated and exported as a "profile" signal, distinct from traces, metrics, and logs.
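
In Go, the sampling mechanisms described above map onto a few runtime settings. The sketch below assumes you are using the standard library's profilers directly; heapProfile and sink are hypothetical names, and the specific rates (64 KiB, 1 in 5, 1ms) are illustrative values rather than recommendations.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"runtime"
	"runtime/pprof"
)

// sink keeps allocations reachable so they remain visible in the heap profile.
var sink [][]byte

// heapProfile allocates n blocks of the given size and returns a heap profile
// in gzip-compressed pprof format.
func heapProfile(n, size int) ([]byte, error) {
	for i := 0; i < n; i++ {
		sink = append(sink, make([]byte, size))
	}
	runtime.GC() // the heap profile reflects the most recently completed GC

	var buf bytes.Buffer
	if err := pprof.Lookup("heap").WriteTo(&buf, 0); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// Assumption for illustration: sample about one allocation per 64 KiB
	// allocated instead of the default 512 KiB. Set this as early as possible.
	runtime.MemProfileRate = 64 * 1024

	// Mutex and block profiles are off by default; these calls opt in.
	runtime.SetMutexProfileFraction(5)     // report about 1 in 5 contention events
	runtime.SetBlockProfileRate(1_000_000) // about one sample per 1ms spent blocked

	data, err := heapProfile(64, 64*1024)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("heap profile: %d bytes\n", len(data))
}
```

Lower sampling intervals give finer detail at the cost of more bookkeeping per allocation or contention event, which is the trade-off the levers below describe.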

The key levers you control are:

  • Sampling Rate/Frequency: How often samples are taken. Higher frequency means more detail but also more overhead. For Go CPU profiles, the runtime samples at 100 Hz by default; the seconds=30 parameter in the pprof example sets how long samples are collected, not how often. For memory, it’s the "sampling rate": Go’s default (runtime.MemProfileRate) samples on average one allocation per 512 KiB allocated.
  • Profile Type: What you are profiling – CPU, memory (heap/allocations), goroutines, mutex contention, etc.
  • Collection Duration/Interval: For continuous profiling, how long a profile is collected before being exported, or the interval between heap snapshots.
  • Exporter Configuration: Where the collected profile data is sent – an OpenTelemetry Collector, directly to a backend, etc.

The "continuous" aspect is crucial. Instead of taking snapshots only when you suspect a problem, continuous profiling runs constantly, allowing you to observe performance under normal load and detect emergent issues before they become critical. It’s like having a doctor monitor your vital signs 24/7, rather than just when you feel sick.

A common misunderstanding is that profiling always incurs significant overhead. Aggressive sampling can be expensive, but OpenTelemetry’s profiling collectors are designed for low overhead. The sampling approach means you’re not instrumenting every single function call, but rather observing the state of the application at discrete points in time. This makes it feasible to run profiling continuously in production environments.

The next step after effectively using profiling to optimize your application’s performance is to correlate these detailed execution insights with other signals like distributed traces.
