OpenTelemetry Collector pipelines are how you get telemetry data from your applications and infrastructure into your observability backend, but their true power lies in their ability to transform that data before it ever leaves your network.

Let’s see it in action. Imagine we’re collecting Prometheus metrics from a Kubernetes cluster and sending them to a SaaS vendor.

# config.yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'my-app'
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - source_labels: [__meta_kubernetes_pod_label_app]
              action: keep
              regex: my-app-.*

processors:
  batch:
    send_batch_size: 1000
    timeout: 10s

exporters:
  otlp:
    endpoint: "my-observability-backend.example.com:4317"
    tls:
      insecure: true # Use this for testing, not production!

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [otlp]

This configuration tells the Collector:

  1. Receive metrics using the prometheus receiver. It discovers Kubernetes pods and scrapes only those whose app label matches the regex my-app-.*.
  2. Process these metrics with the batch processor, which sends a batch once 1000 items have accumulated or 10 seconds have elapsed, whichever comes first. This is crucial for efficiency.
  3. Export the batched metrics via OTLP to my-observability-backend.example.com:4317.

The Collector itself is a single binary that can be deployed as a DaemonSet in Kubernetes, a sidecar, or a standalone service. It’s designed to be flexible, allowing you to route data from multiple sources to multiple destinations, with as many processing steps in between as you need.

The core concept is the service.pipelines section. Here, you define pipelines keyed by signal type (metrics, traces, logs), optionally with a name suffix such as metrics/internal. Within each pipeline, one or more receivers feed data into an ordered chain of zero or more processors, which then pass the transformed data to one or more exporters. Processors run in the order they are listed. You can have multiple independent pipelines, or pipelines that reference the same component definitions. For example, a single batch processor definition could be listed in both a metrics and a traces pipeline; each pipeline then gets its own instance of that processor.
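
As a rough sketch of pipelines that reference the same components, the fragment below defines a metrics and a traces pipeline that both list the batch processor and the otlp exporter. It assumes an otlp receiver has been defined next to the prometheus receiver from the earlier example; that receiver is an assumption for illustration and does not appear in the configuration above.

```yaml
# Sketch only: assumes `prometheus` and `otlp` receivers, a `batch` processor,
# and an `otlp` exporter are all defined elsewhere in the same config file.
service:
  pipelines:
    metrics:
      receivers: [prometheus, otlp]   # two receivers feeding one pipeline
      processors: [batch]
      exporters: [otlp]
    traces:
      receivers: [otlp]
      processors: [batch]             # same definition, listed in a second pipeline
      exporters: [otlp]
```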

The receivers are the entry points. They understand specific protocols or data formats. Common ones include prometheus (for Prometheus metrics), otlp (OpenTelemetry Protocol, for data directly from instrumented apps), fluentforward (for Fluentd logs), jaeger (for Jaeger traces), and kafka. Each receiver has its own configuration specific to how it ingests data.
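
To illustrate receiver-specific configuration, here is a minimal sketch of an otlp receiver that accepts data over both gRPC and HTTP. The listen addresses use the conventional OTLP ports (4317 and 4318) and are assumptions; adjust them to your environment.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # assumed listen address for OTLP/gRPC
      http:
        endpoint: 0.0.0.0:4318   # assumed listen address for OTLP/HTTP
```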

Processors are where the magic happens. They can filter, modify, enrich, sample, or aggregate telemetry data. The batch processor is almost always used to group telemetry into larger, more efficient payloads. Other useful processors include:

  • memory_limiter: Prevents the collector from consuming too much memory.
  • attributes: Adds or modifies attributes on telemetry signals. For example, adding an environment: production attribute to all data.
  • resource: Adds or modifies resource attributes. This is great for adding hostnames, cluster names, or cloud provider metadata.
  • filter: Drops telemetry based on defined rules.
  • spanmetrics: Generates request, error, and duration metrics from trace data. In recent Collector releases this functionality is provided by the spanmetrics connector rather than a processor.
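
A few of the processors above might be configured roughly as follows. This is a sketch under assumptions: the memory limits, the component names after the slash, and the go_gc_ metric prefix are hypothetical, and the filter syntax shown uses OTTL conditions, which requires a reasonably recent contrib release.

```yaml
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512          # hypothetical hard limit; size to your deployment
    spike_limit_mib: 128    # hypothetical headroom for short bursts
  attributes/add_environment:
    actions:
      - key: environment
        value: production
        action: upsert      # add the attribute, or overwrite an existing value
  filter/drop_go_gc:
    error_mode: ignore
    metrics:
      metric:
        # Metrics matching any listed condition are dropped.
        - 'IsMatch(name, "^go_gc_.*")'
```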

Exporters are the destinations. They know how to send data to various backends. Common exporters include otlp (to send data in OTLP format), prometheus (to expose metrics for Prometheus to scrape), debug (to write data to the Collector's own output for debugging; it replaces the older logging exporter), file (to write to files), and vendor-specific exporters such as datadog and splunk_hec.
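
As a small illustration, the sketch below defines a debug exporter alongside a prometheus exporter. The listen address for the Prometheus endpoint is an assumption; a pipeline can list both exporters at once to inspect data locally while also exposing it for scraping.

```yaml
exporters:
  debug:
    verbosity: detailed          # print full telemetry to the Collector's output
  prometheus:
    endpoint: "0.0.0.0:8889"     # assumed address where Prometheus can scrape
```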

Think of it like a factory assembly line. Receivers are the raw material input stations. Processors are the machines that shape, polish, or inspect the product. Exporters are the loading docks where the finished goods are shipped out. You can have multiple lines (pipelines) running in parallel, each with its own set of machines and destinations.

When you configure a receiver like prometheus, you’re telling it where to look for metrics. When you configure an exporter like otlp, you’re telling it where to send the processed data. The service.pipelines section is the conveyor belt connecting them, defining the order and flow.

A common, often overlooked, pattern is using the attributes processor to inject environment or service metadata that isn’t already present in the incoming telemetry. This is critical for consistent filtering and analysis in your backend. For example, if your Prometheus targets don’t have a service.name label, you can add it universally in the Collector:

processors:
  attributes/add_service_name:
    actions:
      - key: service.name
        action: insert
        value: "my-frontend-service"

Then, in your service.pipelines for metrics:

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch, attributes/add_service_name] # Add it here
      exporters: [otlp]

Because the action is insert, the attribute is added only where the key is not already present. Metrics flowing through this pipeline will then carry service.name: my-frontend-service, making them easily identifiable in your observability platform.
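
In OpenTelemetry, service.name is conventionally a resource attribute rather than a per-data-point attribute, so a backend may expect it at the resource level. A minimal sketch of the same idea using the resource processor follows; the name resource/add_service_name is hypothetical, and it would be listed in the pipeline's processors in the same way.

```yaml
processors:
  resource/add_service_name:
    attributes:
      - key: service.name
        value: "my-frontend-service"
        action: insert   # use upsert instead to overwrite an existing value
```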

The batch processor, while seemingly simple, is a critical optimization. Without it, telemetry is forwarded in whatever small chunks the receivers happen to produce, which can mean a large number of small network requests. The batch processor groups these into larger payloads, significantly reducing network overhead and improving the throughput of both the Collector and the backend. The send_batch_size and timeout parameters allow you to tune this for your specific network conditions and data volume: a batch is sent once send_batch_size items have accumulated or the timeout has elapsed, whichever comes first.
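
Note that send_batch_size is a trigger rather than a hard cap, so an individual batch can exceed it. If your backend limits payload size, the send_batch_max_size setting can bound it. The values below are illustrative assumptions, not recommendations.

```yaml
processors:
  batch:
    send_batch_size: 1000       # send once this many items have accumulated
    send_batch_max_size: 2000   # upper bound per batch; must be >= send_batch_size
    timeout: 10s                # send whatever is buffered after this long
```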

After you’ve successfully configured your pipelines and are seeing data arrive in your backend, the next immediate challenge is often dealing with high cardinality or unwanted data.
