The OpenTelemetry trace exporter failed because the collector couldn't establish a working, authenticated connection to the backend it was trying to send traces to.

Common Causes and Fixes

  1. Network Connectivity Issues: The collector cannot reach the OTLP endpoint.

    • Diagnosis: From the collector host, run curl -v <your_otlp_endpoint_address>:<port>. Look for "Connection refused" or a connection timeout. Even against a gRPC port, curl's verbose output shows whether the TCP connection itself succeeded.
    • Fix: Ensure firewalls (host-based and network) allow outbound traffic on the OTLP port (e.g., 4317 for gRPC, 4318 for HTTP). If you use a proxy, verify that HTTP_PROXY, HTTPS_PROXY, and NO_PROXY are set correctly in the collector process's environment.
    • Why it works: The curl test takes the collector's configuration out of the picture and checks only whether a TCP connection to the target address and port can be opened. If it can't, the problem is in the network path (firewall or proxy), not in the collector.
  2. Incorrect OTLP Endpoint: The configured endpoint address or port is wrong.

    • Diagnosis: Review your OpenTelemetry Collector configuration (config.yaml or equivalent). Specifically, check the endpoint parameter within your exporters section for the OTLP exporter.
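    • Fix: Correct the endpoint to the actual address and port of the OTLP receiver you are exporting to. Mind the two endpoint styles: the gRPC otlp exporter takes host:port (e.g., endpoint: "192.168.1.100:4317"), while the otlphttp exporter takes a URL (e.g., change endpoint: "http://localhost:4318" to endpoint: "http://otel-collector.mycompany.com:4318").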
    • Why it works: The exporter needs the precise network location of the OTLP receiver to send data. A typo or outdated address prevents any data from reaching its destination.
  3. Authentication/Authorization Failure (TLS/SSL Issues): The collector cannot authenticate with the OTLP endpoint due to certificate problems.

    • Diagnosis: Check the collector logs for messages like "x509: certificate signed by unknown authority," "remote error: tls: bad certificate," or "ssl handshake failed."
    • Fix (if using self-signed certs or a private CA): Configure the collector to trust your CA. For the otlp exporter, set tls to:
      tls:
        ca_file: /path/to/ca.crt # CA that signed the backend's server certificate
        # Only needed if the backend requires mutual TLS (client certificates):
        cert_file: /path/to/client.crt
        key_file: /path/to/client.key
      
      If the backend uses a self-signed certificate, that certificate is its own CA: point ca_file at it, or add it to the collector host's system trust store.
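    • Why it works: ca_file tells the collector which CA to trust when it verifies the certificate the OTLP endpoint presents, so the TLS handshake can complete. cert_file and key_file are the collector's own identity, presented only when the backend asks for a client certificate.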
  4. Authentication/Authorization Failure (API Keys/Tokens): The collector is sending requests without valid credentials or with incorrect ones.

    • Diagnosis: Examine collector logs for authentication errors from the OTLP backend. These often show up as 401 Unauthorized or 403 Forbidden HTTP status codes in the collector's export attempts, or as the gRPC status codes Unauthenticated and PermissionDenied when exporting over gRPC.
    • Fix: Ensure the headers field in your OTLP exporter configuration contains the correct authentication token or API key. For example:
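      exporters:
        otlp:
          endpoint: "https://your-backend.com:4317"
          tls:
            insecure_skip_verify: true # testing only: disables server certificate verification
          headers:
            Authorization: "Bearer your_secret_token_here" # most backends expect only one of these two headers
            X-API-Key: "your_api_key_here"
      
      Replace your_secret_token_here or your_api_key_here with your actual credentials. To keep secrets out of the config file, you can reference an environment variable instead, e.g. Authorization: "Bearer ${env:OTLP_TOKEN}" (OTLP_TOKEN is only an example name).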
    • Why it works: Many OTLP backends require specific headers for authentication. Providing these correctly allows the backend to identify and authorize the incoming trace data.
  5. Backend Service Unavailability or Overload: The OTLP endpoint is running but is not accepting new connections or processing requests.

    • Diagnosis: Check the health status of your OTLP backend service (e.g., Jaeger, Tempo, Honeycomb, Datadog agent). Look for high CPU, memory, or disk I/O, or error logs within the backend itself.
    • Fix: Scale up your OTLP backend resources or investigate and resolve the performance bottlenecks within the backend application. Restarting the backend service might temporarily resolve issues caused by stuck processes.
    • Why it works: If the receiving service is overwhelmed or crashing, it cannot accept or process the incoming trace data, leading to connection errors or timeouts reported by the exporter.
  6. Collector Configuration Error (Exporter Type Mismatch): The exporter is configured incorrectly for the protocol or format expected by the backend.

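    • Diagnosis: Check which exporter your traces pipeline uses against the protocol your backend expects. In the Collector the protocol is determined by the exporter type, not by a protocol key: the otlp exporter always sends OTLP over gRPC, while the otlphttp exporter sends OTLP over HTTP (protobuf-encoded by default). The values grpc and http/protobuf belong to the SDKs' OTEL_EXPORTER_OTLP_PROTOCOL environment variable.
    • Fix: If your backend expects gRPC, use the otlp exporter with a gRPC port (e.g., 4317). If it expects HTTP, switch to the otlphttp exporter with an HTTP endpoint (e.g., 4318) and list it under exporters in your traces pipeline.
      exporters:
        otlp: # OTLP over gRPC
          endpoint: "your-backend.com:4317"
        otlphttp: # OTLP over HTTP; "/v1/traces" is appended for trace data
          endpoint: "https://your-backend.com:4318"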
      
    • Why it works: gRPC and plain HTTP are different transports and, by convention, listen on different ports. A mismatch means the collector either sends requests the backend cannot parse or connects to a port nothing is listening on.
  7. Resource Exhaustion on Collector Host: The collector process itself is running out of memory or file descriptors, preventing it from establishing new network connections.

    • Diagnosis: Monitor the collector process’s resource usage (top, htop, docker stats). Check system logs for "out of memory" (OOM) killer messages or "too many open files" errors.
    • Fix: Increase the RAM allocated to the collector host or container. Raise the open file descriptor limit for the collector process (ulimit -n for the user running it, or LimitNOFILE= in its systemd unit if it runs as a service). Review the collector's configuration for excessive batch sizes or queue sizes that might be contributing to memory bloat.
    • Why it works: Network connections consume memory and file descriptors. When memory runs out, the OOM killer may terminate the collector outright; when file descriptors run out, every new socket fails with "too many open files", so the exporter cannot open connections.
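The endpoint and connectivity checks from causes 1, 2, and 6 can be bundled into a small preflight script run from the collector host. This is a minimal sketch, assuming Python 3.8+; every function name in it is hypothetical and it is not part of OpenTelemetry. It treats 4317 and 4318 only as the conventional OTLP ports, so a port hint is a prompt to double-check, not proof of a mistake. Like the curl test, it only checks whether a TCP connection opens; it does not speak gRPC or verify TLS or credentials.

```python
"""Hypothetical preflight check for an OTLP exporter endpoint (not an OpenTelemetry API)."""
from __future__ import annotations

import socket
from urllib.parse import urlsplit

# Conventional OTLP ports; a backend may be configured to listen elsewhere.
GRPC_PORT = 4317
HTTP_PORT = 4318


def split_endpoint(endpoint: str) -> tuple[str, int]:
    """Return (host, port) for either endpoint style.

    Accepts the host:port form used by the gRPC `otlp` exporter and the URL
    form used by `otlphttp` (e.g. "https://backend.example.com:4318").
    """
    has_scheme = "://" in endpoint
    # Prefixing "//" makes urlsplit parse a bare host:port as the network location.
    parts = urlsplit(endpoint if has_scheme else "//" + endpoint)
    port = parts.port  # raises ValueError for a non-numeric or out-of-range port
    if port is None and has_scheme:
        # A URL without an explicit port uses the scheme's default port.
        port = {"http": 80, "https": 443}.get(parts.scheme)
    if not parts.hostname or port is None:
        raise ValueError(f"cannot determine host and port from {endpoint!r}")
    return parts.hostname, port


def port_hint(port: int, exporter: str) -> str | None:
    """Flag a port that conventionally belongs to the other OTLP transport."""
    if exporter == "otlp" and port == HTTP_PORT:
        return f"otlp exporter speaks gRPC, but {HTTP_PORT} is the usual OTLP/HTTP port"
    if exporter == "otlphttp" and port == GRPC_PORT:
        return f"otlphttp exporter speaks HTTP, but {GRPC_PORT} is the usual OTLP/gRPC port"
    return None


def can_connect(host: str, port: int, timeout: float = 3.0) -> tuple[bool, str]:
    """Try a plain TCP connection, the same first step `curl -v host:port` takes."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.timeout:
        return False, "timed out (often a firewall silently dropping packets)"
    except ConnectionRefusedError:
        return False, "connection refused (nothing listening on that port)"
    except OSError as exc:  # DNS failures, unreachable networks, etc.
        return False, f"failed: {exc}"


def preflight(exporter: str, endpoint: str, timeout: float = 3.0) -> dict:
    """Run all checks for one exporter ('otlp' or 'otlphttp') and its endpoint."""
    host, port = split_endpoint(endpoint)
    reachable, detail = can_connect(host, port, timeout)
    return {
        "host": host,
        "port": port,
        "hint": port_hint(port, exporter),
        "reachable": reachable,
        "detail": detail,
    }
```

For example, preflight("otlp", "your-backend.com:4317") returns the parsed host and port, any port hint, and whether the TCP connection succeeded, with a short reason when it did not.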

The next error you'll likely encounter after fixing the permanent exporter failure is a "Queue Full" error or a "Batch Processor Timeout," indicating that although the exporter can now send data, the volume of incoming data is more than the collector's processors and exporter can keep up with.
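Queue-full errors are about rates: a queue absorbs bursts, but if spans keep arriving faster than the exporter can send them, the queue eventually fills no matter how large it is. The toy simulation below illustrates that with made-up numbers. It is a hypothetical model for intuition only, not how the collector's batch processor or exporter sending queue is actually implemented.

```python
"""Toy model of a bounded export queue (an illustration, not the collector's code)."""
from collections import deque


def simulate(arrivals_per_tick: int, exports_per_tick: int,
             queue_size: int, ticks: int) -> dict:
    """Each tick, new spans arrive and then up to `exports_per_tick` are exported.

    Spans that arrive while the queue already holds `queue_size` items are
    rejected, which is the situation a "queue full" error reports.
    """
    queue = deque()
    accepted = rejected = exported = 0
    for _ in range(ticks):
        # Enqueue this tick's arrivals; reject whatever finds the queue full.
        for _ in range(arrivals_per_tick):
            if len(queue) < queue_size:
                queue.append(1)
                accepted += 1
            else:
                rejected += 1
        # Drain up to the export capacity for this tick.
        for _ in range(min(exports_per_tick, len(queue))):
            queue.popleft()
            exported += 1
    return {"accepted": accepted, "rejected": rejected,
            "exported": exported, "backlog": len(queue)}
```

With arrivals 20% above export capacity, simulate(120, 100, 1000, 100) starts rejecting spans once the backlog reaches the queue limit, while simulate(120, 100, 5000, 100) rejects nothing within the run but ends with a larger backlog. A bigger queue only buys time; more export capacity or less incoming data is what fixes the imbalance.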
