The OpenTelemetry File Log Receiver (filelog) lets the OpenTelemetry Collector tail log files directly. This decouples log generation from log transmission: the application only writes to disk, and the Collector takes care of shipping.

Let’s see it in action. Imagine you have an application writing logs to /var/log/myapp/app.log. Normally, you’d have a separate agent like Fluentd or Logstash tailing this file and shipping it. The File Log Receiver lets the OpenTelemetry Collector itself do this.

Here’s a minimal configuration for the Collector to read from that file:

receivers:
  filelog:
    include:
      - /var/log/myapp/app.log

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [debug]

When you run the Collector with this configuration and your application writes to /var/log/myapp/app.log, each new line shows up in the Collector’s own output. The debug exporter, which replaces the deprecated logging exporter and its loglevel setting, prints every record it receives. Abridged, and with formatting that varies between Collector versions, the output looks roughly like this:

LogRecord #0
ObservedTimestamp: 2023-10-27 10:00:00.123 +0000 UTC
Timestamp: 1970-01-01 00:00:00 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str(Application started successfully.)
Attributes:
     -> log.file.name: Str(app.log)

Timestamp and severity are unset because nothing has parsed the line yet; the whole line lands in the body.

This demonstrates the core functionality: the filelog receiver tails the specified file, turns each line into a log record, and sends it through the logs pipeline.

The File Log Receiver is designed to handle a variety of log formats. By default, it assumes plain text logs, where each line is a separate log entry. It becomes far more useful once you tell it how your lines are structured. For example, if each line starts with a timestamp and a severity followed by free text, a regular expression can split it into fields:

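receivers:
  filelog:
    include:
      - /var/log/myapp/app.log
    operators:
      # Split each line into timestamp, severity, and body capture groups.
      - type: regex_parser
        regex: '^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z)\s+(?P<severity>\w+)\s+(?P<body>.*)$'
        timestamp:
          parse_from: attributes.timestamp
          layout_type: gotime
          layout: '2006-01-02T15:04:05.000Z07:00'
        severity:
          parse_from: attributes.severity
          mapping:
            debug: DEBUG
            info: INFO
            warn: WARN
            error: ERROR
            fatal:
              - FATAL
              - PANIC
      # Replace the raw line with just the message text.
      - type: move
        from: attributes.body
        to: body

In this enhanced configuration, a regex_parser operator extracts the fields. The filelog receiver has no top-level regex option; parsing is done by the entries in its operators list, which run in order on each record. The regex defines capture groups for timestamp, severity, and body, and the parser stores them as attributes. The timestamp and severity blocks then tell the parser which attribute to read (parse_from) and how to interpret it. layout_type: gotime selects Go-style layouts, and the Z07:00 suffix accepts a literal Z as UTC. The severity mapping runs from OpenTelemetry severity levels on the left to the strings found in your logs on the right; PANIC is folded into fatal because OpenTelemetry has no separate panic level. Finally, the move operator replaces the record body with the captured message text.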

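If your application produces JSON logs, the configuration becomes even simpler, using the json_parser operator:

receivers:
  filelog:
    include:
      - /var/log/myapp/app.jsonl # Assuming JSON Lines format
    operators:
      # Parse the whole line as a JSON object; its fields become attributes.
      - type: json_parser
        timestamp:
          parse_from: attributes.ts
          layout_type: gotime
          layout: '2006-01-02T15:04:05.000Z07:00'
        severity:
          parse_from: attributes.level
          mapping:
            debug: debug
            info: info
            warn: warn
            error: error
            fatal:
              - fatal
              - panic
      # Promote the message field to the record body.
      - type: move
        from: attributes.msg
        to: body

Here, we assume logs are in JSON Lines format (one JSON object per line). The json_parser operator parses each line and stores its fields as attributes. parse_from names the JSON field to read (ts and level in this example), and layout and mapping work as they do for the regex_parser operator: the mapping runs from OpenTelemetry severity levels on the left to the values found in your logs on the right. The move operator then promotes msg to the record body.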

The File Log Receiver also has options for handling large log files and limiting data loss. The start_at option controls where reading begins for files that already exist when the receiver starts. start_at: beginning reads each file from its first byte, which is what you want when back-filling historical logs. start_at: end (the default) skips existing content and processes only lines written afterwards.
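Read offsets are kept in memory by default, so a restart can either re-read files (with start_at: beginning) or miss lines written while the Collector was down (with start_at: end). The following is a sketch of one way to persist offsets across restarts, assuming the file_storage extension from the Collector contrib distribution is available. The directory path is a placeholder and must exist and be writable by the Collector.

```yaml
extensions:
  file_storage:
    # Placeholder path (an assumption); any writable directory works.
    directory: /var/lib/otelcol/file_storage

receivers:
  filelog:
    include:
      - /var/log/myapp/app.log
    start_at: beginning
    # Store per-file read offsets in the extension above.
    storage: file_storage

exporters:
  debug:
    verbosity: basic

service:
  # The extension must be enabled here as well as referenced by the receiver.
  extensions: [file_storage]
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [debug]
```

With offsets persisted, start_at only matters the first time the receiver sees a file; after a restart it resumes from the stored position.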

Furthermore, poll_interval controls how often the receiver scans for new lines and for new or rotated files (the default is 200ms). Combined with include and exclude glob patterns, this lets you manage a changing set of log files. For example, to collect all .log files in /var/log/myapp/ while explicitly excluding compressed archives:

receivers:
  filelog:
    include:
      - /var/log/myapp/*.log
    exclude:
      - /var/log/myapp/*.log.gz
    poll_interval: 5s
    start_at: end

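The poll_interval of 5s means the receiver checks for new log entries and for new or rotated files every five seconds, trading some latency for fewer file system scans. If a log file is rotated (e.g., app.log becomes app.log.1 and a new app.log is created), the new app.log matches the include pattern and is picked up on the next poll. Note that start_at: end applies only to files that exist when the receiver starts; files that appear later are read from the beginning, so the first lines of a freshly created file are not skipped. With this include pattern the exclude entry is a safeguard rather than a necessity, since *.log does not match names ending in .log.gz.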

A critical, often overlooked, aspect of the File Log Receiver is how it tracks files. It does not subscribe to file system events; it polls, and it identifies each file by a fingerprint taken from the first bytes of its content (fingerprint_size, 1kb by default) rather than by name or inode. When app.log is renamed to app.log.1, its fingerprint is unchanged, so the receiver recognizes it as the file it was already reading, finishes any unread lines, and does not ingest it again. The new app.log has a different fingerprint and is read from its start. This mechanism is what keeps the usual rename-and-create rotation from producing duplicate or dropped log entries.
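The receiver tells files apart by a fingerprint of their first bytes, so two files that begin with the same content (a shared banner or header, for instance) can be mistaken for one another. A minimal sketch of the settings that bear on this, assuming your files differ only after the first kilobyte; the 4kb value is illustrative, not a recommendation:

```yaml
receivers:
  filelog:
    include:
      - /var/log/myapp/*.log
    # Number of leading bytes used to identify a file (default 1kb).
    # Raised here on the assumption that files share a long common header.
    fingerprint_size: 4kb
    # How often the receiver scans for new data and new files (default 200ms).
    poll_interval: 200ms
```

A file shorter than the fingerprint size is identified by whatever content it has so far, so very small files are the ones most likely to look alike.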

The next logical step in log processing after collecting them from files is to enrich them with metadata or route them to specific destinations based on their content.
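As a small illustration of enrichment, the sketch below adds resource attributes to every log record with the resource processor from the Collector contrib distribution. The attribute values (myapp, production) are hypothetical placeholders.

```yaml
receivers:
  filelog:
    include:
      - /var/log/myapp/app.log

processors:
  resource:
    attributes:
      # Hypothetical values; replace with your own service metadata.
      - key: service.name
        value: myapp
        action: upsert
      - key: deployment.environment
        value: production
        action: upsert

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [resource]
      exporters: [debug]
```

Processors run between receivers and exporters, so every record read from the file carries these attributes by the time it is exported.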
