OpenTelemetry’s default log processing often struggles with multiline log entries, especially stack traces, treating each line as a separate event and losing the context of the full error.
This means that when a Java application throws an exception, OpenTelemetry might ingest each of the following lines as a separate record:
2023-10-27 10:00:00 ERROR com.example.MyClass - NullPointerException
    at com.example.MyClass.myMethod(MyClass.java:42)
    at com.example.AnotherClass.process(AnotherClass.java:105)
    ... 5 more
instead of a single log entry containing the entire stack trace, which is what you need for debugging.
The core issue is that most log shippers (like the OpenTelemetry Collector, Fluentd, or Filebeat) split logs on newline delimiters by default. Stack traces, by definition, span multiple lines, with subsequent lines typically starting with whitespace or specific keywords like "at" or "Caused by:".
To fix this, configure your log processing pipeline to recognize and reassemble these multiline log entries before they are sent to your log backend (like Loki, Elasticsearch, or Splunk). This is typically done at the point of log collection, usually within the OpenTelemetry Collector or the agent gathering the logs.
Here’s how you can tackle this, focusing on the OpenTelemetry Collector as a central point for processing:
Cause 1: Default Line-Based Processing in the Collector
Diagnosis: Observe your raw logs within the OpenTelemetry Collector’s processing pipeline (e.g., with the debug exporter, the successor to the deprecated logging exporter, set to detailed verbosity) or inspect the logs arriving at your backend. You’ll see individual lines of the stack trace as separate log records, each with its own timestamp and without the context of the originating error line.
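Fix: Add a multiline processor to your OpenTelemetry Collector’s configuration. This processor lets you define a pattern that identifies where each multiline log entry starts. Component names and fields vary between Collector distributions and versions, so confirm that your build includes a processor with this name; in the contrib distribution, the documented equivalents are the filelog receiver’s multiline.line_start_pattern setting and the recombine operator.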
Example otel-collector-config.yaml snippet:
processors:
  multiline:
    # `split_pattern` matches the first line of a *new* log entry.
    # Any line that does not match is appended to the entry being buffered.
    # Java stack-trace continuations ("at ...", "... 5 more") are indented, so a
    # common heuristic is: a line starting with a non-whitespace character begins
    # a new entry.
    # The Collector is written in Go, whose RE2 regex engine has no lookahead,
    # so a pattern such as '^(?!\s*at\s)' will not compile there.
    # This heuristic is a simplification: it treats an unindented "Caused by:"
    # line as a new entry and may merge unrelated indented lines.
    # Matching the real prefix of your log lines (timestamp plus level) is more
    # robust, for example: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} \w+'
    split_pattern: '^\S'
    # `timeout` defines how long the processor waits for a continuation line
    # before considering the current entry complete. A common value is 5 seconds.
    timeout: 5s
    # `max_lines` prevents unbounded memory growth if a malformed entry is encountered.
    max_lines: 1000
service:
  pipelines:
    logs:
      receivers: [otlp] # or your log receiver
      # Order matters: memory_limiter first, multiline before batching and exporting.
      processors: [memory_limiter, multiline, batch]
      exporters: [debug] # or your log exporter
Why it works: The multiline processor buffers incoming log lines. When it encounters a line that matches split_pattern (meaning it’s the start of a new log entry), it considers the previous buffered lines as a complete multiline log. If a line does not match split_pattern, it’s appended to the current buffered log entry. The timeout ensures that if a log entry is truly multiline but the application stops logging for a while, the buffered entry is flushed.
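The buffering logic described above can be sketched in a few lines of Python. This is a simulation of the idea, not the Collector's implementation: the function name `group_multiline` is hypothetical, and its `split_pattern` and `max_lines` parameters only mirror the config keys used in this article.

```python
import re
from typing import Iterable, Iterator, List


def group_multiline(lines: Iterable[str], split_pattern: str,
                    max_lines: int = 1000) -> Iterator[str]:
    """Yield one string per logical log entry (hypothetical helper)."""
    start = re.compile(split_pattern)
    buffer: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        # A line matching split_pattern begins a new entry, so the buffered
        # entry is complete. The buffer is also flushed once it holds max_lines.
        if buffer and (start.search(line) or len(buffer) >= max_lines):
            yield "\n".join(buffer)
            buffer = []
        buffer.append(line)
    if buffer:  # end of input: flush whatever is left
        yield "\n".join(buffer)


sample = [
    "2023-10-27 10:00:00 ERROR com.example.MyClass - NullPointerException",
    "\tat com.example.MyClass.myMethod(MyClass.java:42)",
    "\tat com.example.AnotherClass.process(AnotherClass.java:105)",
    "\t... 5 more",
    "2023-10-27 10:00:01 INFO com.example.MyClass - recovered",
]
entries = list(group_multiline(sample, r"^\S"))
for entry in entries:
    print(repr(entry))
```

The five input lines come out as two entries: the error with its full stack trace, and the following INFO line.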
Cause 2: Incorrect split_pattern Regex
Diagnosis: Even with the multiline processor, you still see individual lines. This often means the split_pattern regex is not correctly identifying the start of new log entries or the continuation lines for your specific application’s log format.
Fix: Carefully examine your application’s log output for stack traces. Identify what uniquely marks the beginning of a log entry versus what marks a continuation.
- For Java, continuations often start with "at" or with whitespace followed by "at".
- For Python, continuations might start with File ".
- For Node.js, it can vary but often involves indentation or specific keywords.
Adjust the split_pattern to accurately capture the start of a new log event. If your logs start with a timestamp like YYYY-MM-DD HH:MM:SS,ms LEVEL, your split_pattern should match that pattern. For example:
processors:
  multiline:
    split_pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} \w+' # Matches typical Java timestamp prefix
    timeout: 5s
    max_lines: 1000
Why it works: A more precise split_pattern ensures that only lines truly beginning a new log event trigger the flushing of the previous buffered entry, while all lines belonging to the same logical log event (like a stack trace) are correctly accumulated.
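Before deploying a pattern, it can help to run it against a handful of real log lines. The sketch below does that in Python with the timestamp pattern from this section; the `classify` helper and the sample lines are illustrative assumptions. The pattern uses only syntax that Python's `re` and Go's RE2 share, but it is still worth confirming against the engine your shipper uses.

```python
import re

# Same pattern as the config above: "YYYY-MM-DD HH:MM:SS,ms LEVEL".
TIMESTAMP_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} \w+")


def classify(lines, pattern=TIMESTAMP_PREFIX):
    """Return (starts_new_entry, line) pairs so mismatches are easy to spot."""
    return [(bool(pattern.search(line)), line) for line in lines]


sample = [
    "2023-10-27 10:00:00,123 ERROR com.example.MyClass - NullPointerException",
    "\tat com.example.MyClass.myMethod(MyClass.java:42)",
    "Caused by: java.lang.IllegalStateException: bad state",
    "\t... 5 more",
    # No milliseconds, so the pattern does not treat this as a new entry.
    "2023-10-27 10:00:01 ERROR com.example.MyClass - second failure",
]
results = classify(sample)
for is_new, line in results:
    print("NEW " if is_new else "CONT", line)
```

The last sample line shows the failure mode to watch for: a genuine new entry whose prefix deviates slightly from the pattern is silently merged into the previous entry.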
Cause 3: timeout Too Short or Too Long
Diagnosis: You might see incomplete stack traces (some lines missing) or excessive buffering where logs are delayed significantly. If stack traces are being split unexpectedly, the timeout might be too short, causing the processor to flush the buffer before the entire stack trace has been received. If logs are delayed, the timeout might be too long, or there’s a network/backend issue.
Fix: Adjust the timeout value. For typical application logs, 5-10 seconds is usually sufficient. If you have applications that generate very long stack traces with significant pauses between lines, you might need to increase this.
processors:
  multiline:
    split_pattern: '^\S' # RE2-compatible; Go regex has no lookahead such as (?!...)
    timeout: 10s # Increased timeout
    max_lines: 1000
Why it works: The timeout acts as a safety net. It ensures that if the application generating logs stops producing output for a certain period, the buffered multiline log is considered complete and flushed. A correctly tuned timeout balances timely log delivery with allowing sufficient time for complete multiline entries.
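To see how the timeout interacts with slow writers, here is a small Python model with an injectable clock. The `TimedGrouper` class is hypothetical and only approximates the behavior described above; timestamps are passed in explicitly so the example is deterministic.

```python
import re


class TimedGrouper:
    """Toy model of multiline buffering with an idle timeout (hypothetical)."""

    def __init__(self, split_pattern, timeout_s):
        self.start = re.compile(split_pattern)
        self.timeout_s = timeout_s
        self.buffer = []
        self.last_seen = None

    def _flush(self):
        entry, self.buffer = "\n".join(self.buffer), []
        return entry

    def feed(self, line, now):
        """Add a line observed at time `now`; return any completed entries."""
        done = []
        timed_out = self.buffer and now - self.last_seen >= self.timeout_s
        if self.buffer and (timed_out or self.start.search(line)):
            done.append(self._flush())
        self.buffer.append(line)
        self.last_seen = now
        return done

    def tick(self, now):
        """Call periodically; flushes an idle buffer once the timeout elapses."""
        if self.buffer and now - self.last_seen >= self.timeout_s:
            return [self._flush()]
        return []


grouper = TimedGrouper(r"^\S", timeout_s=5.0)
out = []
out += grouper.feed("ERROR boom", now=0.0)
out += grouper.feed("\tat A.a(A.java:1)", now=0.1)
out += grouper.tick(now=3.0)                        # still within the timeout
out += grouper.feed("\tat B.b(B.java:2)", now=7.0)  # arrives 6.9s later
out += grouper.tick(now=12.5)
print(out)
```

Because the third line arrives after the 5-second window, the first two lines are flushed without it and the late line ends up as an orphaned entry. That is the "incomplete stack trace" symptom a too-short timeout produces.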
Cause 4: Order of Processors
Diagnosis: The multiline processor might be placed incorrectly in the pipeline, leading to issues. For example, if a filter processor removes lines before multiline has a chance to process them, parts of your stack trace could be lost.
Fix: Ensure the multiline processor is placed before any processors that might filter or modify log lines based on their content, and before exporters. The memory_limiter processor should come first, and the batch processor is usually placed last so that it batches fully reassembled entries.
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [memory_limiter, multiline, filter, batch] # multiline before filter and batch
      exporters: [debug]
Why it works: By processing multiline logic first, you ensure that the complete log event is reassembled before any subsequent filtering or batching occurs, preserving the integrity of the full stack trace.
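The effect of ordering can be shown with two toy stages in Python. Both functions are illustrative stand-ins (a simplified grouper and a content filter that keeps records containing "ERROR"), not Collector components.

```python
import re


def group(lines, split_pattern=r"^\S"):
    """Merge continuation lines into the preceding entry (simplified)."""
    start, entries = re.compile(split_pattern), []
    for line in lines:
        if entries and not start.search(line):
            entries[-1] += "\n" + line  # continuation of the current entry
        else:
            entries.append(line)
    return entries


def keep_errors(records):
    """Stand-in for a content-based filter stage."""
    return [record for record in records if "ERROR" in record]


lines = [
    "INFO starting",
    "ERROR request failed",
    "\tat com.example.MyClass.myMethod(MyClass.java:42)",
    "\tat com.example.AnotherClass.process(AnotherClass.java:105)",
]

filter_first = group(keep_errors(lines))  # filter sees raw lines
group_first = keep_errors(group(lines))   # filter sees whole entries

print(filter_first)
print(group_first)
```

Filtering first drops the "at ..." lines because they do not contain "ERROR" on their own, so only the bare error message survives. Grouping first keeps the stack trace attached to the error entry.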
Cause 5: Incorrectly Configured Log Shipper Agent (e.g., Fluentd, Filebeat)
Diagnosis: If you’re not using the OpenTelemetry Collector for log collection, but rather an agent like Fluentd or Filebeat, the multiline configuration will be within that agent’s configuration, not the Collector’s. The symptom is the same: individual lines appearing as separate logs.
Fix: Consult the documentation for your specific log shipper. For example, in Filebeat, you’d use the multiline input configuration.
Example Filebeat filebeat.yml snippet:
filebeat.inputs:
- type: log # deprecated in recent Filebeat releases in favor of filestream, which configures multiline under parsers
  enabled: true
  paths:
    - /var/log/myapp/*.log
  multiline.pattern: '^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:' # Continuation lines of a Java stack trace
  multiline.negate: false # Lines that MATCH the pattern are the continuations
  multiline.match: after # Append matching lines to the preceding non-matching line
  multiline.timeout: 10s
Why it works: Similar to the Collector’s multiline processor, these agents use specific directives to identify and reassemble multiline log entries at the collection point, before sending them onwards.
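The interaction between negate and match is easy to get backwards. The Python sketch below is a simplified model of the documented behavior for match: after only, written to show what each negate value does with a continuation-line pattern; `filebeat_after` is a hypothetical name, and the regex uses \s because Python's `re` does not support POSIX classes such as [[:space:]].

```python
import re


def filebeat_after(lines, pattern, negate):
    """Simplified model of `multiline.match: after` (hypothetical helper)."""
    rx = re.compile(pattern)
    events = []
    for line in lines:
        matched = bool(rx.search(line))
        # negate: false -> matching lines are continuations
        # negate: true  -> NON-matching lines are continuations
        is_continuation = matched != negate
        if events and is_continuation:
            events[-1] += "\n" + line
        else:
            events.append(line)
    return events


lines = [
    "2023-10-27 10:00:00 ERROR com.example.MyClass - NullPointerException",
    "    at com.example.MyClass.myMethod(MyClass.java:42)",
    "    at com.example.AnotherClass.process(AnotherClass.java:105)",
    "2023-10-27 10:00:01 INFO com.example.MyClass - recovered",
]
continuation_pattern = r"^\s+at "

with_negate_false = filebeat_after(lines, continuation_pattern, negate=False)
with_negate_true = filebeat_after(lines, continuation_pattern, negate=True)
print(len(with_negate_false), len(with_negate_true))
```

With a pattern that describes continuation lines, negate: false yields two events (the error with its trace, then the INFO line). Setting negate: true with the same pattern fragments the trace and attaches the INFO line to the last "at" line. If your pattern instead describes the first line of an entry, such as a timestamp prefix, negate: true with match: after is the appropriate pairing.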
Cause 6: Application Log Rotation/Interruption
Diagnosis: You might see partial stack traces, especially for long-running processes or during periods of high log volume, where log rotation happens mid-stack trace.
Fix: Ensure your log rotation strategy is configured to handle large log files gracefully and that your log shipper is robust enough to track file changes (e.g., Filebeat’s registry together with options such as close_renamed and clean_removed, or similar mechanisms in other shippers). For the Collector, ensure the filelog receiver (if used for file tailing) is correctly configured.
Why it works: Proper log rotation and file tracking ensure that the log shipper can follow log files even when they are renamed or recreated, preventing data loss or fragmentation of multiline logs.
Once you’ve correctly configured multiline processing, your stack traces will appear as single, coherent log entries in your backend, making debugging significantly easier.
The next thing you’ll likely run into is enriching these logs with contextual metadata like Kubernetes pod names, container IDs, or trace IDs.