Python’s logging module is notoriously difficult to use effectively in production, often leading to unstructured, unsearchable log files.

Let’s see structured logging in action. Imagine a web server request. We want to log the request ID, user ID, duration, and status code in a way that a machine can easily parse.

import logging
import json
from logging.handlers import RotatingFileHandler
import uuid
import time

# Configure a logger
logger = logging.getLogger('my_app')
logger.setLevel(logging.INFO)

# Create a formatter for JSON output
class JsonFormatter(logging.Formatter):
    def format(self, record):
        log_entry = {
            "timestamp": self.formatTime(record, self.datefmt),
            "level": record.levelname,
            "message": record.getMessage(),
            "logger_name": record.name,
            "pathname": record.pathname,
            "lineno": record.lineno,
        }
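        # Keys passed via extra= become attributes on the record (not record.args),
        # so copy every attribute that a plain LogRecord would not already have.
        standard_attrs = vars(logging.LogRecord("", 0, "", 0, "", (), None))
        for key, value in vars(record).items():
            if key not in standard_attrs and key not in ("message", "asctime"):
                log_entry[key] = value
        # default=str keeps non-JSON-serializable values from raising
        return json.dumps(log_entry, default=str)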

# Use RotatingFileHandler to manage log file size
log_file = 'app.log'
handler = RotatingFileHandler(log_file, maxBytes=10*1024*1024, backupCount=5) # 10MB per file, 5 backups

# Instantiate the JSON formatter
formatter = JsonFormatter()
handler.setFormatter(formatter)

# Add the handler to the logger
if not logger.handlers: # Prevent adding multiple handlers if script is reloaded
    logger.addHandler(handler)

# Simulate a web request
def handle_request(request_id, user_id):
    start_time = time.time()
    try:
        # Simulate some work
        time.sleep(0.1)
        status_code = 200
        result = "Success"
        logger.info(
            "Request processed successfully",
            extra={"request_id": request_id, "user_id": user_id, "status_code": status_code}
        )
    except Exception as e:
        status_code = 500
        result = str(e)
        logger.error(
            "Request processing failed",
            extra={"request_id": request_id, "user_id": user_id, "status_code": status_code, "error": result}
        )
    finally:
        duration = time.time() - start_time
        logger.info(
            "Request completed",
            extra={"request_id": request_id, "user_id": user_id, "status_code": status_code, "duration_ms": round(duration * 1000, 2)}
        )

# Example usage
request_id_1 = str(uuid.uuid4())
user_id_1 = "user_123"
handle_request(request_id_1, user_id_1)

request_id_2 = str(uuid.uuid4())
user_id_2 = "user_456"
handle_request(request_id_2, user_id_2)

The core problem this solves is the inability of traditional log aggregators (like Splunk, Elasticsearch, Datadog) to efficiently search, filter, and alert on plain text logs. When every log line is a JSON object with well-defined keys, you can ask questions like "Show me all requests for user_id: 'user_123' that took longer than 500ms" or "Alert me if status_code is 500 and request_id is present." The logging module’s default formatter produces simple strings, making this kind of analysis impossible without complex regex or custom parsing.
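To make the first of those questions concrete, here is a minimal sketch of answering it with nothing but the standard library. The sample lines are hypothetical and only mirror the field names used in this article; a real aggregator would run the equivalent query for you.

```python
import json

# Hypothetical sample lines, shaped like the JSON entries discussed above.
sample_lines = [
    '{"level": "INFO", "message": "Request completed", "user_id": "user_123", "status_code": 200, "duration_ms": 812.4}',
    '{"level": "INFO", "message": "Request completed", "user_id": "user_123", "status_code": 200, "duration_ms": 101.3}',
    '{"level": "INFO", "message": "Request completed", "user_id": "user_456", "status_code": 500, "duration_ms": 950.0}',
]

def slow_requests(lines, user_id, threshold_ms):
    """Return parsed entries for one user whose duration exceeds the threshold."""
    matches = []
    for line in lines:
        entry = json.loads(line)
        # Entries without a duration field are treated as 0 ms and skipped.
        if entry.get("user_id") == user_id and entry.get("duration_ms", 0) > threshold_ms:
            matches.append(entry)
    return matches

slow = slow_requests(sample_lines, "user_123", 500)
print(len(slow), slow[0]["duration_ms"])
```

No regular expressions are involved: each line parses into a dictionary, and the filter is an ordinary comparison on named keys.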

Internally, the logging module is a hierarchical system. Loggers are organized like a tree (root, my_app, my_app.web, etc.). When you call logger.info(...), the logger checks its effective level, creates a LogRecord, and hands that record to its own handlers and then, unless propagate is set to False, to the handlers of each ancestor logger up to the root. Each logger can have zero or more handlers attached, and each handler can have one formatter. Handlers are responsible for sending log records to their destination (console, file, network). Formatters decide what the log record looks like. By creating a custom JsonFormatter and attaching it to a RotatingFileHandler, we ensure that all records handled by our my_app logger are formatted as JSON and written to a file that automatically rolls over. The extra dictionary in the logger.info and logger.error calls is crucial; its keys become attributes on the LogRecord, which lets us pass arbitrary contextual data that our JsonFormatter then incorporates into the JSON output.
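The hierarchy is easiest to see with a small experiment. The sketch below uses hypothetical logger names (demo_app, demo_app.web) and a hypothetical in-memory handler, ListHandler, so that nothing is written to disk.

```python
import logging

class ListHandler(logging.Handler):
    """Hypothetical handler that collects records in memory for inspection."""
    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)

parent = logging.getLogger("demo_app")
child = logging.getLogger("demo_app.web")
parent.setLevel(logging.INFO)

collector = ListHandler()
parent.addHandler(collector)

# The child has no handler of its own; its record reaches the parent's handler.
child.info("handled by the parent's handler")
seen_with_propagation = len(collector.records)

# With propagation off, the child's records no longer reach ancestor handlers.
child.propagate = False
child.info("not seen by the parent's handler")
seen_after_disabling = len(collector.records)

print(seen_with_propagation, seen_after_disabling)
```

Only the first call reaches the collector. This is why attaching one handler to my_app is enough to cover any my_app.* child logger, as long as propagation is left on.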

An easily overlooked point about Python’s logging module is that the extra parameter is more than a place to stash a stray field or two. It is the standard per-call mechanism for passing structured context, and a Formatter subclass is where you consume that context. Each key in extra becomes an attribute on the LogRecord, so the formatter has to read those attributes from the record; they do not appear in record.args. Many developers never use extra, or use it for only one or two fields, but it can serve as the bridge between your application’s runtime state and your log analysis pipeline. If you’re not pairing extra with a custom formatter, you’re likely not getting the full benefit of structured logging.
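To see where extra actually ends up, the following sketch captures a record and inspects it. The logger name extra_demo and the CaptureHandler class are hypothetical, chosen only for illustration.

```python
import logging

class CaptureHandler(logging.Handler):
    """Hypothetical handler that keeps records in a list instead of writing them."""
    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)

log = logging.getLogger("extra_demo")
log.setLevel(logging.INFO)
log.propagate = False  # keep the demo isolated from any root handlers
capture = CaptureHandler()
log.addHandler(capture)

log.info("Request completed", extra={"request_id": "abc-123", "duration_ms": 12.5})
record = capture.records[0]

# Each key in extra is now an attribute on the record; record.args only
# carries %-style format arguments and is empty here.
print(record.request_id, record.duration_ms, record.args)

# Keys that collide with built-in record attributes are rejected with KeyError.
try:
    log.info("bad call", extra={"message": "clash"})
    clash_rejected = False
except KeyError:
    clash_rejected = True
print(clash_rejected)
```

The KeyError is a practical reason to choose field names such as request_id or duration_ms that cannot clash with attributes like message, name, or levelname.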

The next step is often centralizing these logs from multiple application instances into a single aggregation system.
