A saga isn’t a distributed transaction; it’s a sequence of local transactions where each transaction updates data and publishes an event or message to trigger the next.

Let’s watch a simple e-commerce order placement saga unfold. Imagine a customer wants to buy a book.

  1. Create Order: The OrderService creates an order in a PENDING state. It then publishes an OrderCreated event.
    {
      "eventType": "OrderCreated",
      "orderId": "ORD123",
      "customerId": "CUST456",
      "bookId": "BOOK789",
      "quantity": 1
    }
    
  2. Reserve Inventory: The InventoryService listens for OrderCreated events. Upon receiving it, it checks if BOOK789 has sufficient stock. If yes, it decrements the stock count and publishes an InventoryReserved event. If no, it publishes an InventoryReservationFailed event.
    {
      "eventType": "InventoryReserved",
      "orderId": "ORD123",
      "bookId": "BOOK789",
      "reservedQuantity": 1
    }
    
  3. Process Payment: The PaymentService listens for InventoryReserved events. It then attempts to charge CUST456’s account for the book’s price. If successful, it publishes a PaymentProcessed event. If it fails (e.g., insufficient funds), it publishes a PaymentProcessingFailed event.
    {
      "eventType": "PaymentProcessed",
      "orderId": "ORD123",
      "customerId": "CUST456",
      "amount": 25.99
    }
    
  4. Finalize Order: The OrderService listens for PaymentProcessed events. Upon receiving it, it updates the order status to APPROVED. If it receives a PaymentProcessingFailed event, it updates the order status to FAILED.
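
The four steps above can be sketched as a small in-memory simulation. This is a minimal illustration, not a production design: `EventBus`, `place_order`, the integer prices in cents, and the extra `customerId` field on `InventoryReserved` are all assumptions made for the example, and a real system would use a message broker and separate processes.

```python
from collections import defaultdict


class EventBus:
    """Synchronous in-memory bus; a stand-in for a real message broker."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.log = []  # event types, in publication order

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event):
        self.log.append(event["eventType"])
        for handler in self.handlers[event["eventType"]]:
            handler(event)


def place_order(stock, balance_cents, price_cents=2599):
    """Run the four steps once and return (order status, event log)."""
    bus = EventBus()
    orders = {}                            # owned by OrderService
    inventory = {"BOOK789": stock}         # owned by InventoryService
    accounts = {"CUST456": balance_cents}  # owned by PaymentService

    def reserve_inventory(event):  # step 2
        book_id, qty = event["bookId"], event["quantity"]
        if inventory.get(book_id, 0) >= qty:
            inventory[book_id] -= qty
            # customerId is carried along (an assumption) so step 3 knows whom to charge.
            bus.publish({"eventType": "InventoryReserved", "orderId": event["orderId"],
                         "customerId": event["customerId"], "bookId": book_id,
                         "reservedQuantity": qty})
        else:
            bus.publish({"eventType": "InventoryReservationFailed",
                         "orderId": event["orderId"], "bookId": book_id})

    def process_payment(event):  # step 3
        customer_id = event["customerId"]
        amount = price_cents * event["reservedQuantity"]
        if accounts.get(customer_id, 0) >= amount:
            accounts[customer_id] -= amount
            bus.publish({"eventType": "PaymentProcessed", "orderId": event["orderId"],
                         "customerId": customer_id, "amount": amount})
        else:
            bus.publish({"eventType": "PaymentProcessingFailed",
                         "orderId": event["orderId"], "customerId": customer_id})

    def approve_order(event):  # step 4, success path
        orders[event["orderId"]] = "APPROVED"

    def fail_order(event):  # step 4, failure path
        orders[event["orderId"]] = "FAILED"

    bus.subscribe("OrderCreated", reserve_inventory)
    bus.subscribe("InventoryReserved", process_payment)
    bus.subscribe("PaymentProcessed", approve_order)
    bus.subscribe("PaymentProcessingFailed", fail_order)
    bus.subscribe("InventoryReservationFailed", fail_order)

    # Step 1: create the order as PENDING, then announce it.
    orders["ORD123"] = "PENDING"
    bus.publish({"eventType": "OrderCreated", "orderId": "ORD123",
                 "customerId": "CUST456", "bookId": "BOOK789", "quantity": 1})
    return orders["ORD123"], bus.log
```

Calling `place_order(stock=3, balance_cents=5000)` walks the happy path, while `stock=0` or a low balance ends the order as FAILED. The sketch covers only steps 1 to 4; it does not release reserved stock when the payment fails.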

What happens if the inventory reservation fails? The InventoryService publishes InventoryReservationFailed. The OrderService listens for this event and updates the order status to FAILED. This is a compensation step: nothing has been reserved or charged yet, so marking the pending order as failed is all that needs undoing.

If payment processing fails, the PaymentService publishes PaymentProcessingFailed. The OrderService sees this and updates the order to FAILED. But the inventory reservation also has to be undone, so the OrderService publishes a PaymentFailedCompensation event. The InventoryService listens for it and adds the reserved stock back.
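
The payment-failure path can be sketched as two handlers connected by a queue. This is only an illustration under assumptions: the `reservations` table, the `deque` standing in for a broker, and the handler names are hypothetical, and the starting state assumes one copy was already reserved for ORD123.

```python
from collections import deque

inventory = {"BOOK789": 4}                 # stock left after reserving one copy
reservations = {"ORD123": ("BOOK789", 1)}  # what InventoryService reserved per order
orders = {"ORD123": "PENDING"}
queue = deque()                            # stand-in for a message broker


def on_payment_processing_failed(event):
    # OrderService: mark the order failed, then ask for the reservation to be undone.
    orders[event["orderId"]] = "FAILED"
    queue.append({"eventType": "PaymentFailedCompensation",
                  "orderId": event["orderId"]})


def on_payment_failed_compensation(event):
    # InventoryService: add the reserved stock back. Removing the reservation
    # record first means a repeated event finds nothing left to restore.
    reservation = reservations.pop(event["orderId"], None)
    if reservation is not None:
        book_id, quantity = reservation
        inventory[book_id] += quantity


HANDLERS = {
    "PaymentProcessingFailed": on_payment_processing_failed,
    "PaymentFailedCompensation": on_payment_failed_compensation,
}


def drain():
    """Deliver queued events until none are left."""
    while queue:
        event = queue.popleft()
        HANDLERS[event["eventType"]](event)


queue.append({"eventType": "PaymentProcessingFailed",
              "orderId": "ORD123", "customerId": "CUST456"})
drain()
```

After `drain()` returns, the order is FAILED and the stock count is back to 5. Note that the compensation is an ordinary forward action (adding stock), not a database rollback.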

This entire sequence, with its forward execution and potential compensation steps, is a saga. It solves the problem of maintaining data consistency across independent services without the tight coupling and blocking nature of traditional distributed transactions.

The core problem sagas solve is maintaining eventual consistency in a distributed system where each service owns its own data. Unlike an ACID transaction, a saga is not atomic in the all-or-nothing sense, and it provides no isolation: other requests can observe intermediate states, such as an order that is still PENDING while its inventory is already reserved. Instead, a saga ensures that the system eventually reaches a consistent state, either by completing every step or by running compensating actions for the steps that already succeeded. The trade-off is giving up immediate consistency in exchange for availability and scalability.

The exact levers you control are the events published by each service and the logic within each service to handle those events, including the compensation logic. For instance, in the payment step, you might have a PaymentRetryPolicy that dictates how many times the PaymentService will attempt to charge the customer before publishing PaymentProcessingFailed.
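
A retry policy of that kind might look like the sketch below. Only the name `PaymentRetryPolicy` comes from the text; its `max_attempts` field, the `TransientPaymentError` exception, the `charge` callable, and the `attempts` field on the failure event are assumptions made for illustration.

```python
from dataclasses import dataclass


class TransientPaymentError(Exception):
    """Hypothetical error for failures worth retrying, such as a gateway timeout."""


@dataclass(frozen=True)
class PaymentRetryPolicy:
    max_attempts: int = 3


def process_payment(order_id, customer_id, amount, charge,
                    policy=PaymentRetryPolicy()):
    """Try to charge up to policy.max_attempts times; return the event to publish."""
    for _ in range(policy.max_attempts):
        try:
            charge(customer_id, amount)
        except TransientPaymentError:
            continue  # a production policy would likely wait (back off) before retrying
        return {"eventType": "PaymentProcessed", "orderId": order_id,
                "customerId": customer_id, "amount": amount}
    # Every attempt failed: give up and let the saga's compensation take over.
    return {"eventType": "PaymentProcessingFailed", "orderId": order_id,
            "customerId": customer_id, "attempts": policy.max_attempts}
```

Only transient errors are retried here. A definitive decline such as insufficient funds would normally skip the retries and publish PaymentProcessingFailed immediately.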

When implementing a saga, the choreography-based approach (where each service publishes events and others react, as in the example above) is often simpler to start with, but the overall flow becomes hard to trace as services are added, because no single component describes it. The alternative, the orchestration-based approach (where a central orchestrator service dictates the flow, tells each service what to do, and manages the compensation logic), offers more control and visibility, but the orchestrator becomes a more complex component and a potential single point of failure.
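
For contrast with choreography, an orchestrator can be sketched as a loop over (action, compensation) pairs. This is a simplified, single-process illustration: `run_saga`, `StepFailed`, and the step functions are hypothetical names, and in a real system each step would be a command sent to another service rather than a local function call.

```python
class StepFailed(Exception):
    """Raised by a step to signal a business failure."""


def run_saga(steps, ctx):
    """Run (action, compensation) pairs in order; undo completed ones on failure."""
    completed = []
    for action, compensation in steps:
        try:
            action(ctx)
        except StepFailed:
            # Compensate in reverse order, and only for steps that finished.
            for undo in reversed(completed):
                undo(ctx)
            return False
        completed.append(compensation)
    return True


def create_order(ctx):
    ctx["order_status"] = "PENDING"


def reject_order(ctx):
    ctx["order_status"] = "FAILED"


def reserve_inventory(ctx):
    if ctx["stock"] < 1:
        raise StepFailed("out of stock")
    ctx["stock"] -= 1


def release_inventory(ctx):
    ctx["stock"] += 1


def charge_customer(ctx):
    if ctx["balance"] < ctx["price"]:
        raise StepFailed("insufficient funds")
    ctx["balance"] -= ctx["price"]


def refund_customer(ctx):
    ctx["balance"] += ctx["price"]


ORDER_SAGA = [
    (create_order, reject_order),
    (reserve_inventory, release_inventory),
    (charge_customer, refund_customer),
]


def place_order(ctx):
    if run_saga(ORDER_SAGA, ctx):
        ctx["order_status"] = "APPROVED"
    return ctx["order_status"]
```

The whole flow is readable in one place (`ORDER_SAGA`), which is the control the orchestration approach buys. The cost is that this component must stay available and must persist its progress to survive a crash, which the sketch leaves out.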

The most surprising thing is that a failed step in a saga doesn’t necessarily mean the entire business transaction fails permanently. Forward progress stops, and compensation kicks in to undo only the steps that already completed. Because each compensation is just another local transaction rather than a global rollback, other parts of the system keep operating in the meantime, which is a massive win for availability.

The next concept you’ll run into is idempotency, which is critical for handling retries reliably: message brokers typically deliver events at least once, so a handler has to produce the same result whether it sees an event once or several times.
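
As a small preview, an idempotent handler can remember which events it has already applied. This is a sketch under assumptions: deduplicating on the pair of `eventType` and `orderId` and keeping the `processed` set in memory are choices made for the example, and a real service would store that record durably alongside its own data.

```python
processed = set()           # in practice a durable store, not process memory
inventory = {"BOOK789": 5}


def handle_order_created(event):
    """Reserve stock once per order; return False for a duplicate delivery."""
    key = (event["eventType"], event["orderId"])
    if key in processed:
        return False  # already applied, so a redelivery changes nothing
    inventory[event["bookId"]] -= event["quantity"]
    processed.add(key)
    return True
```

Delivering the same OrderCreated event twice decrements the stock only once.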
