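The Saga pattern is a way to manage data consistency across microservices without resorting to the two-phase commit (2PC) protocol. Because 2PC holds locks across services and blocks participants while a coordinator waits for every vote, it often becomes a bottleneck in distributed systems.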

Let’s see the Saga pattern in action. Imagine an e-commerce system where placing an order involves several independent services: OrderService, PaymentService, and InventoryService.

Here’s a simplified flow:

  1. Client Request: A user requests to place an order.
  2. OrderService: Creates an Order with status PENDING. It then initiates the saga by sending a command to PaymentService to process payment.
  3. PaymentService: Processes the payment. On success, it publishes a PaymentProcessed event; on failure, it publishes a PaymentFailed event.
  4. InventoryService: Upon receiving PaymentProcessed, it reserves the items in the inventory. If the reservation succeeds, it publishes an InventoryReserved event; if it fails (e.g., an item is out of stock), it publishes an InventoryFailed event.
  5. OrderService:
    • If InventoryReserved is received, it updates the Order status to APPROVED and sends a ShipmentRequest to a ShippingService.
    • If PaymentFailed or InventoryFailed is received, it runs the compensating transactions for the steps that have already completed.
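
The flow above can be sketched as a tiny in-process, choreography-style simulation. This is a minimal sketch under stated assumptions. `EventBus` is a hypothetical synchronous stand-in for a real message broker. Commands (ProcessPayment, RefundPayment) and events travel on the same bus. Hard-coded flags replace real payment and stock checks. Message names are shortened versions of the ones used in this section.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Hypothetical synchronous, in-memory stand-in for a message broker."""

    def __init__(self) -> None:
        self.handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self.log: list[str] = []  # every published message name, in order

    def subscribe(self, name: str, handler: Callable[[dict], None]) -> None:
        self.handlers[name].append(handler)

    def publish(self, name: str, payload: dict) -> None:
        self.log.append(name)
        for handler in self.handlers[name]:
            handler(payload)


def run_place_order(payment_ok: bool, stock_ok: bool) -> tuple[str, list[str]]:
    bus = EventBus()
    orders: dict[str, str] = {}

    # PaymentService: handles the command sent by OrderService.
    def process_payment(p: dict) -> None:
        bus.publish("PaymentProcessed" if payment_ok else "PaymentFailed", p)

    # InventoryService: reserves stock only after payment succeeded.
    def reserve_inventory(p: dict) -> None:
        bus.publish("InventoryReserved" if stock_ok else "InventoryFailed", p)

    # PaymentService compensation: undo the charge.
    def refund_payment(p: dict) -> None:
        bus.publish("PaymentRefunded", p)

    # OrderService: final decision on success (no ShippingService subscribed here).
    def on_reserved(p: dict) -> None:
        orders[p["order_id"]] = "APPROVED"
        bus.publish("ShipmentRequest", p)

    def on_payment_failed(p: dict) -> None:
        orders[p["order_id"]] = "FAILED"  # nothing was charged, nothing to refund

    def on_inventory_failed(p: dict) -> None:
        orders[p["order_id"]] = "FAILED"
        bus.publish("RefundPayment", p)  # compensate the successful payment

    bus.subscribe("ProcessPayment", process_payment)
    bus.subscribe("PaymentProcessed", reserve_inventory)
    bus.subscribe("RefundPayment", refund_payment)
    bus.subscribe("InventoryReserved", on_reserved)
    bus.subscribe("PaymentFailed", on_payment_failed)
    bus.subscribe("InventoryFailed", on_inventory_failed)

    # OrderService: create the order as PENDING, then start the saga.
    orders["o-1"] = "PENDING"
    bus.publish("ProcessPayment", {"order_id": "o-1"})
    return orders["o-1"], bus.log
```

With `run_place_order(True, False)` the order ends up FAILED. The log shows that the inventory failure triggered a RefundPayment command, followed by a PaymentRefunded event. With `run_place_order(False, True)` the saga stops right after PaymentFailed, and there is no refund.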

A compensating transaction is the "undo" operation for a step that has already completed.

  • If PaymentFailed is received, the only completed step is order creation, so OrderService compensates by marking the order as FAILED. No payment compensation is needed, because no payment was taken.
  • If InventoryFailed is received, payment has already succeeded, so OrderService needs to mark the order as FAILED and, crucially, trigger a compensation in PaymentService (a RefundPaymentCommand) to refund the payment.

The saga is therefore a sequence of local transactions, each committed independently by a single service. If any local transaction fails, the saga executes compensating transactions for the preceding, already-committed steps, in reverse order, to undo their work.
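
A minimal orchestration sketch of that rule follows. It assumes each local transaction and its compensation can be modeled as a plain Python callable, and that an action signals failure by raising. `Step` and `run_saga` are hypothetical names, not a library API. The failed step itself is not compensated, because its local transaction never committed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], None]      # local transaction; raises on failure
    compensate: Callable[[], None]  # undo for a step that already completed


def run_saga(steps: list[Step]) -> tuple[bool, list[str]]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: list[Step] = []
    trace: list[str] = []
    for step in steps:
        try:
            step.action()
        except Exception as exc:
            trace.append(f"{step.name} failed: {exc}")
            # Compensations are assumed to succeed in this sketch.
            for done in reversed(completed):
                done.compensate()
                trace.append(f"compensated {done.name}")
            return False, trace
        completed.append(step)
        trace.append(f"{step.name} ok")
    return True, trace


if __name__ == "__main__":
    def fail_reservation() -> None:
        raise RuntimeError("Item out of stock")

    saga = [
        Step("CreateOrder", lambda: None, lambda: print("cancel order")),
        Step("ProcessPayment", lambda: None, lambda: print("refund payment")),
        Step("ReserveInventory", fail_reservation, lambda: print("release inventory")),
    ]
    print(run_saga(saga))
```

In the demo, the inventory failure causes ProcessPayment and then CreateOrder to be compensated. ReleaseInventory is never called, because nothing was reserved.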

Here’s how you might model this with an event-driven approach.

Order Service (OrderService)

```yaml
# OrderService Configuration (Conceptual)
saga:
  name: PlaceOrderSaga
  steps:
    - name: CreateOrder
      command: CreateOrderCommand
      event: OrderCreatedEvent
      compensating_command: CancelOrderCommand # Marks the order FAILED if a later step fails

    - name: ProcessPayment
      command: ProcessPaymentCommand
      event: PaymentProcessedEvent
      failure_event: PaymentFailedEvent
      compensating_command: RefundPaymentCommand
      depends_on: OrderCreatedEvent

    - name: ReserveInventory
      command: ReserveInventoryCommand
      event: InventoryReservedEvent
      failure_event: InventoryFailedEvent
      compensating_command: ReleaseInventoryCommand
      depends_on: PaymentProcessedEvent

    - name: ApproveOrder
      event: OrderApprovedEvent
      depends_on: InventoryReservedEvent
```
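
To make the `depends_on` keys concrete, the sketch below mirrors the configuration's steps as Python dicts, keeping only the keys it uses and skipping YAML parsing to stay dependency-free. It derives two things: the order in which the steps run, and which compensating commands an orchestrator would send, newest first, when a given step fails. The schema is conceptual, so `execution_order` and `compensations_for` are hypothetical helpers showing one possible interpretation, not any framework's behavior.

```python
STEPS = [
    {"name": "CreateOrder", "event": "OrderCreatedEvent",
     "compensating_command": "CancelOrderCommand"},
    {"name": "ProcessPayment", "event": "PaymentProcessedEvent",
     "compensating_command": "RefundPaymentCommand", "depends_on": "OrderCreatedEvent"},
    {"name": "ReserveInventory", "event": "InventoryReservedEvent",
     "compensating_command": "ReleaseInventoryCommand", "depends_on": "PaymentProcessedEvent"},
    {"name": "ApproveOrder", "event": "OrderApprovedEvent",
     "depends_on": "InventoryReservedEvent"},
]


def execution_order(steps: list[dict]) -> list[str]:
    """Order steps so each runs after the step that produces the event it depends on."""
    produced_by = {s["event"]: s["name"] for s in steps}
    by_name = {s["name"]: s for s in steps}
    ordered: list[str] = []

    def visit(name: str, seen: set[str]) -> None:
        if name in ordered:
            return
        if name in seen:
            raise ValueError(f"dependency cycle at {name}")
        seen.add(name)
        dep = by_name[name].get("depends_on")
        if dep is not None:
            visit(produced_by[dep], seen)  # schedule the producing step first
        ordered.append(name)

    for s in steps:
        visit(s["name"], set())
    return ordered


def compensations_for(failed_step: str, steps: list[dict]) -> list[str]:
    """Compensating commands for the steps completed before `failed_step`, newest first."""
    order = execution_order(steps)
    by_name = {s["name"]: s for s in steps}
    completed = order[: order.index(failed_step)]
    return [by_name[n]["compensating_command"]
            for n in reversed(completed)
            if "compensating_command" in by_name[n]]
```

For example, `compensations_for("ReserveInventory", STEPS)` yields the refund first and then the order cancellation. This matches the InventoryFailed case described earlier; ReleaseInventoryCommand is not included, because the reservation never happened.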

Payment Service (PaymentService)

```python
# PaymentService Logic (Conceptual: `on` and `publish` stand in for your messaging framework)
@on(ProcessPaymentCommand)
def handle_process_payment(command):
    try:
        # ... perform payment processing ...
        if payment_successful:
            publish(PaymentProcessedEvent(order_id=command.order_id, amount=command.amount))
        else:
            publish(PaymentFailedEvent(order_id=command.order_id, reason="Insufficient funds"))
    except Exception as e:
        publish(PaymentFailedEvent(order_id=command.order_id, reason=str(e)))


@on(RefundPaymentCommand)
def handle_refund_payment(command):
    # Compensation: must be idempotent, since the command may be delivered more than once
    try:
        # ... perform refund processing ...
        publish(PaymentRefundedEvent(order_id=command.order_id))
    except Exception as e:
        # Log the error, then retry or alert: a failed refund leaves the saga inconsistent
        publish(RefundFailedEvent(order_id=command.order_id, reason=str(e)))
```
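
A runnable sketch of the same two handlers follows. It assumes an in-memory ledger of integer cents in place of a real payment provider, and a `published` list in place of a broker; `PaymentService`, `balances`, and the `(event, order_id)` tuples are all hypothetical. The refund handler records which orders it has already refunded, so a redelivered RefundPaymentCommand is acknowledged again but never refunds twice.

```python
class PaymentService:
    """Hypothetical in-memory PaymentService; amounts are integer cents."""

    def __init__(self, balances: dict[str, int]) -> None:
        self.balances = balances                       # customer_id -> available funds
        self.charges: dict[str, tuple[str, int]] = {}  # order_id -> (customer_id, amount)
        self.refunded: set[str] = set()                # order_ids already refunded
        self.published: list[tuple[str, str]] = []     # (event name, order_id) sent so far

    def publish(self, event: str, order_id: str) -> None:
        self.published.append((event, order_id))

    def on_process_payment(self, order_id: str, customer_id: str, amount: int) -> None:
        available = self.balances.get(customer_id, 0)
        if available >= amount:
            self.balances[customer_id] = available - amount
            self.charges[order_id] = (customer_id, amount)
            self.publish("PaymentProcessedEvent", order_id)
        else:
            self.publish("PaymentFailedEvent", order_id)

    def on_refund_payment(self, order_id: str) -> None:
        # Idempotent compensation: money is returned at most once per order,
        # but every delivery of the command is acknowledged.
        if order_id not in self.refunded:
            if order_id not in self.charges:
                self.publish("RefundFailedEvent", order_id)  # no charge to refund
                return
            customer_id, amount = self.charges[order_id]
            self.balances[customer_id] += amount
            self.refunded.add(order_id)
        self.publish("PaymentRefundedEvent", order_id)
```

Re-publishing PaymentRefundedEvent on a duplicate command is deliberate. If the first acknowledgement was lost, the orchestrator still learns that the compensation is complete.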

Inventory Service (InventoryService)

```python
# InventoryService Logic (Conceptual: `on` and `publish` stand in for your messaging framework)
@on(ReserveInventoryCommand)
def handle_reserve_inventory(command):
    try:
        # ... attempt to reserve items ...
        if reservation_successful:
            publish(InventoryReservedEvent(order_id=command.order_id, items=command.items))
        else:
            publish(InventoryFailedEvent(order_id=command.order_id, reason="Item out of stock"))
    except Exception as e:
        publish(InventoryFailedEvent(order_id=command.order_id, reason=str(e)))


@on(ReleaseInventoryCommand)
def handle_release_inventory(command):
    # Compensation: must be idempotent, since the command may be delivered more than once
    try:
        # ... release reserved items ...
        publish(InventoryReleasedEvent(order_id=command.order_id))
    except Exception as e:
        # Log the error, then retry or alert
        publish(ReleaseFailedEvent(order_id=command.order_id, reason=str(e)))
```
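
The inventory side can be sketched the same way, assuming an in-memory `stock` map keyed by a hypothetical SKU string. Two design choices matter here. Reservation is all-or-nothing, so a failed ReserveInventoryCommand leaves nothing behind to release. Release uses `dict.pop` with a default, so a second ReleaseInventoryCommand for the same order is a harmless no-op. The ReleaseFailedEvent path is omitted, because these in-memory operations cannot fail.

```python
class InventoryService:
    """Hypothetical in-memory InventoryService."""

    def __init__(self, stock: dict[str, int]) -> None:
        self.stock = stock                                 # sku -> units available
        self.reservations: dict[str, dict[str, int]] = {}  # order_id -> {sku: qty}
        self.published: list[tuple[str, str]] = []        # (event name, order_id)

    def publish(self, event: str, order_id: str) -> None:
        self.published.append((event, order_id))

    def on_reserve_inventory(self, order_id: str, items: dict[str, int]) -> None:
        # All-or-nothing: check every line item before touching stock.
        if any(self.stock.get(sku, 0) < qty for sku, qty in items.items()):
            self.publish("InventoryFailedEvent", order_id)
            return
        for sku, qty in items.items():
            self.stock[sku] -= qty
        self.reservations[order_id] = dict(items)
        self.publish("InventoryReservedEvent", order_id)

    def on_release_inventory(self, order_id: str) -> None:
        # Idempotent compensation: pop() removes the reservation, so a
        # redelivered command finds nothing left to return to stock.
        for sku, qty in self.reservations.pop(order_id, {}).items():
            self.stock[sku] += qty
        self.publish("InventoryReleasedEvent", order_id)
```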

The core idea is that each service performs only a local transaction. When a step fails, nothing is rolled back across the distributed system. Instead, the failing service signals the failure, and the orchestrator (or the services themselves, in a choreography-based saga) triggers compensating actions for the steps that had already succeeded.

A common pitfall is misunderstanding compensation. A compensating transaction must be idempotent, because messages can be redelivered, and it must eventually succeed, because there is no further step to fall back on. If a refund fails, the saga is left half-done: the order is FAILED but the customer has still been charged, which is exactly the inconsistency the saga was meant to prevent. This is why robust error handling, retry mechanisms, and a manual-intervention process are critical for compensating actions.
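
Because a compensating action is idempotent, it is safe to retry. The sketch below assumes failures are signaled by exceptions; `run_compensation` and its parameters are hypothetical. A real system would usually persist retry state, and would route exhausted cases to a dead-letter queue or an operator alert rather than simply returning `False`.

```python
import time
from typing import Callable


def run_compensation(
    action: Callable[[], None],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry an idempotent compensating action with exponential backoff.

    Returns True on success, or False once all attempts are exhausted; at that
    point the caller should hand the saga to a manual-resolution process.
    """
    for attempt in range(max_attempts):
        try:
            action()
            return True
        except Exception:
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 0.1s, 0.2s, 0.4s, ...
    return False
```

The `sleep` parameter is injectable so the backoff schedule can be tested without waiting. Retrying is only sound because the refund handler deduplicates by `order_id`; without idempotency, each retry could refund again.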

The next challenge you’ll face is handling the failure of a compensating transaction itself, which often requires a separate, perhaps manual, resolution process.
