A Saga orchestrates a sequence of local transactions, each updating the system’s state and triggering the next step in the process. Unlike a traditional ACID transaction, a Saga doesn’t hold locks on resources for its entire duration, which allows for much longer-running operations.

Let’s see a Saga in action, coordinating a multi-stage order fulfillment process. Imagine a customer places an order. This triggers a Saga that might look like this:

  1. CreateOrder: A new order is created in the orders service. This is a local ACID transaction.
  2. ReserveInventory: The inventory service is called to reserve the items. This is another local ACID transaction. If successful, it signals the Saga to proceed.
  3. ProcessPayment: The payment service attempts to charge the customer. Again, a local ACID transaction.
  4. ShipOrder: If payment succeeds, the shipping service is invoked to arrange delivery.

Each step is a self-contained transaction. If any step fails, the Saga doesn’t simply roll back like a traditional transaction. Instead, it executes compensating transactions in reverse order.

  • If ProcessPayment fails, the Saga calls ReleaseInventory (compensating ReserveInventory) and then CancelOrder (compensating CreateOrder).
  • If ReserveInventory fails, only CreateOrder has committed, so the Saga just calls CancelOrder.
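The compensation flow described above can be sketched as a small orchestrator. This is a minimal illustration, not a production design: `SagaStep`, `run_saga`, `SagaFailed`, and the `record` helper are hypothetical names, the service calls are stand-ins that only append to a log, and `RefundPayment` is an assumed name for the payment compensation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SagaStep:
    name: str
    action: Callable[[dict], None]                  # the local transaction
    compensation: Optional[Callable[[dict], None]]  # undoes the action, if any


class SagaFailed(Exception):
    """Raised after a step fails and earlier steps have been compensated."""


def run_saga(steps: List[SagaStep], ctx: dict) -> None:
    completed: List[SagaStep] = []
    for step in steps:
        try:
            step.action(ctx)
        except Exception as exc:
            # Undo the steps that already committed, most recent first.
            for done in reversed(completed):
                if done.compensation is not None:
                    done.compensation(ctx)
            raise SagaFailed(f"{step.name} failed: {exc}") from exc
        completed.append(step)


def record(label: str) -> Callable[[dict], None]:
    """Stand-in for a real service call: just logs that it ran."""
    def _do(ctx: dict) -> None:
        ctx["log"].append(label)
    return _do


def failing_payment(ctx: dict) -> None:
    raise RuntimeError("card declined")


order_saga = [
    SagaStep("CreateOrder", record("CreateOrder"), record("CancelOrder")),
    SagaStep("ReserveInventory", record("ReserveInventory"), record("ReleaseInventory")),
    SagaStep("ProcessPayment", failing_payment, record("RefundPayment")),
    SagaStep("ShipOrder", record("ShipOrder"), None),
]

ctx = {"log": []}
try:
    run_saga(order_saga, ctx)
except SagaFailed as err:
    print(err)
print(ctx["log"])
```

With the payment step failing, the log ends with ReleaseInventory followed by CancelOrder, matching the first bullet. A real orchestrator would also persist its progress so that it can resume compensation after a crash; that is left out of this sketch.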

The key is that each service only commits its own local transaction. The Saga coordination layer (often an orchestrator or a choreography of events) manages the overall flow and triggers compensation when needed. This allows the entire process to span hours, days, or even weeks without holding locks that would cripple other parts of the system.

Here’s a simplified view of the state transitions in an orchestrator-based Saga:

  • Initial State: PENDING
  • Step 1: CreateOrder -> ORDER_CREATED
  • Step 2: ReserveInventory -> INVENTORY_RESERVED
  • Step 3: ProcessPayment -> PAYMENT_PROCESSED
  • Step 4: ShipOrder -> ORDER_FULFILLED

Failure Paths:

  • If ReserveInventory fails: INVENTORY_RESERVATION_FAILED -> Compensate CreateOrder -> ORDER_CANCELLED
  • If ProcessPayment fails: PAYMENT_PROCESSING_FAILED -> Compensate ReserveInventory -> INVENTORY_RELEASED -> Compensate CreateOrder -> ORDER_CANCELLED
  • If ShipOrder fails: SHIPPING_FAILED -> Compensate ProcessPayment -> PAYMENT_REFUNDED -> Compensate ReserveInventory -> INVENTORY_RELEASED -> Compensate CreateOrder -> ORDER_CANCELLED
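One way to make these transitions concrete is to encode them as lookup tables and walk them. The sketch below is only an illustration under that assumption: the table names and the `trace` function are hypothetical, and a CreateOrder failure is not modeled because the lists above do not define one.

```python
from typing import List, Optional

STEPS = ["CreateOrder", "ReserveInventory", "ProcessPayment", "ShipOrder"]

# State reached when a step succeeds, keyed by (current state, step).
ON_SUCCESS = {
    ("PENDING", "CreateOrder"): "ORDER_CREATED",
    ("ORDER_CREATED", "ReserveInventory"): "INVENTORY_RESERVED",
    ("INVENTORY_RESERVED", "ProcessPayment"): "PAYMENT_PROCESSED",
    ("PAYMENT_PROCESSED", "ShipOrder"): "ORDER_FULFILLED",
}

# State entered when a step fails (CreateOrder failure is not modeled).
ON_FAILURE = {
    "ReserveInventory": "INVENTORY_RESERVATION_FAILED",
    "ProcessPayment": "PAYMENT_PROCESSING_FAILED",
    "ShipOrder": "SHIPPING_FAILED",
}

# State reached after the compensation that follows each state completes.
AFTER_COMPENSATION = {
    "SHIPPING_FAILED": "PAYMENT_REFUNDED",
    "PAYMENT_REFUNDED": "INVENTORY_RELEASED",
    "PAYMENT_PROCESSING_FAILED": "INVENTORY_RELEASED",
    "INVENTORY_RELEASED": "ORDER_CANCELLED",
    "INVENTORY_RESERVATION_FAILED": "ORDER_CANCELLED",
}


def trace(fail_at: Optional[str] = None) -> List[str]:
    """Return the sequence of states visited, optionally failing at one step."""
    state = "PENDING"
    path = [state]
    for step in STEPS:
        if step == fail_at:
            state = ON_FAILURE[step]
            path.append(state)
            # Follow compensations until a terminal state is reached.
            while state in AFTER_COMPENSATION:
                state = AFTER_COMPENSATION[state]
                path.append(state)
            return path
        state = ON_SUCCESS[(state, step)]
        path.append(state)
    return path


print(" -> ".join(trace()))
print(" -> ".join(trace("ProcessPayment")))
```

Keeping the transitions in data rather than in nested conditionals makes it easier to check that every failure state eventually leads to ORDER_CANCELLED.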

The problem Sagas solve is achieving all-or-nothing outcomes across distributed services where traditional two-phase commit (2PC) is impractical due to performance, availability, or complexity limitations. Instead of a global lock, a Saga uses a sequence of local ACID transactions with explicit compensation logic for rollback. The trade-off is isolation: other parts of the system can observe intermediate states while a Saga is in flight. In exchange, systems become more resilient and scalable, especially for business processes that inherently take time.

The complexity of implementing and debugging compensating actions is often underestimated. It’s not enough to just reverse the effect of an operation; you need to ensure that the compensation itself is idempotent and handles partial failures gracefully. For example, if a payment was already refunded by an earlier compensation attempt, a retried refund should be a no-op: it should neither raise an error nor refund the customer twice.
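A minimal sketch of an idempotent refund might look like the following. It assumes an in-memory store and a hypothetical `PaymentService` class with `charge` and `refund` methods; a real payment service would need a durable record of which refunds have been issued.

```python
from typing import Dict, Set


class PaymentService:
    """Hypothetical in-memory payment service used only for illustration."""

    def __init__(self) -> None:
        self._charges: Dict[str, int] = {}  # payment_id -> amount in cents
        self._refunded: Set[str] = set()
        self.total_refunded = 0

    def charge(self, payment_id: str, amount_cents: int) -> None:
        self._charges[payment_id] = amount_cents

    def refund(self, payment_id: str) -> bool:
        """Refund a charge at most once.

        Returns True if a refund was performed and False if there was
        nothing to do, so that retries are safe.
        """
        if payment_id not in self._charges:
            return False  # the charge never committed; nothing to compensate
        if payment_id in self._refunded:
            return False  # already refunded by an earlier attempt
        self._refunded.add(payment_id)
        self.total_refunded += self._charges[payment_id]
        return True


service = PaymentService()
service.charge("pay-1", 2500)
first = service.refund("pay-1")   # performs the refund
second = service.refund("pay-1")  # retry is a no-op
print(first, second, service.total_refunded)
```

Returning a boolean instead of raising on a repeated call lets the orchestrator retry a compensation after a timeout without needing to know whether the previous attempt got through.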

The next challenge is handling the eventual consistency implications of these long-running, distributed processes.
