A saga is more than a sequence of operations: it is a distributed transaction, built from local transactions and their compensating actions, that provides eventual consistency across multiple services.

Let’s see a simple saga in action, managing an order placement process. We have three services: OrderService, PaymentService, and InventoryService.

// Request to OrderService
POST /orders
{
  "orderId": "ORD123",
  "customerId": "CUST456",
  "items": [
    {"productId": "PROD789", "quantity": 2}
  ],
  "totalAmount": 100.00
}

// OrderService initiates saga, calls PaymentService
POST /payments/authorize
{
  "orderId": "ORD123",
  "customerId": "CUST456",
  "amount": 100.00
}

// PaymentService authorizes and replies; OrderService then calls InventoryService
POST /inventory/reserve
{
  "orderId": "ORD123",
  "items": [
    {"productId": "PROD789", "quantity": 2}
  ]
}

// InventoryService reserves, responds to OrderService
// OrderService confirms payment and inventory, completes order
// OrderService responds to client
201 Created
{
  "orderId": "ORD123",
  "status": "COMPLETED"
}

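If a step fails, OrderService compensates the steps that have already completed. If PaymentService fails to authorize, nothing has been committed downstream, so OrderService only needs to mark the order as failed. If InventoryService fails to reserve stock after the payment was authorized, OrderService calls a compensation endpoint on PaymentService (e.g., POST /payments/void) to reverse the authorization; a refund endpoint such as POST /payments/refund applies only once the payment has actually been captured. POST /inventory/release on InventoryService is needed only when a reservation succeeded and a later step failed.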
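The rule that only completed steps get compensated, in reverse order, can be sketched in a few lines of Python. This is a minimal illustration, not a production orchestrator: `SagaStep`, `run_saga`, and `StepFailed` are hypothetical names, and the HTTP calls are replaced by strings appended to a list so the flow is easy to follow.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class StepFailed(Exception):
    """Raised by a step's action when its local transaction cannot complete."""


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]        # performs the local transaction
    compensation: Callable[[], None]  # semantically undoes a completed action


@dataclass
class SagaResult:
    status: str
    log: List[str] = field(default_factory=list)


def run_saga(steps: List[SagaStep]) -> SagaResult:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    result = SagaResult(status="COMPLETED")
    completed: List[SagaStep] = []
    for step in steps:
        try:
            step.action()
        except StepFailed:
            result.log.append(f"{step.name}: failed")
            # The failed step is not compensated: it never took effect.
            for done in reversed(completed):
                done.compensation()
                result.log.append(f"{done.name}: compensated")
            result.status = "FAILED"
            return result
        completed.append(step)
        result.log.append(f"{step.name}: ok")
    return result


def reserve_out_of_stock() -> None:
    # Stand-in for POST /inventory/reserve returning an error.
    raise StepFailed("insufficient stock")


calls: List[str] = []  # records the (simulated) HTTP calls that were made
steps = [
    SagaStep("authorize_payment",
             action=lambda: calls.append("POST /payments/authorize"),
             compensation=lambda: calls.append("POST /payments/void")),
    SagaStep("reserve_inventory",
             action=reserve_out_of_stock,
             compensation=lambda: calls.append("POST /inventory/release")),
]
result = run_saga(steps)
print(result.status, calls)
```

Because the reservation never succeeded, the sketch voids the authorization but does not call the inventory release endpoint.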

The core problem sagas solve is maintaining data integrity in microservice architectures without the overhead and limitations of traditional ACID transactions across service boundaries. Each service owns its data, and the saga coordinates a series of local transactions, with a defined compensation action for each step. This allows for high availability and scalability while still ensuring that the overall business transaction either completes or is undone through compensation. Unlike an ACID rollback, a compensation is a new transaction that semantically reverses an earlier one, and intermediate states are visible to other requests in the meantime.

The two main patterns for implementing sagas are:

  1. Choreography: Services communicate by publishing and subscribing to events, typically through a message broker. When a service completes its local transaction, it publishes an event, and other services interested in that event react and perform their own local transactions. This leads to a decentralized system where no single service orchestrates the entire flow.

    • Example: OrderService completes its initial step and publishes an OrderCreated event. PaymentService listens for OrderCreated, processes the payment, and publishes a PaymentAuthorized event. InventoryService listens for PaymentAuthorized, reserves stock, and publishes a StockReserved event. OrderService listens for StockReserved and marks the order as complete.
  2. Orchestration: A central orchestrator service manages the saga flow. The orchestrator sends commands to each participating service and listens for replies. If a step fails, the orchestrator is responsible for invoking the compensation actions on preceding services.

    • Example: OrderService acts as the orchestrator. It sends an AuthorizePayment command to PaymentService. Upon receiving the PaymentAuthorized reply, it sends a ReserveInventory command to InventoryService. If ReserveInventory fails, the orchestrator sends a compensating command to PaymentService (VoidPayment, or RefundPayment if the payment was already captured). A ReleaseInventory command to InventoryService is needed only when the reservation succeeded and a later step failed.
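To make the choreography variant concrete, here is a small sketch that uses a synchronous in-memory event bus in place of a real message broker. The `EventBus` class and the `StockReservationFailed` and `PaymentVoided` events are assumptions added for illustration; only OrderCreated, PaymentAuthorized, and StockReserved come from the example above.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Tuple

Handler = Callable[[Dict], None]


class EventBus:
    """Synchronous in-memory stand-in for a message broker."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.published: List[str] = []  # event types, in publication order

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: Dict) -> None:
        self.published.append(event_type)
        for handler in list(self._handlers[event_type]):
            handler(payload)


def wire_services(bus: EventBus, orders: Dict[str, str], stock: Dict[str, int]) -> None:
    def authorize_payment(event: Dict) -> None:
        # PaymentService: authorization always succeeds in this sketch.
        bus.publish("PaymentAuthorized", event)

    def reserve_stock(event: Dict) -> None:
        # InventoryService: reserve every item, or report failure.
        items = event["items"]
        if all(stock.get(i["productId"], 0) >= i["quantity"] for i in items):
            for i in items:
                stock[i["productId"]] -= i["quantity"]
            bus.publish("StockReserved", event)
        else:
            bus.publish("StockReservationFailed", event)

    def void_payment(event: Dict) -> None:
        # PaymentService: compensate its own earlier authorization.
        bus.publish("PaymentVoided", event)

    def complete_order(event: Dict) -> None:
        orders[event["orderId"]] = "COMPLETED"

    def cancel_order(event: Dict) -> None:
        orders[event["orderId"]] = "CANCELLED"

    bus.subscribe("OrderCreated", authorize_payment)
    bus.subscribe("PaymentAuthorized", reserve_stock)
    bus.subscribe("StockReserved", complete_order)
    bus.subscribe("StockReservationFailed", void_payment)
    bus.subscribe("PaymentVoided", cancel_order)


def place_order(order: Dict, stock: Dict[str, int]) -> Tuple[str, List[str]]:
    bus = EventBus()
    orders = {order["orderId"]: "PENDING"}
    wire_services(bus, orders, stock)
    bus.publish("OrderCreated", order)
    return orders[order["orderId"]], bus.published


order = {"orderId": "ORD123", "items": [{"productId": "PROD789", "quantity": 2}]}
happy_status, happy_events = place_order(order, {"PROD789": 5})
failed_status, failed_events = place_order(order, {"PROD789": 1})
print(happy_status, happy_events)
print(failed_status, failed_events)
```

No component in the sketch knows the whole flow: each handler reacts to one event and publishes the next, which is also why the end-to-end sequence is harder to read off the code than in the orchestrated version.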

When testing sagas end-to-end, assertions go beyond simple state checks. You need to verify not only the final desired state but also the intermediate states and the successful execution of compensation actions.

For a successful order:

  • Assertion 1: The final Order status in OrderService is COMPLETED.
  • Assertion 2: PaymentService has a record of a successful authorization for the order amount.
  • Assertion 3: InventoryService has a record of the items being reserved for the order.

For a failed order (e.g., inventory unavailable after payment authorization):

  • Assertion 1: The final Order status in OrderService is FAILED or CANCELLED.
  • Assertion 2: PaymentService has a record of an authorization that was subsequently refunded or voided.
  • Assertion 3: InventoryService has a record of the items not being reserved (or the reservation being released if it happened before the failure).
  • Assertion 4: No compensation actions were erroneously triggered for steps that never executed, and each completed step was compensated exactly once.

A common pitfall is relying solely on the final state. You must also test the failure paths rigorously. This often involves mocking or stubbing downstream services to simulate failures at various stages of the saga. For instance, to test the compensation of a completed payment authorization, you’d simulate InventoryService returning an error response to OrderService’s reservation request. Then, you’d assert that OrderService correctly calls PaymentService’s compensation endpoint. To test a PaymentService failure, you’d simulate an error on the authorization request and assert that the order is marked as failed and that no inventory call and no compensation call is made.
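A failure-path test along these lines can be sketched with Python’s standard `unittest.mock`. The `place_order` function below is a deliberately tiny orchestrator written only for this example, and the client methods (`authorize`, `void`, `reserve`, `release`) and `ReservationError` are hypothetical names, not the API of any real service.

```python
from unittest.mock import Mock, call


class ReservationError(Exception):
    """Hypothetical error raised by the inventory client when stock is unavailable."""


def place_order(order, payments, inventory):
    """Tiny orchestrator under test; the client method names are assumptions."""
    payments.authorize(order["orderId"], order["totalAmount"])
    try:
        inventory.reserve(order["orderId"], order["items"])
    except ReservationError:
        payments.void(order["orderId"])  # compensate the completed authorization
        return "FAILED"
    return "COMPLETED"


ORDER = {
    "orderId": "ORD123",
    "items": [{"productId": "PROD789", "quantity": 2}],
    "totalAmount": 100.00,
}


def test_inventory_failure_voids_payment():
    payments, inventory = Mock(), Mock()
    # Simulate InventoryService rejecting the reservation.
    inventory.reserve.side_effect = ReservationError("out of stock")

    status = place_order(ORDER, payments, inventory)

    assert status == "FAILED"
    # The authorization happened first and was then voided, exactly once each.
    assert payments.mock_calls == [
        call.authorize("ORD123", 100.00),
        call.void("ORD123"),
    ]
    # Nothing was reserved, so nothing must be released.
    inventory.release.assert_not_called()


def test_happy_path_triggers_no_compensation():
    payments, inventory = Mock(), Mock()

    status = place_order(ORDER, payments, inventory)

    assert status == "COMPLETED"
    inventory.reserve.assert_called_once_with("ORD123", ORDER["items"])
    payments.void.assert_not_called()


test_inventory_failure_voids_payment()
test_happy_path_triggers_no_compensation()
```

Asserting on `mock_calls` checks the order of the calls as well as their arguments, which a final-state check on the order record alone would not catch.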

The true complexity in sagas often lies not in the happy path, but in ensuring that compensation logic correctly handles partial successes and the potential for retries or idempotency issues in distributed systems. If a compensation action itself fails, the system enters a much more complex error handling state, often requiring manual intervention or a dedicated "dead-letter" queue for failed compensation steps.
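One way to picture retries, idempotency, and a dead-letter queue together is the sketch below. It is an assumption-heavy illustration: `CompensationRunner` is a hypothetical helper, the idempotency keys and the dead-letter list live in memory, and there is no backoff between attempts. A real system would persist this state and would usually enforce idempotency in the receiving service as well.

```python
from typing import Callable, Dict, List, Set


class CompensationRunner:
    """Runs compensations at most once per key, retrying transient failures."""

    def __init__(self, max_attempts: int = 3) -> None:
        self.max_attempts = max_attempts
        self.done: Set[str] = set()          # idempotency keys already applied
        self.dead_letters: List[Dict] = []   # compensations needing manual follow-up

    def run(self, key: str, compensate: Callable[[], None]) -> str:
        if key in self.done:
            return "SKIPPED"  # duplicate request; already compensated
        last_error = ""
        for _ in range(self.max_attempts):
            try:
                compensate()
            except Exception as exc:  # broad on purpose: any failure triggers a retry
                last_error = str(exc)
                continue
            self.done.add(key)
            return "COMPENSATED"
        # Retries exhausted: park the compensation for an operator or a later job.
        self.dead_letters.append(
            {"key": key, "attempts": self.max_attempts, "error": last_error}
        )
        return "DEAD_LETTERED"


attempts = {"void": 0}


def flaky_void() -> None:
    # Fails twice, then succeeds, to simulate a transient outage.
    attempts["void"] += 1
    if attempts["void"] < 3:
        raise ConnectionError("payment service unavailable")


def broken_release() -> None:
    raise ConnectionError("inventory service unavailable")


runner = CompensationRunner(max_attempts=3)
first = runner.run("ORD123:void-payment", flaky_void)
second = runner.run("ORD123:void-payment", flaky_void)  # same key again
third = runner.run("ORD123:release-inventory", broken_release)
print(first, second, third, runner.dead_letters)
```

The key design choice is that the idempotency key identifies the compensation for a specific order and step, so a redelivered request is skipped instead of voiding or releasing twice.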

The next challenge is managing the complexity of long-running sagas and their state persistence.
