The most surprising thing about saga orchestration is that it is often simpler to reason about than the alternatives, even though it is usually perceived as the more complex option: the whole workflow is defined in one place.

Let’s watch a hypothetical order placement saga unfold. Imagine a user wants to buy a "Cosmic Widget" (SKU: CW-789) for $99.99.

  1. Client initiates CreateOrder: POST /orders with { "userId": "user-123", "items": [{"sku": "CW-789", "quantity": 1}], "total": 99.99 }
  2. Order Service (Orchestrator) receives: It doesn’t finalize the order yet. Instead, it starts the saga.
    • It creates a pending order record (order-111) in its own database, marked as STATE: PENDING_INVENTORY_RESERVATION, so the saga can be resumed if the orchestrator restarts.
    • It sends a command to the Inventory Service: POST /inventory/reserve with { "sku": "CW-789", "quantity": 1 }.
    • It sends a command to the Payment Service: POST /payment/authorize with { "userId": "user-123", "amount": 99.99 }.
  3. Inventory Service responds: 200 OK with { "reservationId": "res-abc" }.
  4. Payment Service responds: 200 OK with { "authorizationId": "auth-xyz" }.
  5. Order Service (Orchestrator) receives both successful responses:
    • It updates the order record: STATE: PENDING_ORDER_CREATION.
    • It sends a command to the Order Fulfillment Service: POST /fulfillment/create with { "orderId": "order-111", "reservationId": "res-abc", "deliveryAddress": "123 Main St" }. (The address would come from the user’s profile, fetched separately or included in the initial request).
  6. Order Fulfillment Service responds: 200 OK with { "fulfillmentId": "ful-def" }.
  7. Order Service (Orchestrator) receives success:
    • It sends a command to the Inventory Service to commit the reservation: POST /inventory/commit with { "reservationId": "res-abc" }.
    • It sends a command to the Payment Service to capture the authorization: POST /payment/capture with { "authorizationId": "auth-xyz", "amount": 99.99 }.
    • Once both calls succeed, it updates the order record: STATE: COMPLETED.

The "Order Service" here is the Orchestrator. It’s the central coordinator. It tells other services what to do and in what order. If any step fails, the orchestrator is responsible for initiating compensating actions.
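The happy path of the walkthrough above can be sketched in a few lines of Python. This is a minimal illustration, not a production orchestrator: FakeParticipants is a hypothetical in-memory stand-in for the HTTP endpoints, the Order class and place_order function are invented for this example, and the delivery address is assumed to arrive in the initial request. The sketch marks the order COMPLETED only after the commit and capture calls return, which is a conservative ordering choice.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    order_id: str
    state: str = "PENDING_INVENTORY_RESERVATION"
    history: list = field(default_factory=list)

    def move_to(self, new_state: str) -> None:
        # Remember the state we are leaving, then switch to the new one.
        self.history.append(self.state)
        self.state = new_state


class FakeParticipants:
    """Hypothetical in-memory stand-in for the participant HTTP endpoints."""

    RESPONSES = {
        "/inventory/reserve": {"reservationId": "res-abc"},
        "/payment/authorize": {"authorizationId": "auth-xyz"},
        "/fulfillment/create": {"fulfillmentId": "ful-def"},
        "/inventory/commit": {},
        "/payment/capture": {},
    }

    def __init__(self) -> None:
        self.calls = []

    def post(self, path: str, body: dict) -> dict:
        # Record the call so the order of commands can be inspected later.
        self.calls.append(path)
        return self.RESPONSES[path]


def place_order(api: FakeParticipants, request: dict) -> Order:
    # Persist the pending order first (here: just an in-memory object).
    order = Order(order_id="order-111")
    item = request["items"][0]

    reservation = api.post(
        "/inventory/reserve", {"sku": item["sku"], "quantity": item["quantity"]}
    )
    authorization = api.post(
        "/payment/authorize", {"userId": request["userId"], "amount": request["total"]}
    )
    order.move_to("PENDING_ORDER_CREATION")

    api.post("/fulfillment/create", {
        "orderId": order.order_id,
        "reservationId": reservation["reservationId"],
        "deliveryAddress": request["deliveryAddress"],
    })

    api.post("/inventory/commit", {"reservationId": reservation["reservationId"]})
    api.post("/payment/capture", {
        "authorizationId": authorization["authorizationId"],
        "amount": request["total"],
    })
    order.move_to("COMPLETED")
    return order


api = FakeParticipants()
order = place_order(api, {
    "userId": "user-123",
    "items": [{"sku": "CW-789", "quantity": 1}],
    "total": 99.99,
    "deliveryAddress": "123 Main St",
})
print(order.state, order.history)
```

Failure handling is deliberately left out here so the sequence of commands stays easy to follow.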

What problem does this solve?

This pattern solves the distributed transaction problem. In a microservices architecture, you generally can’t rely on a traditional two-phase commit (2PC): each service owns its own data store, the services don’t share a transactional boundary, and a blocking coordinator hurts availability. Without coordination, if Service A calls Service B, which in turn calls Service C, a failure in Service C can leave Service B with partially applied changes while Service A never learns what happened. Sagas manage these multi-service operations as a sequence of local transactions with compensations, trading atomicity for eventual consistency.

How does it work internally?

The orchestrator service maintains the state of the saga. It typically uses a state machine internally. When a saga starts, the orchestrator transitions to the first state and sends the corresponding command to the first participant service. As responses come back (either success or failure), the orchestrator transitions to the next state, sending the next command, or initiating compensation if a failure occurred.

  • State Management: The orchestrator needs a reliable place to store the saga’s current state. This could be a dedicated database table, a document store, or even an event log. Each entry in this store represents an instance of the saga, tracking its progress.
  • Command and Event Handling: The orchestrator listens for events from participant services (e.g., "InventoryReserved", "PaymentAuthorized") and sends commands to them (e.g., "ReserveInventory", "AuthorizePayment").
  • Compensation Logic: For every positive action a participant performs, there must be a corresponding compensating action. For example, ReserveInventory is compensated by ReleaseInventory, and AuthorizePayment by cancelling (voiding) the authorization; a refund such as RefundPayment is only needed once the payment has actually been captured. The orchestrator’s failure handling logic invokes these compensating actions in reverse order of the original operations.
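The three pieces above (stored state, step execution, and reverse-order compensation) can be sketched together. This is an illustrative sketch under stated assumptions: Step, run_saga, and the dictionary used as a saga store are hypothetical names, the store stands in for a real database table, and VoidAuthorization is an assumed name for the payment compensation, chosen because nothing has been captured at the point of failure.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], None]        # the forward operation
    compensation: Callable[[], None]  # how to undo it


def run_saga(saga_id: str, steps: list, store: dict) -> str:
    # One record per saga instance; a real system would persist this durably.
    record = {"state": "STARTED", "completed": []}
    store[saga_id] = record
    done = []
    for step in steps:
        try:
            step.action()
        except Exception:
            record["state"] = "COMPENSATING"
            # Undo only the steps that succeeded, newest first.
            for finished in reversed(done):
                finished.compensation()
            record["state"] = "COMPENSATED"
            return record["state"]
        done.append(step)
        record["completed"].append(step.name)
        record["state"] = f"{step.name}_DONE"
    record["state"] = "COMPLETED"
    return record["state"]


log = []


def record_call(name: str) -> Callable[[], None]:
    def _run() -> None:
        log.append(name)
    return _run


def failing_call(name: str) -> Callable[[], None]:
    def _run() -> None:
        log.append(name)
        raise RuntimeError(f"{name} failed")
    return _run


store = {}
result = run_saga("saga-1", [
    Step("ReserveInventory", record_call("ReserveInventory"), record_call("ReleaseInventory")),
    Step("AuthorizePayment", record_call("AuthorizePayment"), record_call("VoidAuthorization")),
    Step("CreateFulfillment", failing_call("CreateFulfillment"), record_call("CancelFulfillment")),
], store)
print(result, log)
```

In this run the third step fails, so the two completed steps are compensated in reverse order and the failed step itself is not compensated.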

The exact levers you control:

  1. The Orchestrator’s State Machine: You define the states, the transitions between them, and the commands/events that trigger those transitions. This is the core logic.
  2. Participant Service Contracts: You define the commands each participant service accepts and the events it emits. Command handlers must be idempotent: processing the same command multiple times should have the same effect as processing it once.
  3. Compensation Logic: You explicitly define how to undo each step. This is crucial for maintaining data integrity.
  4. Timeouts and Retries: The orchestrator needs mechanisms to detect stalled participants and retry operations, or to give up and compensate after a certain period.
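The idempotency requirement on participant commands can be sketched as follows. This is one common approach, not the only one: the participant remembers the response for each command ID and replays it on a duplicate. The InventoryService class, the command_id parameter, and the status values are assumptions made for this example, and the dictionaries stand in for durable storage.

```python
class InventoryService:
    """Hypothetical participant that deduplicates commands by command ID."""

    def __init__(self, stock: dict) -> None:
        self.stock = dict(stock)
        self.processed = {}  # command_id -> response already returned

    def reserve(self, command_id: str, sku: str, quantity: int) -> dict:
        # A repeated command returns the stored response and changes nothing.
        if command_id in self.processed:
            return self.processed[command_id]

        if self.stock.get(sku, 0) < quantity:
            response = {"status": "REJECTED"}
        else:
            self.stock[sku] -= quantity
            response = {"status": "RESERVED", "reservationId": f"res-{command_id}"}

        self.processed[command_id] = response
        return response


inventory = InventoryService({"CW-789": 5})
first = inventory.reserve("cmd-1", "CW-789", 1)
second = inventory.reserve("cmd-1", "CW-789", 1)  # retry of the same command
print(first == second, inventory.stock["CW-789"])
```

With this in place, the orchestrator can safely resend a command after a timeout or restart, because the second delivery does not reserve stock again.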

Consider this: when a participant service fails to respond, the orchestrator doesn’t just sit there. It has a timeout configured for that specific command. If the timeout expires, it can retry the command (safe only because commands are idempotent) and, once retries are exhausted, it treats the step as failed and compensates the steps that did succeed. For example, if AuthorizePayment keeps timing out after ReserveInventory succeeded, the orchestrator calls ReleaseInventory to undo the reservation and marks the order as failed. Because a timeout doesn’t prove the authorization never happened, it should also cancel any authorization that may have gone through.
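One possible timeout policy is to retry a bounded number of times and compensate only when the retries run out. The sketch below assumes that policy; call_with_retries, authorize_or_compensate, and the callback names are hypothetical, the timeout is simulated by raising TimeoutError, and real code would add a backoff delay between attempts.

```python
from typing import Callable


def call_with_retries(command: Callable[[], dict], max_attempts: int) -> dict:
    # Try the command up to max_attempts times, re-raising the last timeout.
    for attempt in range(1, max_attempts + 1):
        try:
            return command()
        except TimeoutError:
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")


def authorize_or_compensate(
    authorize: Callable[[], dict],
    void_authorization: Callable[[], None],
    release_inventory: Callable[[], None],
    max_attempts: int = 3,
) -> str:
    try:
        call_with_retries(authorize, max_attempts)
        return "PAYMENT_AUTHORIZED"
    except TimeoutError:
        # The timed-out call may still have succeeded, so cancel it as well,
        # then undo the earlier step.
        void_authorization()
        release_inventory()
        return "FAILED"


events = []


def always_times_out() -> dict:
    events.append("authorize")
    raise TimeoutError("payment service did not respond")


outcome = authorize_or_compensate(
    always_times_out,
    lambda: events.append("void_authorization"),
    lambda: events.append("release_inventory"),
)
print(outcome, events)
```

Both compensation callbacks need to be idempotent themselves, since the orchestrator may repeat them after its own restart.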

The next concept you’ll run into is handling idempotency and ensuring that commands don’t get executed multiple times in the event of network retries or orchestrator restarts.
