A saga is a sequence of local transactions, where each transaction updates data within a single service and publishes a message or event to trigger the next transaction in the saga. If any local transaction fails, the saga executes a series of compensating transactions to undo the preceding local transactions.
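As a rough illustration of that definition, the sketch below runs a list of steps in order and, when one fails, runs the compensations of the steps that already completed, in reverse order. It is an in-process simplification: SagaStep, run_saga, and the example steps are hypothetical names, and a real saga would cross service boundaries through messages rather than function calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]        # the local transaction
    compensation: Callable[[], None]  # business operation that reverses it


def run_saga(steps: List[SagaStep]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[SagaStep] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # The failed step never completed, so only earlier steps are compensated.
            for done in reversed(completed):
                done.compensation()
            return False
        completed.append(step)
    return True


log: List[str] = []


def charge_customer() -> None:
    raise RuntimeError("card declined")  # simulated downstream failure


steps = [
    SagaStep("create_order", lambda: log.append("order created"),
             lambda: log.append("order marked FAILED")),
    SagaStep("reserve_items", lambda: log.append("items reserved"),
             lambda: log.append("items released")),
    SagaStep("charge_customer", charge_customer,
             lambda: log.append("payment refunded")),
]

succeeded = run_saga(steps)
```

Because the charge step fails, the log ends with the release of the items followed by the order being marked as failed, and the refund compensation never runs.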
Let’s see a simple saga in action for an e-commerce order placement.
```json
{
  "orderId": "ORD12345",
  "customerId": "CUST67890",
  "items": [
    {"productId": "PROD001", "quantity": 2},
    {"productId": "PROD002", "quantity": 1}
  ],
  "totalAmount": 150.75
}
```
- Order Service: Receives the order.
  - Local Transaction: Creates an Order record in its database with status: PENDING.
  - Event Published: OrderCreated (containing order details).
- Inventory Service: Listens for OrderCreated.
  - Local Transaction: Reserves items. Decrements quantity for PROD001 and PROD002. If successful, publishes ItemsReserved.
  - If the items cannot be reserved (e.g., insufficient stock), it publishes ItemReservationFailed instead.
- Payment Service: Listens for ItemsReserved.
  - Local Transaction: Charges the customer. Creates a Payment record with status: PAID. If successful, publishes PaymentProcessed.
  - If charging fails, it publishes PaymentFailed.
- Order Service (again): Listens for PaymentProcessed or PaymentFailed.
  - If PaymentProcessed: Updates Order status to APPROVED.
  - If PaymentFailed: Updates Order status to FAILED.
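The flow above can be sketched with a toy in-memory event bus. This is only an illustration under stated assumptions: EventBus, the handler names, the starting stock levels, and the dictionaries standing in for each service's database are all hypothetical, and the charge is assumed to always succeed so that the happy path is visible end to end.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple

Handler = Callable[[dict], None]


class EventBus:
    """Toy in-memory bus; stands in for a real message broker."""

    def __init__(self) -> None:
        self.handlers: Dict[str, List[Handler]] = defaultdict(list)
        self.queue: Deque[Tuple[str, dict]] = deque()
        self.history: List[str] = []

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self.handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        self.queue.append((event_type, payload))

    def drain(self) -> None:
        # Deliver queued events one at a time until the queue is empty.
        while self.queue:
            event_type, payload = self.queue.popleft()
            self.history.append(event_type)
            for handler in self.handlers[event_type]:
                handler(payload)


bus = EventBus()
orders: Dict[str, str] = {}            # Order Service database
stock = {"PROD001": 5, "PROD002": 1}   # Inventory Service database (assumed levels)
payments: Dict[str, str] = {}          # Payment Service database


def place_order(order: dict) -> None:
    orders[order["orderId"]] = "PENDING"
    bus.publish("OrderCreated", order)


def on_order_created(order: dict) -> None:
    items = order["items"]
    if any(stock.get(i["productId"], 0) < i["quantity"] for i in items):
        bus.publish("ItemReservationFailed", order)
        return
    for i in items:
        stock[i["productId"]] -= i["quantity"]
    bus.publish("ItemsReserved", order)


def on_items_reserved(order: dict) -> None:
    # Assumption: the charge always succeeds in this happy-path sketch.
    payments[order["orderId"]] = "PAID"
    bus.publish("PaymentProcessed", order)


def on_payment_processed(order: dict) -> None:
    orders[order["orderId"]] = "APPROVED"


def on_payment_failed(order: dict) -> None:
    orders[order["orderId"]] = "FAILED"


bus.subscribe("OrderCreated", on_order_created)
bus.subscribe("ItemsReserved", on_items_reserved)
bus.subscribe("PaymentProcessed", on_payment_processed)
bus.subscribe("PaymentFailed", on_payment_failed)

place_order({
    "orderId": "ORD12345",
    "customerId": "CUST67890",
    "items": [{"productId": "PROD001", "quantity": 2},
              {"productId": "PROD002", "quantity": 1}],
    "totalAmount": 150.75,
})
bus.drain()
```

No service calls another directly. Each one only reacts to an event and publishes the next, which is what keeps every step a purely local transaction.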
Now, what happens if the PaymentService fails?
- The PaymentService publishes PaymentFailed.
- The OrderService listens for PaymentFailed and updates the order status to FAILED.
- But we’re not done. The ItemsReserved event has already been published and acted on, and the InventoryService still holds the reserved items. We need to compensate.
To compensate for the ItemsReserved event, the OrderService (or a dedicated saga orchestrator) would publish a CancelOrder event.
- Inventory Service (compensating): Listens for CancelOrder.
  - Compensating Transaction: Releases the reserved items. Increments quantity for PROD001 and PROD002. Publishes ItemsReleased.
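A minimal sketch of this compensation path follows. The Order Service reacts to PaymentFailed by marking the order FAILED and emitting CancelOrder, and the Inventory Service reacts to CancelOrder by putting the reserved quantities back and emitting ItemsReleased. The dictionaries standing in for each service's database, the stock levels, and the outbox list standing in for the broker are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# (event type, order id) pairs; stands in for the message broker.
outbox: List[Tuple[str, str]] = []

orders: Dict[str, str] = {"ORD12345": "PENDING"}
stock: Dict[str, int] = {"PROD001": 3, "PROD002": 0}
# What the Inventory Service reserved earlier, keyed by order id.
reservations: Dict[str, Dict[str, int]] = {
    "ORD12345": {"PROD001": 2, "PROD002": 1},
}


def on_payment_failed(order_id: str) -> None:
    """Order Service: record the failure and ask other services to compensate."""
    orders[order_id] = "FAILED"
    outbox.append(("CancelOrder", order_id))


def on_cancel_order(order_id: str) -> None:
    """Inventory Service: compensating transaction for the reservation."""
    reserved = reservations.pop(order_id, {})
    for product_id, quantity in reserved.items():
        stock[product_id] += quantity
    outbox.append(("ItemsReleased", order_id))


on_payment_failed("ORD12345")
on_cancel_order("ORD12345")
```

The Inventory Service decides what to release from its own record of the reservation, not from anything the Order Service tells it about stock.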
This is where "compensate without distributed locks" comes in. If the OrderService tried to undo the reservation directly by, say, calling a ReleaseItems API on the InventoryService while it was still processing PaymentFailed, the two services could race on the same order. Publishing an event instead lets each service compensate inside its own local transaction, so no lock has to span services.
The core problem is that a compensating transaction is not simply the inverse of a local transaction. A local transaction might succeed, but its effect (e.g., an item reservation) might need to be undone later due to a failure downstream. The compensation logic needs to be idempotent and handle the state of the system at the time it’s invoked.
Consider the OrderService receiving PaymentFailed. It needs to trigger compensation, so it publishes CancelOrder. The InventoryService receives CancelOrder. Its local transaction was to reserve items, so the compensating action is to release them. If the InventoryService has already processed OrderCreated and published ItemsReserved when the compensation is triggered, it simply executes the release.
What if the InventoryService receives CancelOrder before it has finished processing OrderCreated, that is, before the reservation is committed? This is a critical point: the compensation must be robust to message ordering. If CancelOrder arrives while the internal state is still "awaiting reservation confirmation" (or similar), the service should leave stock untouched and, ideally, mark the reservation as "cancelled" so that the pending reservation never decrements stock. The key is that the final state must reflect that the items were never truly committed to the order, even if the reservation technically happened.
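One way to handle a CancelOrder that overtakes the reservation is to remember the cancellation and let the late reservation become a no-op. The sketch below assumes every event carries the order ID. The cancelled_orders set, the handler names, and the NothingToRelease and ReservationSkipped outcome labels are hypothetical.

```python
from typing import Dict, Set

stock: Dict[str, int] = {"PROD001": 5}
reservations: Dict[str, Dict[str, int]] = {}
cancelled_orders: Set[str] = set()   # "tombstones" for orders cancelled early


def on_cancel_order(order_id: str) -> str:
    # Remember the cancellation even if nothing has been reserved yet.
    cancelled_orders.add(order_id)
    reserved = reservations.pop(order_id, None)
    if reserved is None:
        return "NothingToRelease"
    for product_id, quantity in reserved.items():
        stock[product_id] += quantity
    return "ItemsReleased"


def on_order_created(order_id: str, items: Dict[str, int]) -> str:
    if order_id in cancelled_orders:
        # The cancel got here first: never touch stock for this order.
        return "ReservationSkipped"
    for product_id, quantity in items.items():
        stock[product_id] -= quantity
    reservations[order_id] = dict(items)
    return "ItemsReserved"


first = on_cancel_order("ORD12345")                     # arrives early
second = on_order_created("ORD12345", {"PROD001": 2})   # arrives late
```

Whichever message arrives first, the stock ends where it started and no reservation is left dangling for the cancelled order.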
A common pattern is to use an event as the trigger for compensation and to report the compensating action with another event. The OrderService doesn’t directly tell the InventoryService to "undo reservation." Instead, it says, "This order is cancelled" (the CancelOrder event). The InventoryService then applies its own compensation logic based on that event, which is to release any items it might have reserved for that order. Idempotency of the reserve and release operations is paramount here: if the release runs twice for the same reservation, the second run must have no further effect. This is achieved by tracking the state of the reservation (e.g., reserved, released, cancelled).
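A minimal sketch of that state tracking, assuming reservations are keyed by order ID. The Inventory class and ReservationState enum are hypothetical, only two of the states mentioned above are modelled, and the stock-sufficiency check is omitted for brevity.

```python
from enum import Enum
from typing import Dict


class ReservationState(Enum):
    RESERVED = "reserved"
    RELEASED = "released"


class Inventory:
    def __init__(self, stock: Dict[str, int]) -> None:
        self.stock = dict(stock)
        self.items: Dict[str, Dict[str, int]] = {}
        self.state: Dict[str, ReservationState] = {}

    def reserve(self, order_id: str, items: Dict[str, int]) -> bool:
        """Return True only when this call changed stock."""
        if order_id in self.state:
            return False  # duplicate OrderCreated, or already released: no effect
        for product_id, quantity in items.items():
            self.stock[product_id] -= quantity
        self.items[order_id] = dict(items)
        self.state[order_id] = ReservationState.RESERVED
        return True

    def release(self, order_id: str) -> bool:
        """Return True only when this call changed stock."""
        if self.state.get(order_id) is not ReservationState.RESERVED:
            return False  # unknown order, or already released: no effect
        for product_id, quantity in self.items[order_id].items():
            self.stock[product_id] += quantity
        self.state[order_id] = ReservationState.RELEASED
        return True


inventory = Inventory({"PROD001": 5, "PROD002": 1})
inventory.reserve("ORD12345", {"PROD001": 2, "PROD002": 1})
first_release = inventory.release("ORD12345")
second_release = inventory.release("ORD12345")   # redelivered CancelOrder
```

The second release finds the reservation already in the released state and returns without touching stock, so a redelivered CancelOrder is harmless.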
The most surprising thing about saga compensation is that the compensating transaction usually doesn’t "undo" the previous transaction in the sense of an atomic rollback. Instead, it performs a separate business operation that reverses the effect of the prior one, and this reversal must be designed to run independently and to handle cases where the original operation never fully completed or committed.
The next concept you’ll likely encounter is how to manage the state of the saga itself when failures occur mid-compensation.