Saga choreography is a way to manage distributed transactions where each participant in a business process makes a decision based on receiving an event, rather than being told what to do by a central orchestrator.
Let’s see this in action. Imagine a simple online order process:
- Order Service: Receives a new order request. It validates the order, creates an OrderCreated event, and publishes it.

      { "eventType": "OrderCreated", "orderId": "ORD123", "customerId": "CUST456", "amount": 100.50 }

- Payment Service: Subscribes to OrderCreated events. Upon receiving one, it attempts to process the payment. If successful, it publishes a PaymentProcessed event. If it fails, it publishes a PaymentFailed event.

      { "eventType": "PaymentProcessed", "orderId": "ORD123", "paymentId": "PAY789" }

- Inventory Service: Also subscribes to OrderCreated events. It reserves the items for the order. If successful, it publishes an InventoryReserved event. If it fails (e.g., out of stock), it publishes an InventoryReservationFailed event.

      { "eventType": "InventoryReserved", "orderId": "ORD123", "inventoryId": "INV001" }

- Order Service (again): Subscribes to PaymentProcessed, PaymentFailed, InventoryReserved, and InventoryReservationFailed events.
  - If it receives PaymentProcessed and InventoryReserved, it marks the order as Completed.
  - If it receives PaymentFailed or InventoryReservationFailed, it initiates a rollback. For example, if PaymentFailed arrives, it publishes an OrderCancelled event.

        { "eventType": "OrderCancelled", "orderId": "ORD123", "reason": "Payment failed" }

- Payment Service (rollback): Subscribes to OrderCancelled events. If it receives one, it refunds the payment if it was already processed.

      { "eventType": "PaymentRefunded", "orderId": "ORD123", "paymentId": "PAY789" }

- Inventory Service (rollback): Subscribes to OrderCancelled events. If it receives one, it releases the reserved inventory.

      { "eventType": "InventoryReleased", "orderId": "ORD123", "inventoryId": "INV001" }
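The flow above can be sketched end to end with an in-memory event bus. This is a minimal illustration, not a production design: `EventBus`, `run_order_saga`, and the `payment_ok` / `in_stock` switches are hypothetical names invented for the example, and a real system would use a message broker and separate processes instead of closures sharing one queue.

```python
from collections import defaultdict, deque


class EventBus:
    """In-memory stand-in for a message broker; delivers events in FIFO order."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.queue = deque()
        self.log = []  # event types, in publication order

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event):
        self.log.append(event["eventType"])
        self.queue.append(event)

    def run(self):
        # Drain the queue; handlers may publish further events while we loop.
        while self.queue:
            event = self.queue.popleft()
            for handler in self.handlers[event["eventType"]]:
                handler(event)


def run_order_saga(payment_ok=True, in_stock=True):
    """Run one order through the saga; returns (final status, event log)."""
    bus = EventBus()
    orders = {}                   # Order Service: orderId -> status
    confirmed = defaultdict(set)  # Order Service: success events seen per order
    payments = {}                 # Payment Service: orderId -> paymentId
    reservations = {}             # Inventory Service: orderId -> inventoryId

    def take_payment(e):  # Payment Service
        oid = e["orderId"]
        if payment_ok:
            payments[oid] = "PAY789"
            bus.publish({"eventType": "PaymentProcessed", "orderId": oid, "paymentId": "PAY789"})
        else:
            bus.publish({"eventType": "PaymentFailed", "orderId": oid})

    def refund_payment(e):  # Payment Service (rollback)
        payment_id = payments.pop(e["orderId"], None)
        if payment_id is not None:  # refund only if a payment was taken
            bus.publish({"eventType": "PaymentRefunded", "orderId": e["orderId"], "paymentId": payment_id})

    def reserve_stock(e):  # Inventory Service
        oid = e["orderId"]
        if in_stock:
            reservations[oid] = "INV001"
            bus.publish({"eventType": "InventoryReserved", "orderId": oid, "inventoryId": "INV001"})
        else:
            bus.publish({"eventType": "InventoryReservationFailed", "orderId": oid})

    def release_stock(e):  # Inventory Service (rollback)
        inventory_id = reservations.pop(e["orderId"], None)
        if inventory_id is not None:  # release only if something was reserved
            bus.publish({"eventType": "InventoryReleased", "orderId": e["orderId"], "inventoryId": inventory_id})

    def on_success(e):  # Order Service: complete once both confirmations arrive
        oid = e["orderId"]
        if orders[oid] != "Pending":
            return
        confirmed[oid].add(e["eventType"])
        if confirmed[oid] == {"PaymentProcessed", "InventoryReserved"}:
            orders[oid] = "Completed"

    def on_failure(e):  # Order Service: cancel once, on the first failure
        oid = e["orderId"]
        if orders[oid] != "Pending":
            return
        orders[oid] = "Cancelled"
        bus.publish({"eventType": "OrderCancelled", "orderId": oid, "reason": e["eventType"]})

    bus.subscribe("OrderCreated", take_payment)
    bus.subscribe("OrderCreated", reserve_stock)
    bus.subscribe("OrderCancelled", refund_payment)
    bus.subscribe("OrderCancelled", release_stock)
    for event_type in ("PaymentProcessed", "InventoryReserved"):
        bus.subscribe(event_type, on_success)
    for event_type in ("PaymentFailed", "InventoryReservationFailed"):
        bus.subscribe(event_type, on_failure)

    orders["ORD123"] = "Pending"
    bus.publish({"eventType": "OrderCreated", "orderId": "ORD123", "customerId": "CUST456", "amount": 100.50})
    bus.run()
    return orders["ORD123"], bus.log
```

Calling `run_order_saga(payment_ok=False)` shows why the rollback handlers matter: the inventory reservation can succeed while the payment fails, so the Inventory Service has to release stock when OrderCancelled arrives.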
This creates a chain reaction: an event triggers an action, which generates another event, triggering further actions. Each service is autonomous and only needs to know about the events it cares about.
The core problem saga choreography solves is maintaining data consistency across multiple independent microservices without introducing a single point of failure or tight coupling. In a traditional distributed transaction, such as two-phase commit, a coordinator ensures that all parts of the transaction succeed or fail together. Across independently deployed services, each with its own database, this is often impractical or impossible. Choreography distributes this responsibility. Each service acts like a dancer in a choreographed performance, reacting to cues (events) from others. When something goes wrong, a compensating event is published so that services that already completed their step can undo it. The result is eventual consistency rather than an atomic rollback.
The key levers you control are the events themselves and the logic within each service to react to them. You define the "happy path" and the "unhappy paths" by what events you publish and what events you subscribe to. For example, the Order Service decides the ultimate state of the order based on the combination of PaymentProcessed and InventoryReserved events. The Payment Service decides whether to publish PaymentProcessed or PaymentFailed based on its internal state and external checks.
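Because the publish and subscribe sets are the main design levers, it can help to write them down as data and check them. The sketch below encodes the wiring of the order example; the `PUBLISHES` and `SUBSCRIBES` tables and both helper functions are hypothetical names for illustration, not part of any framework.

```python
# Wiring of the order example: which service publishes and subscribes to what.
PUBLISHES = {
    "Order Service": {"OrderCreated", "OrderCancelled"},
    "Payment Service": {"PaymentProcessed", "PaymentFailed", "PaymentRefunded"},
    "Inventory Service": {"InventoryReserved", "InventoryReservationFailed", "InventoryReleased"},
}
SUBSCRIBES = {
    "Order Service": {"PaymentProcessed", "PaymentFailed", "InventoryReserved", "InventoryReservationFailed"},
    "Payment Service": {"OrderCreated", "OrderCancelled"},
    "Inventory Service": {"OrderCreated", "OrderCancelled"},
}


def consumers_of(event_type):
    """Services that react to the given event type, in alphabetical order."""
    return sorted(name for name, events in SUBSCRIBES.items() if event_type in events)


def unconsumed_events():
    """Event types that some service publishes but no service subscribes to."""
    published = set().union(*PUBLISHES.values())
    subscribed = set().union(*SUBSCRIBES.values())
    return sorted(published - subscribed)
```

In this example `unconsumed_events()` reports PaymentRefunded and InventoryReleased: the walkthrough describes nobody reacting to them. That may be intentional (they are useful for auditing), but a check like this makes such gaps visible when the wiring changes.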
What most people don’t realize is that the "state" of a distributed transaction in a choreographed saga often resides implicitly in the sequence and combination of events that have been processed. There isn’t a single "transaction manager" holding the global state; rather, each service maintains its local state, and the overall transaction state is inferred by observing the event stream. This makes debugging and auditing harder, as you need to reconstruct the sequence of events to understand what happened.
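To make the idea of inferred state concrete, here is a rough sketch that rebuilds each order's status purely from a recorded event stream. It assumes all events have been collected into one list (for example, by a logging subscriber); `infer_saga_states` and the status labels are hypothetical, chosen to match the order example.

```python
from collections import defaultdict


def infer_saga_states(events):
    """Group events by orderId and infer each saga's status from what was seen.

    Returns (states, timelines): states maps orderId to a status string,
    timelines maps orderId to its event types in arrival order.
    """
    timelines = defaultdict(list)
    for event in events:
        timelines[event["orderId"]].append(event["eventType"])

    states = {}
    for order_id, seen in timelines.items():
        if "OrderCancelled" in seen:
            states[order_id] = "Cancelled"
        elif {"PaymentProcessed", "InventoryReserved"} <= set(seen):
            states[order_id] = "Completed"
        else:
            states[order_id] = "In progress"
    return states, dict(timelines)
```

Note that no single service holds this view. It only exists once the events from every participant are brought together, which is the extra work that auditing a choreographed saga requires.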
The next challenge is managing dead-letter queues and ensuring event idempotency.
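As a preview of what that involves, the sketch below wraps a handler so that redelivered events are skipped and events that keep failing are set aside in a dead-letter list. It assumes each event carries a unique `eventId` field, which the example payloads above do not include; `IdempotentConsumer` and its parameters are hypothetical. Real brokers usually handle redelivery and dead-lettering themselves, and the set of processed IDs would live in durable storage.

```python
class IdempotentConsumer:
    """Wraps a handler with duplicate detection and a simple dead-letter list."""

    def __init__(self, handler, max_attempts=3):
        self.handler = handler
        self.max_attempts = max_attempts
        self.processed_ids = set()  # assumption: kept in memory for illustration
        self.dead_letters = []      # events that failed every attempt

    def receive(self, event):
        event_id = event["eventId"]  # assumption: unique ID on every event
        if event_id in self.processed_ids:
            return "duplicate"  # already handled; do not run the handler again

        last_error = None
        for _ in range(self.max_attempts):
            try:
                self.handler(event)
            except Exception as error:  # retry on any handler failure
                last_error = error
            else:
                self.processed_ids.add(event_id)
                return "processed"

        # Every attempt failed: park the event for manual inspection.
        self.dead_letters.append({"event": event, "error": repr(last_error)})
        return "dead-lettered"
```

For example, if the broker delivers the same OrderCancelled event twice, a refund handler wrapped this way runs once and the second delivery returns "duplicate", so the customer is not refunded twice.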