Saga choreography can feel like a jazz improvisation where each musician reacts to the others, while orchestration is more like a conductor leading an orchestra.
Let’s watch choreography in action. Imagine a simple e-commerce order flow:
- Order Service: Receives a new order.
  - `POST /orders` with payload: `{"customerId": 123, "items": [...], "totalAmount": 150.00}`
  - Publishes an `OrderCreated` event to a message broker (e.g., Kafka, RabbitMQ).
- Inventory Service: Subscribes to `OrderCreated` events.
  - Receives `OrderCreated`.
  - Checks inventory for items. If available:
    - Deducts items from stock.
    - Publishes an `ItemsReserved` event.
  - If inventory is insufficient:
    - Publishes an `InventoryUnavailable` event.
- Payment Service: Subscribes to `ItemsReserved` events.
  - Receives `ItemsReserved`.
  - Processes payment. If successful:
    - Publishes a `PaymentProcessed` event.
  - If payment fails:
    - Publishes a `PaymentFailed` event.
- Order Service (again): Subscribes to `PaymentProcessed`, `PaymentFailed`, and `InventoryUnavailable` events.
  - If `PaymentProcessed` is received: Updates order status to "PAID".
  - If `PaymentFailed` or `InventoryUnavailable` is received: Updates order status to "FAILED" and publishes an `OrderFailed` event.
- Notification Service: Subscribes to `OrderFailed` and events indicating success (e.g., `OrderPaid`).
  - Sends an email/SMS to the customer.
In this choreography, each service listens for events from other services and reacts independently. There’s no central brain dictating the flow. The "logic" is distributed.
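The choreographed flow can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the in-memory `EventBus` stands in for a real broker, and the names `wire_services`, `place_order`, `orderId`, and the SKU-to-quantity shape of `items` are assumptions made for the example.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """In-memory stand-in for a broker such as Kafka or RabbitMQ (assumption)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self.log: list[str] = []  # event names, in the order they were published

    def subscribe(self, event_name: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event_name].append(handler)

    def publish(self, event_name: str, payload: dict) -> None:
        self.log.append(event_name)
        for handler in list(self._handlers[event_name]):
            handler(payload)


def wire_services(bus, stock, orders, notifications, card_ok=True):
    """Register each service's reactions. No function here calls another directly."""

    def inventory_on_order_created(order):
        items = order["items"]  # hypothetical shape: {sku: quantity}
        if all(stock.get(sku, 0) >= qty for sku, qty in items.items()):
            for sku, qty in items.items():
                stock[sku] -= qty
            bus.publish("ItemsReserved", order)
        else:
            bus.publish("InventoryUnavailable", order)

    def payment_on_items_reserved(order):
        # card_ok simulates the payment provider's answer.
        bus.publish("PaymentProcessed" if card_ok else "PaymentFailed", order)

    def order_on_payment_processed(order):
        orders[order["orderId"]] = "PAID"

    def order_on_failure(order):
        orders[order["orderId"]] = "FAILED"
        bus.publish("OrderFailed", order)

    def notification_on_order_failed(order):
        notifications.append(f"Order {order['orderId']} failed")

    bus.subscribe("OrderCreated", inventory_on_order_created)
    bus.subscribe("ItemsReserved", payment_on_items_reserved)
    bus.subscribe("PaymentProcessed", order_on_payment_processed)
    bus.subscribe("PaymentFailed", order_on_failure)
    bus.subscribe("InventoryUnavailable", order_on_failure)
    bus.subscribe("OrderFailed", notification_on_order_failed)


def place_order(bus, orders, order):
    """Order Service entry point: record the order, then announce it."""
    orders[order["orderId"]] = "PENDING"
    bus.publish("OrderCreated", order)


bus, stock, orders, notifications = EventBus(), {"widget": 5}, {}, []
wire_services(bus, stock, orders, notifications)
place_order(bus, orders, {"orderId": "o-1", "customerId": 123,
                          "items": {"widget": 2}, "totalAmount": 150.00})
print(bus.log, orders)
```

Note that the only place the overall flow is visible is the list of `subscribe` calls. In a real system those subscriptions live in separate codebases, which is what makes the end-to-end path hard to see.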
Now, consider orchestration. The same e-commerce order flow:
- Order Service: Receives a new order.
  - `POST /orders` with payload: `{"customerId": 123, "items": [...], "totalAmount": 150.00}`
  - Initiates a saga orchestration.
- Orchestrator (e.g., a dedicated service or workflow engine like Camunda, Temporal):
  - Receives the `StartOrderSaga` command from the Order Service.
  - Sends a `ReserveItems` command to the Inventory Service.
  - Waits for a response from the Inventory Service (e.g., `ItemsReserved` or `InventoryUnavailable`).
  - If `ItemsReserved`: Sends a `ProcessPayment` command to the Payment Service.
  - Waits for a response from the Payment Service (e.g., `PaymentProcessed` or `PaymentFailed`).
  - If `PaymentProcessed`: Sends an `UpdateOrderStatus` command to the Order Service (with status "PAID") and a `SendOrderConfirmation` command to the Notification Service.
  - If `InventoryUnavailable` or `PaymentFailed`: Sends an `UpdateOrderStatus` command to the Order Service (with status "FAILED") and a `SendOrderFailureNotification` command to the Notification Service.
In orchestration, the orchestrator explicitly tells each participant service what to do and when. It maintains the state of the saga.
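For comparison, here is a rough sketch of the same flow with an orchestrator, written as plain Python rather than against Camunda or Temporal. The class name `OrderSagaOrchestrator`, the `orderId` field, and the intermediate state names are assumptions for illustration. Each injected callable stands in for "send a command and wait for the reply".

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class OrderSagaOrchestrator:
    # Each callable represents a command sent to a participant service.
    reserve_items: Callable[[dict], str]       # replies "ItemsReserved" or "InventoryUnavailable"
    process_payment: Callable[[dict], str]     # replies "PaymentProcessed" or "PaymentFailed"
    update_order_status: Callable[[str, str], None]
    notify: Callable[[str, str], None]
    # Saga state per order, owned by the orchestrator (state names are hypothetical).
    state: dict[str, str] = field(default_factory=dict)

    def start_order_saga(self, order: dict) -> str:
        order_id = order["orderId"]

        self.state[order_id] = "RESERVING_ITEMS"
        reply = self.reserve_items(order)

        if reply == "ItemsReserved":
            self.state[order_id] = "PROCESSING_PAYMENT"
            reply = self.process_payment(order)

        if reply == "PaymentProcessed":
            outcome, command = "PAID", "SendOrderConfirmation"
        else:  # InventoryUnavailable or PaymentFailed
            outcome, command = "FAILED", "SendOrderFailureNotification"

        self.update_order_status(order_id, outcome)
        self.notify(command, order_id)
        self.state[order_id] = outcome
        return outcome


statuses: dict[str, str] = {}
sent: list[tuple[str, str]] = []
saga = OrderSagaOrchestrator(
    reserve_items=lambda order: "ItemsReserved",
    process_payment=lambda order: "PaymentProcessed",
    update_order_status=lambda order_id, status: statuses.update({order_id: status}),
    notify=lambda command, order_id: sent.append((command, order_id)),
)
print(saga.start_order_saga({"orderId": "o-1", "customerId": 123}))
```

The whole flow reads top to bottom in `start_order_saga`, which is the main appeal of orchestration. A workflow engine would additionally persist `state` so the saga survives a restart; this sketch keeps it in memory only.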
The core problem both patterns solve is managing distributed transactions across multiple services without relying on traditional two-phase commit (2PC), which scales poorly in microservices because every participant holds locks until a coordinator has heard from all of them. They allow services to remain independent while still ensuring eventual consistency for complex business processes. The "saga" is the sequence of local transactions, where each local transaction updates one service’s own state and triggers the next step. If a step fails, compensating transactions are executed to undo the steps that already completed; for example, if payment fails after items were reserved, the Inventory Service has to release that stock.
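The idea of pairing each local transaction with an undo action can be sketched as follows. This is a simplified, in-process illustration under stated assumptions: `SagaStep` and `run_saga` are hypothetical names, and real compensations would be commands or events sent to other services, which can themselves fail.

```python
from typing import Callable, NamedTuple


class SagaStep(NamedTuple):
    name: str
    action: Callable[[], None]       # the local transaction
    compensate: Callable[[], None]   # undoes the action if a later step fails


def run_saga(steps: list[SagaStep]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: list[SagaStep] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # Broad catch for brevity; the failed step itself is not compensated
            # because it never completed.
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True


stock = {"widget": 5}


def reserve() -> None:
    stock["widget"] -= 2


def release() -> None:
    stock["widget"] += 2


def charge() -> None:
    raise RuntimeError("card declined")  # simulated payment failure


def refund() -> None:
    pass  # never reached here, since the charge did not complete


ok = run_saga([
    SagaStep("reserve_items", reserve, release),
    SagaStep("process_payment", charge, refund),
])
print(ok, stock)
```

Compensating in reverse order mirrors how the steps built on one another, so the most recent completed change is undone first.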
The most surprising thing about saga choreography is how quickly it devolves into spaghetti code if you’re not extremely disciplined about event naming and domain boundaries. Because services react to one another’s events rather than to explicit calls, a small change to one service’s event name or schema can ripple through every subscriber, and debugging becomes a nightmare of tracing event chains across the logs of several services. You end up with implicit dependencies that are hard to visualize and manage.
The key levers you control in choreography are the events themselves: their names, their schemas, and the topics they are published to. In orchestration, you control the workflow definition (e.g., a BPMN diagram or a code-based workflow) and the commands sent between services.
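To make the choreography levers concrete, one option is to pin an event's name, schema, and topic in a single definition. The sketch below assumes a JSON envelope; the topic name `orders.events`, the `schemaVersion` field, and the `to_message` helper are all hypothetical choices, not part of any broker's API.

```python
import json
from dataclasses import asdict, dataclass
from typing import ClassVar


@dataclass(frozen=True)
class OrderCreated:
    # Schema: the fields subscribers are allowed to rely on.
    order_id: str
    customer_id: int
    total_amount: float

    # Levers kept next to the schema (ClassVar, so they are not payload fields).
    TOPIC: ClassVar[str] = "orders.events"   # hypothetical topic name
    SCHEMA_VERSION: ClassVar[int] = 1


def to_message(event) -> tuple[str, bytes]:
    """Return (topic, body) ready to hand to whatever broker client is in use."""
    envelope = {
        "type": type(event).__name__,        # event name
        "schemaVersion": event.SCHEMA_VERSION,
        "data": asdict(event),
    }
    return event.TOPIC, json.dumps(envelope).encode("utf-8")


topic, body = to_message(OrderCreated("o-1", 123, 150.00))
print(topic, body.decode("utf-8"))
```

Carrying an explicit version in the envelope gives subscribers a way to detect a schema change instead of silently misreading a payload.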
The next concept to explore is how to handle compensating transactions effectively in both patterns, especially when a service that needs to compensate has already failed.