Combining the Saga pattern with CQRS lets you coordinate a business transaction across several services without a distributed lock or two-phase commit, and the events each step emits make the whole flow observable.
Let’s see this in action with a simplified order processing flow. Imagine a customer placing an order. This triggers a CreateOrderCommand.
{
"commandType": "CreateOrderCommand",
"orderId": "ORD-12345",
"customerId": "CUST-987",
"items": [
{"productId": "PROD-A", "quantity": 2},
{"productId": "PROD-B", "quantity": 1}
],
"totalAmount": 75.50
}
This command is handled by the Order Write Service. It validates the order and, if successful, publishes an OrderCreatedEvent.
{
"eventType": "OrderCreatedEvent",
"orderId": "ORD-12345",
"customerId": "CUST-987",
"items": [
{"productId": "PROD-A", "quantity": 2},
{"productId": "PROD-B", "quantity": 1}
],
"totalAmount": 75.50,
"timestamp": "2023-10-27T10:00:00Z"
}
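The handler step above can be sketched as follows. This is a minimal illustration, not the Order Write Service's real implementation: the function name `handle_create_order` and the specific validation rules (non-empty items, positive quantities, positive total) are assumptions, since the flow only says that the service "validates the order".

```python
from datetime import datetime, timezone


def handle_create_order(command: dict) -> dict:
    """Validate a CreateOrderCommand and return the OrderCreatedEvent to publish.

    The validation rules here are illustrative assumptions.
    """
    items = command.get("items", [])
    if not items:
        raise ValueError("order must contain at least one item")
    if any(item["quantity"] <= 0 for item in items):
        raise ValueError("item quantities must be positive")
    if command["totalAmount"] <= 0:
        raise ValueError("totalAmount must be positive")

    # Copy the order data into the event and stamp it with the current UTC time.
    return {
        "eventType": "OrderCreatedEvent",
        "orderId": command["orderId"],
        "customerId": command["customerId"],
        "items": items,
        "totalAmount": command["totalAmount"],
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


command = {
    "commandType": "CreateOrderCommand",
    "orderId": "ORD-12345",
    "customerId": "CUST-987",
    "items": [
        {"productId": "PROD-A", "quantity": 2},
        {"productId": "PROD-B", "quantity": 1},
    ],
    "totalAmount": 75.50,
}
event = handle_create_order(command)
```

In a real service the returned event would be handed to the event bus rather than kept in a local variable; returning it keeps the handler easy to test.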
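This OrderCreatedEvent is the trigger for our Saga. In this walkthrough a Saga orchestrator listens for the event and drives each step; in a choreographed Saga there is no central coordinator, and each service instead reacts directly to the previous service's event. The orchestrator's first step is to reserve inventory, so it sends a ReserveInventoryCommand to the Inventory Service.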
{
"commandType": "ReserveInventoryCommand",
"orderId": "ORD-12345",
"items": [
{"productId": "PROD-A", "quantity": 2},
{"productId": "PROD-B", "quantity": 1}
]
}
If the Inventory Service successfully reserves the items, it publishes an InventoryReservedEvent.
{
"eventType": "InventoryReservedEvent",
"orderId": "ORD-12345",
"timestamp": "2023-10-27T10:01:00Z"
}
The Saga then proceeds to process payment. It sends a ProcessPaymentCommand to the Payment Service.
{
"commandType": "ProcessPaymentCommand",
"orderId": "ORD-12345",
"customerId": "CUST-987",
"amount": 75.50
}
If payment is successful, the Payment Service publishes a PaymentProcessedEvent.
{
"eventType": "PaymentProcessedEvent",
"orderId": "ORD-12345",
"transactionId": "TRN-XYZ789",
"timestamp": "2023-10-27T10:02:00Z"
}
At this point, the Saga considers the order complete. It publishes a final OrderConfirmedEvent.
{
"eventType": "OrderConfirmedEvent",
"orderId": "ORD-12345",
"timestamp": "2023-10-27T10:02:00Z"
}
The Order Write Service can listen for OrderConfirmedEvent to update its own state, and the Order Read Service can update its materialized views for querying.
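The happy path described so far can be sketched as a small state machine. This is one possible shape for an orchestrator, under stated assumptions: the class name `OrderSaga`, its methods, and the state names (`AWAITING_INVENTORY`, `AWAITING_PAYMENT`, `COMPLETED`) are hypothetical, and the saga returns the next message instead of sending it over a real bus.

```python
from typing import Optional


class OrderSaga:
    """Tracks one order and decides which message to emit next (hypothetical sketch)."""

    def __init__(self, order_created: dict):
        self.order = order_created
        self.state = "STARTED"

    def start(self) -> dict:
        # First step: ask the Inventory Service to reserve the ordered items.
        self.state = "AWAITING_INVENTORY"
        return {
            "commandType": "ReserveInventoryCommand",
            "orderId": self.order["orderId"],
            "items": self.order["items"],
        }

    def on_event(self, event: dict) -> Optional[dict]:
        kind = event["eventType"]
        if self.state == "AWAITING_INVENTORY" and kind == "InventoryReservedEvent":
            self.state = "AWAITING_PAYMENT"
            return {
                "commandType": "ProcessPaymentCommand",
                "orderId": self.order["orderId"],
                "customerId": self.order["customerId"],
                "amount": self.order["totalAmount"],
            }
        if self.state == "AWAITING_PAYMENT" and kind == "PaymentProcessedEvent":
            self.state = "COMPLETED"
            return {
                "eventType": "OrderConfirmedEvent",
                "orderId": self.order["orderId"],
                "timestamp": event["timestamp"],
            }
        # Events that do not match the current state (for example duplicates) are ignored.
        return None


order_created = {
    "eventType": "OrderCreatedEvent",
    "orderId": "ORD-12345",
    "customerId": "CUST-987",
    "items": [{"productId": "PROD-A", "quantity": 2}],
    "totalAmount": 75.50,
}
saga = OrderSaga(order_created)
first = saga.start()
second = saga.on_event({"eventType": "InventoryReservedEvent",
                        "orderId": "ORD-12345",
                        "timestamp": "2023-10-27T10:01:00Z"})
final = saga.on_event({"eventType": "PaymentProcessedEvent",
                       "orderId": "ORD-12345",
                       "transactionId": "TRN-XYZ789",
                       "timestamp": "2023-10-27T10:02:00Z"})
```

Keeping the current state explicit is what lets the saga reject out-of-order or repeated events instead of issuing a second payment command.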
What if something goes wrong? If the Inventory Service cannot reserve items, it publishes an InventoryReservationFailedEvent.
{
"eventType": "InventoryReservationFailedEvent",
"orderId": "ORD-12345",
"reason": "Insufficient stock for PROD-A",
"timestamp": "2023-10-27T10:01:30Z"
}
The Saga, upon receiving this, must compensate. It sends a CancelOrderCommand to the Order Write Service (to mark the order as cancelled) and a ReleaseInventoryCommand to the Inventory Service (to undo the partial reservation, if any).
{
"commandType": "CancelOrderCommand",
"orderId": "ORD-12345",
"reason": "Inventory reservation failed"
}
{
"commandType": "ReleaseInventoryCommand",
"orderId": "ORD-12345",
"items": [
{"productId": "PROD-A", "quantity": 2},
{"productId": "PROD-B", "quantity": 1}
]
}
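The compensation step above can be sketched as a function that turns the failure event into the two compensating commands. The function name `compensate_inventory_failure` is hypothetical, and the sketch assumes the saga still holds the original order data so it knows which items to release.

```python
def compensate_inventory_failure(failure: dict, order: dict) -> list:
    """Build the compensating commands for a failed inventory reservation."""
    if failure["eventType"] != "InventoryReservationFailedEvent":
        raise ValueError("unexpected event type: " + failure["eventType"])
    if failure["orderId"] != order["orderId"]:
        raise ValueError("failure event does not belong to this order")

    return [
        # Mark the order as cancelled on the write side.
        {
            "commandType": "CancelOrderCommand",
            "orderId": order["orderId"],
            "reason": "Inventory reservation failed",
        },
        # Undo any partial reservation; releasing items that were never
        # reserved should be a harmless no-op in the Inventory Service.
        {
            "commandType": "ReleaseInventoryCommand",
            "orderId": order["orderId"],
            "items": order["items"],
        },
    ]


order = {
    "orderId": "ORD-12345",
    "items": [
        {"productId": "PROD-A", "quantity": 2},
        {"productId": "PROD-B", "quantity": 1},
    ],
}
failure = {
    "eventType": "InventoryReservationFailedEvent",
    "orderId": "ORD-12345",
    "reason": "Insufficient stock for PROD-A",
    "timestamp": "2023-10-27T10:01:30Z",
}
commands = compensate_inventory_failure(failure, order)
```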
The Saga pattern with CQRS provides a way to manage complex, multi-service workflows by breaking them into a series of independent commands and events. Each service is responsible for its own domain and exposes its state changes as events. The Saga then orchestrates or choreographs these events and commands so that the workflow either completes or is undone by compensating actions. The result is eventual consistency rather than an atomic rollback: intermediate states, such as an order that exists but is not yet paid, are briefly visible to other services. Combined with a durable message bus and retries, this makes your system resilient to transient failures, and because every step is recorded as an event you get a clear audit trail of how each transaction progressed.
The real strength of this pattern is that the Saga doesn’t need to know the internal workings of each service, only the commands it accepts and the events it publishes. This loose coupling is fundamental to building scalable microservices. CQRS adds a second separation: the write side (which processes commands and emits events) is decoupled from the read side (which consumes those events to build optimized query models), so each side can scale and evolve independently.
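To make the read side concrete, here is a minimal sketch of a projection that consumes the events from this walkthrough and maintains a query model. The class name `OrderSummaryProjection` and the status values `PENDING` and `CONFIRMED` are assumptions chosen for illustration; a real read service would persist the view in a database rather than a dictionary.

```python
class OrderSummaryProjection:
    """Builds a simple per-order view from events (hypothetical read model)."""

    def __init__(self):
        self.orders = {}

    def apply(self, event: dict) -> None:
        kind = event["eventType"]
        if kind == "OrderCreatedEvent":
            self.orders[event["orderId"]] = {
                "customerId": event["customerId"],
                "totalAmount": event["totalAmount"],
                "status": "PENDING",
            }
        elif kind == "OrderConfirmedEvent":
            order = self.orders.get(event["orderId"])
            if order is not None:
                order["status"] = "CONFIRMED"
        # Other event types are not relevant to this view and are skipped.

    def get(self, order_id: str):
        return self.orders.get(order_id)


projection = OrderSummaryProjection()
projection.apply({
    "eventType": "OrderCreatedEvent",
    "orderId": "ORD-12345",
    "customerId": "CUST-987",
    "items": [{"productId": "PROD-A", "quantity": 2}],
    "totalAmount": 75.50,
    "timestamp": "2023-10-27T10:00:00Z",
})
status_before = projection.get("ORD-12345")["status"]
projection.apply({
    "eventType": "OrderConfirmedEvent",
    "orderId": "ORD-12345",
    "timestamp": "2023-10-27T10:02:00Z",
})
status_after = projection.get("ORD-12345")["status"]
```

The projection never touches the write model; it only reacts to events, which is why it can be rebuilt or reshaped without changing the command handlers.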
The "command bus" and "event bus" are critical infrastructure. They are typically built on durable message brokers (such as Kafka, RabbitMQ, or Azure Service Bus). These generally offer at-least-once delivery rather than exactly-once, and they guarantee ordering only within a bounded scope, such as a single Kafka partition. Without a reliable bus, the entire Saga coordination would be fragile. A common mistake is to use a simple in-memory queue for development and forget to replace it with a durable, distributed broker for production, which leads to lost commands or events when a process restarts or falls behind under load.
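One way to reduce the risk of shipping the development queue to production is to code against a small bus interface and swap the implementation by configuration. The sketch below is an assumption about how that could look: `MessageBus` and `InMemoryBus` are hypothetical names, and no real broker client is shown.

```python
from collections import defaultdict
from typing import Callable, Protocol


class MessageBus(Protocol):
    """Interface the saga and services depend on (hypothetical)."""

    def publish(self, topic: str, message: dict) -> None: ...

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None: ...


class InMemoryBus:
    """Development-only bus: nothing is persisted, so messages are lost on restart."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Deliver synchronously to every handler registered for the topic.
        for handler in self._handlers[topic]:
            handler(message)


received = []
bus: MessageBus = InMemoryBus()
bus.subscribe("order-events", received.append)
bus.publish("order-events", {"eventType": "OrderCreatedEvent", "orderId": "ORD-12345"})
```

A production implementation of the same interface would wrap a broker client, so the saga code itself does not change between environments.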
The next hurdle is idempotency: because messages can be redelivered, every command handler and compensating action must be safe to run more than once.