Sagas are a fundamental pattern for managing distributed transactions, but their implementation often feels like trying to conduct a symphony with a broken baton.
Let’s see how Kafka, with its robust event streaming capabilities, can transform this chaotic endeavor into a surprisingly elegant orchestration.
Imagine a simple e-commerce order processing flow. We need to:
- Create an Order.
- Reserve Inventory.
- Process Payment.
- Ship the Order.
Each of these steps is handled by a separate microservice. If any step fails, we need to compensate for the steps that already succeeded to maintain data consistency.
Here’s a simplified Kafka topic structure for our saga:
- orders.events
- inventory.events
- payments.events
- shipping.events
When an order is created, the OrderService publishes an OrderCreated event to orders.events.
// orders.events
{
  "eventType": "OrderCreated",
  "orderId": "ORD123",
  "customerId": "CUST456",
  "items": [
    {"productId": "PROD789", "quantity": 2}
  ],
  "timestamp": "2023-10-27T10:00:00Z"
}
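Here's a minimal sketch of how the OrderService might turn that event into a Kafka record. The OrderItem and OrderCreated classes and the to_record helper are hypothetical names, and using orderId as the record key is an assumption: the article does not say how records are keyed. With Kafka's default partitioner, records with the same key go to the same partition of a topic, so all events for one order stay in order within that topic.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Tuple


@dataclass(frozen=True)
class OrderItem:
    productId: str
    quantity: int


@dataclass(frozen=True)
class OrderCreated:
    orderId: str
    customerId: str
    items: List[OrderItem]
    timestamp: str
    eventType: str = "OrderCreated"


def to_record(event: OrderCreated) -> Tuple[bytes, bytes]:
    """Return the (key, value) bytes a Kafka producer would send.

    Assumption: the key is the orderId, so every event for one order
    lands on the same partition of the topic.
    """
    key = event.orderId.encode("utf-8")
    # asdict() also converts the nested OrderItem instances to plain dicts.
    value = json.dumps(asdict(event)).encode("utf-8")
    return key, value
```

The returned pair can be handed to whichever producer client you use, with orders.events as the target topic.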
The InventoryService consumes OrderCreated events. If inventory is available, it publishes an InventoryReserved event to inventory.events.
// inventory.events
{
  "eventType": "InventoryReserved",
  "orderId": "ORD123",
  "customerId": "CUST456",
  "productId": "PROD789",
  "quantityReserved": 2,
  "timestamp": "2023-10-27T10:01:00Z"
}
If inventory is not available, it publishes an InventoryReservationFailed event instead, which triggers compensation for the OrderCreated step (in this simple case, that might just mean marking the order as failed).
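The InventoryService's decision can be sketched as a pure function from an incoming event to an outgoing one. This is only an illustration: handle_order_created, the in-memory stock dictionary, and the "reserved" and "unavailable" fields are assumptions made for the sketch, not part of the event schema shown above. The key idea is to check every item before touching stock, so a failed order never leaves a partial reservation behind.

```python
from typing import Dict


def handle_order_created(event: dict, stock: Dict[str, int], now: str) -> dict:
    """Reserve stock for the whole order, or reserve nothing and report failure."""
    # Sum quantities per product so a product listed twice is checked once.
    needed: Dict[str, int] = {}
    for item in event["items"]:
        product_id = item["productId"]
        needed[product_id] = needed.get(product_id, 0) + item["quantity"]

    base = {
        "orderId": event["orderId"],
        "customerId": event["customerId"],
        "timestamp": now,
    }

    # Check everything first; stock is only modified if the whole order fits.
    unavailable = sorted(p for p, qty in needed.items() if stock.get(p, 0) < qty)
    if unavailable:
        return {"eventType": "InventoryReservationFailed", **base, "unavailable": unavailable}

    for product_id, qty in needed.items():
        stock[product_id] -= qty
    return {"eventType": "InventoryReserved", **base, "reserved": needed}
```

The returned dictionary is what the service would publish to inventory.events.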
The PaymentService consumes InventoryReserved events. Upon successful payment, it publishes a PaymentProcessed event to payments.events.
// payments.events
{
  "eventType": "PaymentProcessed",
  "orderId": "ORD123",
  "customerId": "CUST456",
  "amount": 50.00,
  "transactionId": "TXNABC",
  "timestamp": "2023-10-27T10:02:00Z"
}
If payment fails, it publishes a PaymentFailed event to payments.events, which starts compensation: the InventoryService consumes PaymentFailed and publishes an InventoryReservationReleased event to inventory.events to undo the earlier reservation.
// inventory.events
{
  "eventType": "InventoryReservationReleased",
  "orderId": "ORD123",
  "customerId": "CUST456",
  "productId": "PROD789",
  "quantityReleased": 2,
  "timestamp": "2023-10-27T10:03:00Z"
}
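A sketch of that compensation step might look like the following. It assumes the inventory side keeps a per-order record of what it reserved (the reservations dictionary here), which the article does not spell out; handle_payment_failed is a hypothetical name. Removing the reservation record before restoring stock makes the handler safe to run twice for the same order.

```python
from typing import Dict, List


def handle_payment_failed(
    event: dict,
    reservations: Dict[str, Dict[str, int]],
    stock: Dict[str, int],
    now: str,
) -> List[dict]:
    """Release whatever was reserved for the order and describe it as events.

    reservations maps orderId -> {productId: quantity} (an assumed structure).
    """
    held = reservations.pop(event["orderId"], None)
    if held is None:
        # Nothing reserved, or already released: compensation is a no-op.
        return []

    released = []
    for product_id, qty in held.items():
        stock[product_id] = stock.get(product_id, 0) + qty
        released.append({
            "eventType": "InventoryReservationReleased",
            "orderId": event["orderId"],
            "customerId": event["customerId"],
            "productId": product_id,
            "quantityReleased": qty,
            "timestamp": now,
        })
    return released
```

Each returned dictionary has the same shape as the InventoryReservationReleased event above and would be published to inventory.events.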
Finally, the ShippingService consumes PaymentProcessed events. Upon successful shipping, it publishes an OrderShipped event to shipping.events.
// shipping.events
{
  "eventType": "OrderShipped",
  "orderId": "ORD123",
  "customerId": "CUST456",
  "trackingNumber": "TRKXYZ",
  "timestamp": "2023-10-27T10:04:00Z"
}
If shipping fails, the ShippingService publishes a ShippingFailed event and compensation cascades backward: ShippingFailed triggers a PaymentRefunded event on payments.events, and PaymentRefunded in turn triggers an InventoryReservationReleased event on inventory.events.
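One way to make the cascade explicit is a small lookup table from each triggering event to the compensating event it causes. This is a sketch for reasoning about the flow, not how the services are wired: in the real system each service only knows its own row. The OrderFailed event name is hypothetical; the article only says the order is marked as failed.

```python
from typing import Dict, List, Tuple

# triggering event -> (topic, compensating event published in response)
COMPENSATION_TRIGGERS: Dict[str, Tuple[str, str]] = {
    "InventoryReservationFailed": ("orders.events", "OrderFailed"),  # hypothetical event name
    "PaymentFailed": ("inventory.events", "InventoryReservationReleased"),
    "ShippingFailed": ("payments.events", "PaymentRefunded"),
    "PaymentRefunded": ("inventory.events", "InventoryReservationReleased"),
}


def compensation_chain(failure_event_type: str) -> List[Tuple[str, str]]:
    """Follow the table until an event triggers no further compensation."""
    chain = []
    current = failure_event_type
    while current in COMPENSATION_TRIGGERS:
        topic, compensating_event = COMPENSATION_TRIGGERS[current]
        chain.append((topic, compensating_event))
        current = compensating_event
    return chain
```

Calling compensation_chain("ShippingFailed") walks the two hops described above: first the refund on payments.events, then the release on inventory.events.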
The core of this coordination lies in each service listening to events from the previous stage and publishing its own outcome event. This is the choreography style of saga: there is no central orchestrator, and the coordination logic is distributed across the services, each acting as a state machine that reacts to events. The Kafka topics act as the shared log of all saga progress and failures.
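Because the topics hold the full history, the status of any one saga can be recomputed by replaying its events. The sketch below folds the events for one order into a status; the status names and the saga_status function are assumptions for illustration. It orders events by their timestamp field, which works for the uniform UTC strings used in this article but would be fragile with clock skew between services.

```python
from typing import Dict, Iterable

# event type -> saga status after that event (status names are hypothetical)
TRANSITIONS: Dict[str, str] = {
    "OrderCreated": "PENDING_INVENTORY",
    "InventoryReserved": "PENDING_PAYMENT",
    "PaymentProcessed": "PENDING_SHIPPING",
    "OrderShipped": "COMPLETED",
    "InventoryReservationFailed": "FAILED",
    "PaymentFailed": "COMPENSATING",
    "ShippingFailed": "COMPENSATING",
    "PaymentRefunded": "COMPENSATING",
    "InventoryReservationReleased": "FAILED",
}


def saga_status(events: Iterable[dict], order_id: str) -> str:
    """Replay the events of one order, merged from all topics, into a status."""
    own = [e for e in events if e["orderId"] == order_id]
    # ISO-8601 UTC strings in one format sort chronologically as plain text.
    own.sort(key=lambda e: e["timestamp"])
    status = "UNKNOWN"
    for event in own:
        # Unrecognized event types leave the status unchanged.
        status = TRANSITIONS.get(event["eventType"], status)
    return status
```

Feeding it the four happy-path events from this article yields COMPLETED, while a log that ends in InventoryReservationReleased yields FAILED.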
What most people miss is that the compensation logic isn’t just a "rollback" in the database sense. It’s about publishing compensating events. For instance, a PaymentFailed event doesn’t just cancel the payment; it triggers the release of inventory by publishing an InventoryReservationReleased event. The system’s state is updated by the effects of these compensating events being consumed.
This event-driven approach to sagas keeps the services decoupled and resilient. If a service is temporarily down, Kafka keeps its unread events on the topic (for as long as the topic's retention settings allow), and the service's consumer group resumes from its last committed offset once it recovers.
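Recovery has a flip side: if a service crashes after handling an event but before committing its offset, it will see that event again on restart. Saga handlers are therefore usually written to be idempotent. Here is a minimal in-memory sketch; the IdempotentHandler class is a hypothetical name, and using (orderId, eventType) as the deduplication key is an assumption that only holds if each event type occurs at most once per order. A real service would persist the seen keys alongside its own state rather than keep them in memory.

```python
from typing import Callable, Optional, Set, Tuple


class IdempotentHandler:
    """Wrap an event handler so a redelivered event is processed only once."""

    def __init__(self, handler: Callable[[dict], Optional[dict]]) -> None:
        self._handler = handler
        self._seen: Set[Tuple[str, str]] = set()

    def __call__(self, event: dict) -> Optional[dict]:
        # Assumed dedup key: one event of each type per order.
        key = (event["orderId"], event["eventType"])
        if key in self._seen:
            return None  # duplicate delivery: skip the side effects
        result = self._handler(event)
        # Record the key only after the handler succeeded, so a failed
        # attempt is retried on the next delivery.
        self._seen.add(key)
        return result
```

Wrapping a payment handler this way means a replayed InventoryReserved event cannot charge the customer twice.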
The next logical step in building more complex sagas involves handling parallel branches or more intricate conditional logic, which can be managed by introducing dedicated saga coordination topics or more sophisticated event routing.