The magic of sagas is their ability to orchestrate complex, multi-step business processes across distributed systems, but the real secret sauce is how they remember where they left off.
Let’s see a simple order placement saga in action. Imagine a customer wants to buy a widget.
1. Order Service:
- A new order arrives. The CreateOrder command is sent.
- The saga instance for this order is created and its state is persisted.
- The saga then initiates the ReserveInventory command to the Inventory service.
{
"sagaId": "order-123",
"currentState": "OrderCreated",
"orderId": "order-123",
"customerId": "cust-abc",
"widgetId": "widget-xyz",
"inventoryReserved": false,
"paymentProcessed": false
}
2. Inventory Service:
- Receives ReserveInventory.
- If inventory is available, it reserves it and sends an InventoryReserved event back.
3. Saga Orchestrator:
- Receives the InventoryReserved event.
- Updates its state to InventoryReserved.
- Persists the new state.
- Initiates the ProcessPayment command to the Payment service.
{
"sagaId": "order-123",
"currentState": "InventoryReserved",
"orderId": "order-123",
"customerId": "cust-abc",
"widgetId": "widget-xyz",
"inventoryReserved": true,
"paymentProcessed": false
}
4. Payment Service:
- Receives ProcessPayment.
- Processes the payment and sends a PaymentProcessed event.
5. Saga Orchestrator:
- Receives the PaymentProcessed event.
- Updates its state to PaymentCompleted.
- Persists the final state.
- The saga is now complete.
{
"sagaId": "order-123",
"currentState": "PaymentCompleted",
"orderId": "order-123",
"customerId": "cust-abc",
"widgetId": "widget-xyz",
"inventoryReserved": true,
"paymentProcessed": true
}
If any step fails (e.g., ReserveInventory fails because of insufficient stock, or ProcessPayment is declined), the saga transitions to a Compensation state and initiates compensating actions, such as ReleaseInventory and CancelOrder, to undo the steps that already completed.
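The walkthrough above can be sketched as a small orchestrator driven by a transition table. This is a minimal illustration, not a reference implementation: the store is an in-memory dictionary standing in for a real database, and the names `InMemorySagaStore`, `handle_event`, `InventoryReservationFailed`, and `PaymentFailed` are assumptions introduced for the example.

```python
import json
from dataclasses import dataclass, asdict

# (current state, incoming event) -> (next state, command to send, or None)
TRANSITIONS = {
    ("OrderCreated", "InventoryReserved"): ("InventoryReserved", "ProcessPayment"),
    ("InventoryReserved", "PaymentProcessed"): ("PaymentCompleted", None),
    # Failure paths; these event names are hypothetical.
    ("OrderCreated", "InventoryReservationFailed"): ("Compensating", "CancelOrder"),
    ("InventoryReserved", "PaymentFailed"): ("Compensating", "ReleaseInventory"),
}


@dataclass
class SagaState:
    sagaId: str
    currentState: str
    orderId: str
    customerId: str
    widgetId: str
    inventoryReserved: bool = False
    paymentProcessed: bool = False


class InMemorySagaStore:
    """Stand-in for a real database: keeps each saga as a JSON string."""

    def __init__(self):
        self._rows = {}

    def save(self, saga):
        self._rows[saga.sagaId] = json.dumps(asdict(saga))

    def load(self, saga_id):
        return SagaState(**json.loads(self._rows[saga_id]))


def handle_event(store, saga_id, event):
    """Apply one event, persist the new state, return the next command."""
    saga = store.load(saga_id)
    transition = TRANSITIONS.get((saga.currentState, event))
    if transition is None:
        return None  # duplicate or out-of-order event: state is unchanged
    saga.currentState, command = transition
    if event == "InventoryReserved":
        saga.inventoryReserved = True
    elif event == "PaymentProcessed":
        saga.paymentProcessed = True
    store.save(saga)  # persist before the next command goes out
    return command
```

Because the state is saved before the next command is returned, a restarted orchestrator can reload the saga and see which step it had reached. Ignoring events that have no matching transition also makes redelivered events harmless in this sketch.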
The problem this solves is maintaining consistency in distributed transactions. A single ACID transaction spanning several microservices, each with its own database, is rarely practical: protocols such as two-phase commit exist, but they couple services tightly and reduce availability. Sagas instead provide eventual consistency by breaking a large transaction into a sequence of local transactions. Each local transaction updates its own service’s database and then publishes an event or sends a command to trigger the next step. If a step fails, compensating transactions run in reverse order to undo the steps that already completed.
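The reverse-order rule can be illustrated with a deliberately simplified, in-process sketch. A real saga would exchange messages between services rather than call functions directly. The helper `run_saga`, the `decline_payment` stub, and the `RefundPayment` compensation are assumptions made for this example.

```python
from typing import Callable, List, Tuple

# Each step is (name, action, compensation).
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, undo completed steps in reverse."""
    log = []
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            # Only steps that finished are compensated, newest first.
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated:{done_name}")
            break
        log.append(f"done:{name}")
        completed.append((name, compensate))
    return log


def decline_payment():
    raise RuntimeError("card declined")


undone = []
steps = [
    ("CreateOrder", lambda: None, lambda: undone.append("CancelOrder")),
    ("ReserveInventory", lambda: None, lambda: undone.append("ReleaseInventory")),
    ("ProcessPayment", decline_payment, lambda: undone.append("RefundPayment")),
]
result = run_saga(steps)
# undone is now ["ReleaseInventory", "CancelOrder"]
```

The failed step itself is not compensated, since its local transaction never committed. Only the two earlier steps are undone, starting with the most recent.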
The core components you control are:
- Saga State Machine Definition: The sequence of steps, events, and commands. This is often defined declaratively.
- State Persistence: Where the saga’s current state and its associated data are stored. This is crucial for recovery.
- Event/Command Bus: The communication mechanism between services and the saga orchestrator.
The different "stores" for saga state persistence are essentially different database technologies, each with trade-offs:
- Relational Databases (e.g., PostgreSQL, MySQL):
  - Pros: Familiarity, ACID guarantees for the saga state itself, strong consistency.
  - Cons: Can become a bottleneck if sagas are extremely high-volume; schema changes can be complex.
  - When to use: When your existing infrastructure is relational, you need strict consistency for the saga state, and your transaction volume isn’t astronomical.
- NoSQL Document Databases (e.g., MongoDB, Cosmos DB):
  - Pros: Flexible schema, good for storing complex, evolving saga state as JSON documents, often scales well horizontally.
  - Cons: Eventual consistency for reads can be a concern if not managed carefully; transactional capabilities vary.
  - When to use: When your saga state is naturally document-like and you need high availability and horizontal scalability.
- NoSQL Key-Value Stores (e.g., Redis, DynamoDB):
  - Pros: Extremely high performance, low latency, excellent for simple state lookups and frequent updates. DynamoDB offers strong consistency options.
  - Cons: Limited querying capabilities; complex state might require serializing/deserializing.
  - When to use: For very high-throughput sagas where state is simple and quick access is paramount. Redis can also be used for in-memory state with persistence.
- Event Stores (e.g., EventStoreDB, Kafka):
  - Pros: Sagas are fundamentally event-driven; an event store naturally captures the entire history of state transitions as immutable events, providing a full audit log. Replaying events allows rebuilding state.
  - Cons: Can have a steeper learning curve; querying for specific saga states might require projections.
  - When to use: When you want a true event-driven architecture, built-in auditability, and the ability to reconstruct any past state.
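To make the relational option concrete, here is a rough sketch using SQLite from Python’s standard library as a stand-in for PostgreSQL or MySQL. The table layout, the `version` column used for optimistic concurrency, and the function names are all assumptions for illustration rather than a prescribed schema.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open the database and create the (hypothetical) saga_state table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS saga_state ("
        " saga_id TEXT PRIMARY KEY,"
        " current_state TEXT NOT NULL,"
        " data TEXT NOT NULL,"
        " version INTEGER NOT NULL)"
    )
    return conn


def create_saga(conn, saga_id, state, data):
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO saga_state VALUES (?, ?, ?, 0)",
            (saga_id, state, json.dumps(data)),
        )


def load_saga(conn, saga_id):
    row = conn.execute(
        "SELECT current_state, data, version FROM saga_state WHERE saga_id = ?",
        (saga_id,),
    ).fetchone()
    return row[0], json.loads(row[1]), row[2]


def update_saga(conn, saga_id, new_state, data, expected_version):
    """Write the new state only if nobody else updated the row first."""
    with conn:
        cur = conn.execute(
            "UPDATE saga_state"
            " SET current_state = ?, data = ?, version = version + 1"
            " WHERE saga_id = ? AND version = ?",
            (new_state, json.dumps(data), saga_id, expected_version),
        )
    return cur.rowcount == 1
```

The version check is one way to use the database’s consistency guarantees: if two orchestrator instances handle events for the same saga at once, the second update matches no row and returns `False`, so the caller can reload and retry.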
The most surprising thing about saga persistence is that you don’t always need a separate, dedicated store. Many modern saga implementations, especially those built on event sourcing principles, treat the event log itself as the source of truth. The saga’s state is simply a projection derived from replaying its relevant events. This drastically simplifies the infrastructure by eliminating the need to synchronize state between an event log and a separate database.
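As a small illustration of state as a projection, the sketch below folds a list of events into the saga state from the earlier walkthrough. The event dictionaries, the `OrderCreated` event type, and the function names `apply_event` and `rebuild` are assumptions for this example; a real event store would supply the events from its log.

```python
from functools import reduce

INITIAL = {
    "currentState": None,
    "inventoryReserved": False,
    "paymentProcessed": False,
}


def apply_event(state, event):
    """Return a new state dict with one event applied (no mutation)."""
    new_state = dict(state)
    kind = event["type"]
    if kind == "OrderCreated":
        new_state.update(currentState="OrderCreated", orderId=event["orderId"])
    elif kind == "InventoryReserved":
        new_state.update(currentState="InventoryReserved", inventoryReserved=True)
    elif kind == "PaymentProcessed":
        new_state.update(currentState="PaymentCompleted", paymentProcessed=True)
    return new_state


def rebuild(events):
    """Replay events from the start of the log to derive current state."""
    return reduce(apply_event, events, INITIAL)


events = [
    {"type": "OrderCreated", "orderId": "order-123"},
    {"type": "InventoryReserved"},
    {"type": "PaymentProcessed"},
]
current = rebuild(events)
earlier = rebuild(events[:2])  # state as it was before payment
```

Replaying a prefix of the log, as `earlier` does, is what makes it possible to reconstruct any past state without storing snapshots of it.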
Choosing the right store involves balancing consistency requirements, performance needs, scalability, and operational complexity against your existing infrastructure and team expertise.
The next challenge you’ll face is handling the inherent complexity of managing compensating transactions and ensuring they don’t fail.