A saga’s state machine isn’t just a flowchart of possible states; it’s a precise, executable model of how your distributed transactions will actually behave, including every failure mode.
Consider this simple order placement saga:
{
  "id": "order-placement-saga",
  "initial": "START",
  "states": {
    "START": {
      "onEntry": [
        {
          "type": "command",
          "command": "CreateOrderCommand",
          "destination": "order-service"
        }
      ],
      "on": {
        "ORDER_CREATED": "WAITING_FOR_PAYMENT"
      }
    },
    "WAITING_FOR_PAYMENT": {
      "onEntry": [
        {
          "type": "command",
          "command": "ProcessPaymentCommand",
          "destination": "payment-service"
        }
      ],
      "on": {
        "PAYMENT_PROCESSED": "ORDER_PLACED",
        "PAYMENT_FAILED": "PAYMENT_REJECTED"
      }
    },
    "ORDER_PLACED": {
      "type": "final"
    },
    "PAYMENT_REJECTED": {
      "onEntry": [
        {
          "type": "command",
          "command": "CancelOrderCommand",
          "destination": "order-service"
        }
      ],
      "on": {
        "ORDER_CANCELLED": "REJECTED"
      }
    },
    "REJECTED": {
      "type": "final"
    }
  }
}
This JSON describes the saga’s flow. initial defines the starting point. states maps each state name to its behavior. onEntry actions are executed when entering a state. on defines transitions based on incoming events. type: "final" signifies an end state.
On entering START, the saga sends CreateOrderCommand to order-service. If order-service responds with ORDER_CREATED, the saga moves to WAITING_FOR_PAYMENT. There, ProcessPaymentCommand is dispatched to payment-service. A PAYMENT_PROCESSED event leads to ORDER_PLACED (a successful end). A PAYMENT_FAILED event moves the saga to PAYMENT_REJECTED, which triggers a compensation: CancelOrderCommand is sent to order-service, and once ORDER_CANCELLED arrives the saga ends in the REJECTED final state.
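To make that walkthrough concrete, here is a minimal sketch of an interpreter that steps through the definition with a scripted list of events. The function name run_saga and its policy of silently skipping unlisted events are assumptions made for illustration; a real saga engine would dispatch the commands rather than just record them.

```python
# Compact copy of the order placement definition shown above.
SAGA = {
    "id": "order-placement-saga",
    "initial": "START",
    "states": {
        "START": {
            "onEntry": [{"type": "command", "command": "CreateOrderCommand",
                         "destination": "order-service"}],
            "on": {"ORDER_CREATED": "WAITING_FOR_PAYMENT"},
        },
        "WAITING_FOR_PAYMENT": {
            "onEntry": [{"type": "command", "command": "ProcessPaymentCommand",
                         "destination": "payment-service"}],
            "on": {"PAYMENT_PROCESSED": "ORDER_PLACED",
                   "PAYMENT_FAILED": "PAYMENT_REJECTED"},
        },
        "ORDER_PLACED": {"type": "final"},
        "PAYMENT_REJECTED": {
            "onEntry": [{"type": "command", "command": "CancelOrderCommand",
                         "destination": "order-service"}],
            "on": {"ORDER_CANCELLED": "REJECTED"},
        },
        "REJECTED": {"type": "final"},
    },
}


def run_saga(definition, events):
    """Walk the definition with a scripted list of events.

    Returns (state reached, list of (command, destination) pairs).
    Commands are only recorded here; a real engine would send them.
    """
    states = definition["states"]
    current = definition["initial"]
    sent = []

    def enter(name):
        # Run the onEntry actions of the state being entered.
        for action in states[name].get("onEntry", []):
            if action["type"] == "command":
                sent.append((action["command"], action["destination"]))

    enter(current)
    for event in events:
        if states[current].get("type") == "final":
            break  # nothing leaves a final state
        target = states[current].get("on", {}).get(event)
        if target is None:
            continue  # assumption: this sketch ignores unlisted events
        current = target
        enter(current)
    return current, sent


state, sent = run_saga(SAGA, ["ORDER_CREATED", "PAYMENT_FAILED", "ORDER_CANCELLED"])
print(state)                              # REJECTED
print([command for command, _ in sent])
# ['CreateOrderCommand', 'ProcessPaymentCommand', 'CancelOrderCommand']
```

Feeding it ORDER_CREATED followed by PAYMENT_PROCESSED instead ends in ORDER_PLACED without ever sending CancelOrderCommand.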
This explicit state machine is critical because it forces you to model every possible outcome, not just the happy path. What happens if CreateOrderCommand fails? Or if ProcessPaymentCommand times out? Your state machine needs states for these too.
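One way to find those gaps is to compare what each state handles against what each command could plausibly produce. The sketch below assumes a hand-written catalog, POSSIBLE_OUTCOMES, which is hypothetical: the failure event names match the ones used in the extended definition that follows, and TIMEOUT is an assumed synthetic event that an engine might raise when no reply arrives.

```python
# Trimmed copy of the first definition: only the command sent on entry
# and the events each state reacts to.
SAGA_STATES = {
    "START": {
        "onEntry": [{"type": "command", "command": "CreateOrderCommand"}],
        "on": {"ORDER_CREATED": "WAITING_FOR_PAYMENT"},
    },
    "WAITING_FOR_PAYMENT": {
        "onEntry": [{"type": "command", "command": "ProcessPaymentCommand"}],
        "on": {"PAYMENT_PROCESSED": "ORDER_PLACED",
               "PAYMENT_FAILED": "PAYMENT_REJECTED"},
    },
    "ORDER_PLACED": {"type": "final"},
    "PAYMENT_REJECTED": {
        "onEntry": [{"type": "command", "command": "CancelOrderCommand"}],
        "on": {"ORDER_CANCELLED": "REJECTED"},
    },
    "REJECTED": {"type": "final"},
}

# Hypothetical catalog of everything each command can result in.
# TIMEOUT is an assumed synthetic event, not something a service sends.
POSSIBLE_OUTCOMES = {
    "CreateOrderCommand": {"ORDER_CREATED", "ORDER_CREATION_FAILED", "TIMEOUT"},
    "ProcessPaymentCommand": {"PAYMENT_PROCESSED", "PAYMENT_FAILED",
                              "PAYMENT_PROCESSING_FAILED", "TIMEOUT"},
    "CancelOrderCommand": {"ORDER_CANCELLED", "TIMEOUT"},
}


def unhandled_outcomes(states, possible_outcomes):
    """Map each state to the outcomes of its commands that it has no transition for."""
    gaps = {}
    for name, state in states.items():
        expected = set()
        for action in state.get("onEntry", []):
            expected |= possible_outcomes.get(action["command"], set())
        missing = expected - set(state.get("on", {}))
        if missing:
            gaps[name] = sorted(missing)
    return gaps


for name, missing in unhandled_outcomes(SAGA_STATES, POSSIBLE_OUTCOMES).items():
    print(name, "does not handle", missing)
# START does not handle ['ORDER_CREATION_FAILED', 'TIMEOUT']
# WAITING_FOR_PAYMENT does not handle ['PAYMENT_PROCESSING_FAILED', 'TIMEOUT']
# PAYMENT_REJECTED does not handle ['TIMEOUT']
```

Every line of that report is an outcome the happy-path definition has no answer for.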
Consider the failure of CreateOrderCommand and of ProcessPaymentCommand. Each needs an event and a state of its own:
{
  "id": "order-placement-saga",
  "initial": "START",
  "states": {
    "START": {
      "onEntry": [
        {
          "type": "command",
          "command": "CreateOrderCommand",
          "destination": "order-service"
        }
      ],
      "on": {
        "ORDER_CREATED": "WAITING_FOR_PAYMENT",
        "ORDER_CREATION_FAILED": "ORDER_CREATION_FAILED_STATE"
      }
    },
    "WAITING_FOR_PAYMENT": {
      "onEntry": [
        {
          "type": "command",
          "command": "ProcessPaymentCommand",
          "destination": "payment-service"
        }
      ],
      "on": {
        "PAYMENT_PROCESSED": "ORDER_PLACED",
        "PAYMENT_FAILED": "PAYMENT_REJECTED",
        "PAYMENT_PROCESSING_FAILED": "PAYMENT_PROCESSING_FAILED_STATE"
      }
    },
    "ORDER_PLACED": {
      "type": "final"
    },
    "PAYMENT_REJECTED": {
      "onEntry": [
        {
          "type": "command",
          "command": "CancelOrderCommand",
          "destination": "order-service"
        }
      ],
      "on": {
        "ORDER_CANCELLED": "REJECTED"
      }
    },
    "REJECTED": {
      "type": "final"
    },
    "ORDER_CREATION_FAILED_STATE": {
      "type": "final"
    },
    "PAYMENT_PROCESSING_FAILED_STATE": {
      "onEntry": [
        {
          "type": "command",
          "command": "RefundPaymentCommand",
          "destination": "payment-service"
        }
      ],
      "on": {
        "PAYMENT_REFUNDED": "PAYMENT_FAILED_TO_PROCESS_RECOVERY_COMPLETE",
        "REFUND_FAILED": "PAYMENT_FAILED_TO_PROCESS_RECOVERY_FAILED"
      }
    },
    "PAYMENT_FAILED_TO_PROCESS_RECOVERY_COMPLETE": {
      "type": "final"
    },
    "PAYMENT_FAILED_TO_PROCESS_RECOVERY_FAILED": {
      "type": "final"
    }
  }
}
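Here, ORDER_CREATION_FAILED and PAYMENT_PROCESSING_FAILED are new events. ORDER_CREATION_FAILED ends the saga immediately in ORDER_CREATION_FAILED_STATE, since nothing has been created that needs undoing. Upon receiving PAYMENT_PROCESSING_FAILED, we transition to PAYMENT_PROCESSING_FAILED_STATE. This state’s onEntry action is to execute a compensating command: RefundPaymentCommand. This is crucial for distributed transactions. By the time payment processing fails, the order already exists and the charge may have been partly or fully applied, so the saga issues a refund to make sure no money is held. Note that this path, as written, never sends CancelOrderCommand, so the order itself is left in place; a complete definition would compensate the order here as well.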
The state machine becomes the single source of truth for your saga’s logic. When a command is sent, the saga engine waits for one of the events listed under on for the current state. What happens on an unexpected event, or when no event arrives within a timeout, depends on the engine; a common choice is to transition to an error state, and the definitions above would need to declare those transitions to cover that case. This explicit definition of all transitions, including error paths and compensation logic, is what makes sagas robust. It’s not about listing states; it’s about meticulously defining the transitions between them, driven by events and executed by commands, covering every success, failure, and recovery scenario.
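The error-handling policy can be sketched as a single transition function. This is one possible policy under stated assumptions, not the behavior of any particular engine: the SAGA_ERROR state is hypothetical and does not appear in the definitions above, and a timeout is represented by passing None as the event.

```python
ERROR_STATE = "SAGA_ERROR"  # hypothetical catch-all state, not in the definitions above

# Fragment of the definition: just enough to exercise the function.
DEFINITION = {
    "initial": "WAITING_FOR_PAYMENT",
    "states": {
        "WAITING_FOR_PAYMENT": {
            "on": {"PAYMENT_PROCESSED": "ORDER_PLACED",
                   "PAYMENT_FAILED": "PAYMENT_REJECTED"},
        },
        "ORDER_PLACED": {"type": "final"},
        "PAYMENT_REJECTED": {"on": {"ORDER_CANCELLED": "REJECTED"}},
        "REJECTED": {"type": "final"},
    },
}


def next_state(definition, current, event):
    """Return the state to move to after `event` arrives in `current`.

    `event` is None when the engine's timer expired before any reply arrived.
    """
    state = definition["states"][current]
    if state.get("type") == "final":
        return current  # final states absorb everything
    transitions = state.get("on", {})
    if event is None or event not in transitions:
        return ERROR_STATE  # timeout or unexpected event
    return transitions[event]


print(next_state(DEFINITION, "WAITING_FOR_PAYMENT", "PAYMENT_PROCESSED"))  # ORDER_PLACED
print(next_state(DEFINITION, "WAITING_FOR_PAYMENT", "ORDER_CREATED"))      # SAGA_ERROR
print(next_state(DEFINITION, "WAITING_FOR_PAYMENT", None))                 # SAGA_ERROR
```

Treating every unexpected event as an error is strict. With at-least-once message delivery, an engine might instead ignore duplicates of events it has already processed, which is a design decision the definition format should make visible.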
The real power emerges when you consider that these state machine definitions can be loaded and executed by a generic saga engine. Your application code then only needs to emit the correct events and handle incoming commands, rather than orchestrating complex multi-step logic with manual retries and error handling. The state machine itself is the orchestration.
An often overlooked benefit of saga state machines is testability. Because the entire flow, including all failure paths and compensations, is declaratively defined, you can load these definitions into your test suite and simulate any sequence of events and command responses. For a finite definition like the ones above, that makes it practical to test every path before the saga reaches production, catching edge cases that hand-written orchestration code tends to miss.
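As a rough illustration of that kind of test, the sketch below enumerates every event sequence through the extended definition and checks that each one ends in a final state. The onEntry actions are omitted because only the transitions matter here, and the helper name all_paths is an assumption. It also assumes the definition has no cycles; a saga with retry loops would need a depth limit.

```python
# Transitions of the extended definition above (onEntry actions omitted).
SAGA = {
    "initial": "START",
    "states": {
        "START": {"on": {
            "ORDER_CREATED": "WAITING_FOR_PAYMENT",
            "ORDER_CREATION_FAILED": "ORDER_CREATION_FAILED_STATE"}},
        "WAITING_FOR_PAYMENT": {"on": {
            "PAYMENT_PROCESSED": "ORDER_PLACED",
            "PAYMENT_FAILED": "PAYMENT_REJECTED",
            "PAYMENT_PROCESSING_FAILED": "PAYMENT_PROCESSING_FAILED_STATE"}},
        "ORDER_PLACED": {"type": "final"},
        "PAYMENT_REJECTED": {"on": {"ORDER_CANCELLED": "REJECTED"}},
        "REJECTED": {"type": "final"},
        "ORDER_CREATION_FAILED_STATE": {"type": "final"},
        "PAYMENT_PROCESSING_FAILED_STATE": {"on": {
            "PAYMENT_REFUNDED": "PAYMENT_FAILED_TO_PROCESS_RECOVERY_COMPLETE",
            "REFUND_FAILED": "PAYMENT_FAILED_TO_PROCESS_RECOVERY_FAILED"}},
        "PAYMENT_FAILED_TO_PROCESS_RECOVERY_COMPLETE": {"type": "final"},
        "PAYMENT_FAILED_TO_PROCESS_RECOVERY_FAILED": {"type": "final"},
    },
}


def all_paths(definition):
    """Yield (events, end_state) for every event sequence from the initial state.

    Assumes the definition is acyclic; otherwise the recursion would not end.
    """
    states = definition["states"]

    def walk(current, events):
        transitions = states[current].get("on", {})
        if not transitions:
            yield events, current  # nowhere left to go
            return
        for event, target in transitions.items():
            yield from walk(target, events + [event])

    yield from walk(definition["initial"], [])


paths = list(all_paths(SAGA))
for events, end in paths:
    # A path that stops in a non-final state is a modeling gap.
    assert SAGA["states"][end].get("type") == "final", f"dead end at {end}"
    print(" -> ".join(events), "=>", end)
print(len(paths), "paths checked")  # 5 paths checked
```

The same loop could be extended to record the commands sent along each path and assert, for example, that every path through PAYMENT_REJECTED sends CancelOrderCommand.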
This explicit, executable definition of distributed transaction logic is the foundation of reliable microservice orchestration.