Sagas with Temporal.io: Durable Workflow Engine
Temporal’s workflow engine isn’t just a scheduler; it’s a state machine that is your business logic, running indefinitely and reliably.
Let’s watch a simple order processing saga unfold. Imagine a customer places an order. This triggers a Temporal workflow.
// Workflow definition
type OrderWorkflow struct {
WorkflowID string
}
func (w *OrderWorkflow) Run(ctx workflow.Context, orderDetails OrderDetails) error {
// Step 1: Reserve inventory
var inventoryReservationResult InventoryReservationResult
err := workflow.ExecuteActivity(ctx, ReserveInventory, orderDetails).Get(ctx, &inventoryReservationResult)
if err != nil {
// If inventory reservation fails, compensate by cancelling the order
workflow.ExecuteActivity(ctx, CancelOrder, orderDetails)
return fmt.Errorf("inventory reservation failed: %w", err)
}
// Step 2: Process payment
var paymentResult PaymentResult
err = workflow.ExecuteActivity(ctx, ProcessPayment, orderDetails).Get(ctx, &paymentResult)
if err != nil {
// If payment fails, compensate by releasing inventory and cancelling the order
workflow.ExecuteActivity(ctx, ReleaseInventory, orderDetails)
workflow.ExecuteActivity(ctx, CancelOrder, orderDetails)
return fmt.Errorf("payment processing failed: %w", err)
}
// Step 3: Ship order
var shippingResult ShippingResult
err = workflow.ExecuteActivity(ctx, ShipOrder, orderDetails).Get(ctx, &shippingResult)
if err != nil {
// If shipping fails, compensate by refunding payment, releasing inventory, and cancelling the order
workflow.ExecuteActivity(ctx, RefundPayment, orderDetails)
workflow.ExecuteActivity(ctx, ReleaseInventory, orderDetails)
workflow.ExecuteActivity(ctx, CancelOrder, orderDetails)
return fmt.Errorf("shipping failed: %w", err)
}
// If all steps succeed
return nil
}
In this OrderWorkflow, ExecuteActivity calls are made to external services (represented by activities). If any activity fails, the workflow executes compensating activities (e.g., ReleaseInventory, CancelOrder) to undo previous steps. This pattern is the essence of a saga.
The problem Temporal solves is that traditional distributed transactions (like two-phase commit) are brittle and don’t scale. They require all participants to be available and lock resources for the entire duration, which is often unacceptable for long-running business processes. Sagas, on the other hand, break down a long transaction into a sequence of local transactions. Each local transaction updates its own data and then triggers the next local transaction in the saga. If a local transaction fails, the saga executes compensating transactions that undo the work of preceding local transactions.
Temporal’s durability comes from its event sourcing architecture. Every decision the workflow makes (starting an activity, receiving a signal, completing, failing) is recorded as an event in a history. When a workflow worker restarts or crashes, Temporal replays these events from the history to reconstruct the workflow’s state. This means your workflow logic never loses its place. You don’t need to manually manage state persistence or checkpointing. The workflow is the state, and Temporal ensures its perfect replay.
The workflow.ExecuteActivity call isn’t just an RPC. Temporal schedules the activity, tracks its execution, and handles retries based on configured policies. The Get method on the activity future blocks the workflow logic until the activity completes, fails, or times out. Crucially, this "blocking" is cooperative. Temporal records "workflow is waiting for activity X" in the event history. If the worker processing this workflow crashes, another worker can pick up the workflow, replay the history, see that it was waiting for activity X, and schedule it again.
The workflow.Context is key. It has a timeout and can be cancelled. If the workflow context is cancelled (e.g., due to an external signal or a parent workflow completing), any activities or timers running within that context will also be cancelled. This is how you signal to the workflow that it should stop or change course, allowing for dynamic adjustments to long-running processes.
The workflow.ExecuteActivity function, when called within a workflow, actually just records an ActivityScheduled event in the workflow’s history. The Temporal server then dispatches this activity to an activity worker. The activity worker executes the actual business logic. The result of the activity is sent back to the Temporal server, which records an ActivityCompleted or ActivityFailed event in the history. The workflow code then resumes from its Get call, reading the result from the history. This decoupling is what makes Temporal so resilient.
The duration of workflow.NewContext (e.g., workflow.NewContext(ctx, workflow.StandardOptions{StartToCloseTimeout: 24 * time.Hour})) is not a hard deadline for the entire workflow execution, but rather a maximum duration for a single execution path within that specific context. If an activity takes longer than its own StartToCloseTimeout, the activity itself will time out, and the workflow will execute its compensation logic. The overall workflow can continue running indefinitely as long as its activities complete within their own timeouts and the workflow context itself isn’t explicitly cancelled or timed out.
The true power is that Temporal guarantees that your workflow code will execute exactly as written, even across worker restarts, network failures, and machine crashes. It achieves this by replaying the workflow’s history. If a worker crashes mid-activity, Temporal will reschedule that activity on another worker. If the workflow logic itself has a bug that causes it to fail, Temporal will keep retrying based on your workflow’s retry policy, and if the bug is fixed, the workflow will resume from its last stable state.
The next step you’ll likely encounter is handling external events that influence your workflow, like a customer calling support to change an order, which would involve using workflow signals.