A saga is not a transaction; it’s a sequence of local transactions, where each transaction updates data within a single service, and the completion of each transaction triggers the next.
Let’s see a simple saga in action with Zeebe, the workflow engine behind Camunda 8. Imagine we’re booking a trip.
The TripBooking process is modeled in BPMN. Here is the XML behind the diagram:
<bpmn:definitions xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns:bpmndi="http://www.omg.org/spec/BPMN/20100524/DI" xmlns:dc="http://www.omg.org/spec/DD/20100524/DC" xmlns:di="http://www.omg.org/spec/DD/20100524/DI" xmlns:zeebe="http://camunda.org/schema/zeebe/1.0" id="Definitions_1" targetNamespace="http://bpmn.io/schema/bpmn">
<bpmn:process id="TripBooking" isExecutable="true">
<bpmn:startEvent id="StartEvent_1"/>
<bpmn:sequenceFlow id="SequenceFlow_1" sourceRef="StartEvent_1" targetRef="Task_BookHotel"/>
<bpmn:serviceTask id="Task_BookHotel" name="Book Hotel">
<bpmn:extensionElements>
<!-- Job type that the hotel worker subscribes to -->
<zeebe:taskDefinition type="book-hotel"/>
<zeebe:ioMapping>
<zeebe:output source="=hotelBookingId" target="hotelBookingId"/>
</zeebe:ioMapping>
</bpmn:extensionElements>
</bpmn:serviceTask>
<bpmn:sequenceFlow id="SequenceFlow_2" sourceRef="Task_BookHotel" targetRef="Task_BookFlight"/>
<bpmn:serviceTask id="Task_BookFlight" name="Book Flight">
<bpmn:extensionElements>
<!-- Job type that the flight worker subscribes to -->
<zeebe:taskDefinition type="book-flight"/>
<zeebe:ioMapping>
<zeebe:output source="=flightBookingId" target="flightBookingId"/>
</zeebe:ioMapping>
</bpmn:extensionElements>
</bpmn:serviceTask>
<bpmn:sequenceFlow id="SequenceFlow_3" sourceRef="Task_BookFlight" targetRef="Task_ConfirmTrip"/>
<bpmn:serviceTask id="Task_ConfirmTrip" name="Confirm Trip">
<bpmn:extensionElements>
<!-- Job type assumed here; the text does not name one -->
<zeebe:taskDefinition type="confirm-trip"/>
</bpmn:extensionElements>
</bpmn:serviceTask>
<bpmn:endEvent id="EndEvent_1"/>
<bpmn:sequenceFlow id="SequenceFlow_4" sourceRef="Task_ConfirmTrip" targetRef="EndEvent_1"/>
</bpmn:process>
<bpmndi:BPMNDiagram id="BPMNDiagram_1">
<bpmndi:BPMNPlane id="BPMNPlane_1" bpmnElement="TripBooking">
<bpmndi:BPMNShape id="BPMNShape_StartEvent_1" bpmnElement="StartEvent_1">
<dc:Bounds height="36" width="36" x="173" y="102"/>
</bpmndi:BPMNShape>
<bpmndi:BPMNEdge id="SequenceFlow_1_di" bpmnElement="SequenceFlow_1">
<di:Waypoint x="209" y="120"/>
<di:Waypoint x="257" y="120"/>
</bpmndi:BPMNEdge>
<bpmndi:BPMNShape id="BPMNShape_Task_BookHotel" bpmnElement="Task_BookHotel">
<dc:Bounds height="80" width="100" x="257" y="80"/>
</bpmndi:BPMNShape>
<bpmndi:BPMNEdge id="SequenceFlow_2_di" bpmnElement="SequenceFlow_2">
<di:Waypoint x="357" y="120"/>
<di:Waypoint x="405" y="120"/>
</bpmndi:BPMNEdge>
<bpmndi:BPMNShape id="BPMNShape_Task_BookFlight" bpmnElement="Task_BookFlight">
<dc:Bounds height="80" width="100" x="405" y="80"/>
</bpmndi:BPMNShape>
<bpmndi:BPMNEdge id="SequenceFlow_3_di" bpmnElement="SequenceFlow_3">
<di:Waypoint x="505" y="120"/>
<di:Waypoint x="553" y="120"/>
</bpmndi:BPMNEdge>
<bpmndi:BPMNShape id="BPMNShape_Task_ConfirmTrip" bpmnElement="Task_ConfirmTrip">
<dc:Bounds height="80" width="100" x="553" y="80"/>
</bpmndi:BPMNShape>
<bpmndi:BPMNEdge id="SequenceFlow_4_di" bpmnElement="SequenceFlow_4">
<di:Waypoint x="653" y="120"/>
<di:Waypoint x="701" y="120"/>
</bpmndi:BPMNEdge>
<bpmndi:BPMNShape id="BPMNShape_EndEvent_1" bpmnElement="EndEvent_1">
<dc:Bounds height="36" width="36" x="701" y="102"/>
</bpmndi:BPMNShape>
</bpmndi:BPMNPlane>
</bpmndi:BPMNDiagram>
</bpmn:definitions>
When this process is deployed to Zeebe, it becomes an executable workflow. A client application (or another service) can then create an instance of it. Conceptually, the command carries the process ID and the initial variables:
{
"type": "io.camunda.zeebe.client.api.command.CreateInstanceCommand",
"bpmnProcessId": "TripBooking",
"variables": {
"tripDetails": {
"destination": "Paris",
"startDate": "2023-10-26",
"endDate": "2023-10-30"
}
}
}
Zeebe, upon receiving this command, creates a new instance of the TripBooking process and places a "job" for the Book Hotel task onto its job queue. A worker service, subscribed to book-hotel jobs, picks this up.
The worker executes the actual hotel booking logic (e.g., calling an external API). Let’s say it successfully books a hotel and gets back a hotelBookingId of H12345. The worker then completes the job, identified by the key of the job it received, and sends the result back to Zeebe:
{
"type": "io.camunda.zeebe.client.api.command.CompleteJobCommand",
"jobKey": 1234567890,
"variables": {
"hotelBookingId": "H12345"
}
}
Zeebe receives this completion, updates the workflow instance’s variables, and immediately activates the next task: Book Flight. A different worker, subscribed to book-flight jobs, will pick this up. It performs its logic, gets a flightBookingId of F67890, and completes its job.
Finally, the Confirm Trip task is activated, which might be a simple notification or a more complex step. Once that’s done, the workflow instance reaches the end event, marking the saga as successfully completed.
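The happy path described above can be sketched without Zeebe at all. The following is a minimal in-memory stand-in, not the Zeebe API: `PROCESS`, `WORKERS`, and `run_instance` are hypothetical names, and the `confirm-trip` job type is assumed for illustration. It shows only the core loop: a job is created for the current task, a worker registered for that job type handles it, the returned variables are merged into the instance, and completion activates the next task.

```python
from collections import deque

# Hypothetical in-memory stand-in for the engine; none of these names are Zeebe APIs.
PROCESS = [  # (task name, job type) in sequence-flow order
    ("Book Hotel", "book-hotel"),
    ("Book Flight", "book-flight"),
    ("Confirm Trip", "confirm-trip"),  # job type assumed for illustration
]


def book_hotel(variables):
    # A real worker would call the hotel service here.
    return {"hotelBookingId": "H12345"}


def book_flight(variables):
    # A real worker would call the flight service here.
    return {"flightBookingId": "F67890"}


def confirm_trip(variables):
    return {"confirmed": True}


# Each worker is registered for exactly one job type.
WORKERS = {
    "book-hotel": book_hotel,
    "book-flight": book_flight,
    "confirm-trip": confirm_trip,
}


def run_instance(initial_variables):
    """Run one process instance, one job at a time."""
    variables = dict(initial_variables)
    history = []
    jobs = deque([PROCESS[0]])  # creating the instance queues the first job
    step = 0
    while jobs:
        task_name, job_type = jobs.popleft()
        result = WORKERS[job_type](variables)  # the worker handles the job
        variables.update(result)               # completion merges its variables
        history.append(task_name)
        step += 1
        if step < len(PROCESS):
            jobs.append(PROCESS[step])         # completion activates the next task
    return variables, history


variables, history = run_instance({"tripDetails": {"destination": "Paris"}})
print(history)
print(variables["hotelBookingId"], variables["flightBookingId"])
```

A real engine also persists every state change and distributes jobs across many workers; the sketch leaves both out to keep the control flow visible.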
The core problem sagas solve is how to maintain data consistency across distributed services without relying on a single, monolithic transaction manager. Each service owns its data and transaction. If the Book Hotel service fails, its local transaction is rolled back. If the Book Flight service fails after the hotel was booked, the saga needs a compensation mechanism.
This is where BPMN’s error handling and compensation patterns come in. To handle a failure of the Book Flight task (e.g., no flights available), the BPMN model attaches an error boundary event to that task. When the error is raised, the event triggers a compensation flow.
<bpmn:process id="TripBooking" isExecutable="true">
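<!-- ... existing tasks ... -->
<!-- flightBookingError must be the id of a <bpmn:error> declared under <bpmn:definitions> -->
<bpmn:boundaryEvent id="CancelHotelOnError" attachedToRef="Task_BookFlight">
<bpmn:errorEventDefinition errorRef="flightBookingError"/>
</bpmn:boundaryEvent>
<bpmn:serviceTask id="Task_CancelHotel" name="Cancel Hotel">
<bpmn:extensionElements>
<!-- Job type assumed for illustration; needs the zeebe namespace declared on bpmn:definitions -->
<zeebe:taskDefinition type="cancel-hotel"/>
</bpmn:extensionElements>
<bpmn:incoming>SequenceFlow_5</bpmn:incoming>
<bpmn:outgoing>SequenceFlow_6</bpmn:outgoing>
</bpmn:serviceTask>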
<bpmn:sequenceFlow id="SequenceFlow_5" sourceRef="CancelHotelOnError" targetRef="Task_CancelHotel"/>
<bpmn:endEvent id="EndEvent_Error">
<bpmn:terminateEventDefinition/>
</bpmn:endEvent>
<bpmn:sequenceFlow id="SequenceFlow_6" sourceRef="Task_CancelHotel" targetRef="EndEvent_Error"/>
</bpmn:process>
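In this modified diagram, if the Book Flight worker reports a BPMN error that matches flightBookingError, the CancelHotelOnError boundary event catches it. A purely technical failure, such as an unhandled exception in the worker, is a different case: it fails the job and is handled through retries rather than through the boundary event. Catching the error triggers Task_CancelHotel, which is the compensation for the Book Hotel task. The Task_CancelHotel worker then calls the hotel service’s cancellation API. After the cancellation, the instance reaches the terminate end event EndEvent_Error and stops without confirming the trip.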
The power here is that Zeebe orchestrates the entire process, including the compensation logic. Your workers only need to worry about their single, local transaction and its corresponding compensation. Zeebe handles the state transitions and retries, and when a step fails it runs the cleanup steps you modeled, in the order you modeled them.
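To make the division of labor concrete, here is a minimal sketch of what the orchestrator does on the failure path. It is an illustration in plain Python, not Zeebe code: `BookingError` plays the role of a BPMN error, and `run_saga` and the step functions are hypothetical names. The sketch undoes completed steps in reverse order, a common saga convention; in a BPMN model the order is whatever the diagram encodes.

```python
class BookingError(Exception):
    """Business failure raised by a step (hypothetical stand-in for a BPMN error)."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def run_saga(steps, variables):
    """Run (name, action, compensation) steps; undo completed ones on failure.

    Returns (succeeded, log).
    """
    log = []
    completed = []
    for name, action, compensation in steps:
        try:
            variables.update(action(variables))
            log.append(f"done:{name}")
            completed.append((name, compensation))
        except BookingError as err:
            log.append(f"failed:{name}:{err.code}")
            # Compensate the already completed steps, most recent first.
            for done_name, undo in reversed(completed):
                if undo is not None:
                    undo(variables)
                    log.append(f"compensated:{done_name}")
            return False, log
    return True, log


def book_hotel(variables):
    return {"hotelBookingId": "H12345"}


def cancel_hotel(variables):
    # A real worker would call the hotel service's cancellation API here.
    variables["hotelCancelled"] = True


def book_flight(variables):
    # Simulate "no flights available".
    raise BookingError("flightBookingError")


def confirm_trip(variables):
    return {"confirmed": True}


state = {}
steps = [
    ("Book Hotel", book_hotel, cancel_hotel),
    ("Book Flight", book_flight, None),
    ("Confirm Trip", confirm_trip, None),
]
ok, log = run_saga(steps, state)
print(ok, log)
```

Note that the step functions contain no orchestration logic: each one knows only its own action or its own undo, which is the split between workers and engine described above.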
The most surprising thing about sagas is that they have no rollback in the sense of a traditional ACID transaction. Once a local transaction commits, its effects are visible, so the saga relies on compensation actions that undo those effects afterwards. This means you need to design your services with idempotency and explicit cancellation/reversal operations in mind from the start, which is a significant shift in thinking from monolithic transaction management. For example, a "cancel hotel booking" operation must be idempotent: calling it multiple times should have the same effect as calling it once, and it must succeed even if the original booking was already cancelled.
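A minimal sketch of such an idempotent cancellation follows. `HotelService` and its methods are hypothetical, and state is kept in memory purely for illustration. Treating an unknown booking ID as "nothing to undo" is an assumption of this sketch; a real service might prefer to record it or reject it.

```python
class HotelService:
    """Hypothetical hotel service with an idempotent cancel operation."""

    def __init__(self):
        self._bookings = {}      # booking id -> "BOOKED" or "CANCELLED"
        self.refunds_issued = 0  # counts the side effect we must not repeat

    def book(self, booking_id):
        self._bookings[booking_id] = "BOOKED"

    def cancel(self, booking_id):
        """Cancel a booking; safe to call any number of times."""
        if self._bookings.get(booking_id) == "BOOKED":
            self._bookings[booking_id] = "CANCELLED"
            self.refunds_issued += 1  # the side effect happens exactly once
        # Already cancelled (or unknown, by assumption): succeed without side effects.
        return "CANCELLED"


service = HotelService()
service.book("H12345")
first = service.cancel("H12345")
second = service.cancel("H12345")  # e.g. a retried compensation job
print(first, second, service.refunds_issued)
```

The second call matters in practice because a compensation job can be retried after a timeout even though the first attempt already succeeded.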
The next problem you’ll often encounter is handling long-running, multi-step sagas where external factors or user decisions might be involved, leading to the need for more complex state management and potentially human-in-the-loop interactions within the BPMN process.