A booking system saga is often misunderstood as just a sequence of independent API calls; in reality, it’s a distributed transaction where each step is a potential failure point that must be reversible.
Let’s walk through building one for flights, hotels, and payments. Imagine a user wants to book a flight and a hotel, then pay for both.
Here’s a simplified representation of the services involved:
- Flight Service: Handles flight availability, booking, and cancellation.
- Hotel Service: Manages hotel room availability, reservations, and cancellations.
- Payment Service: Processes payments and handles refunds.
- Orchestrator (Booking Service): Coordinates the entire process, initiating requests to other services and managing their responses.
The Saga Pattern
When a user initiates a booking, the Orchestrator calls the Flight Service to reserve a seat. If successful, it then calls the Hotel Service to book a room. Finally, it calls the Payment Service to charge the user.
This looks straightforward, but what happens if the Hotel Service fails after the Flight Service has already reserved a seat? The flight reservation needs to be canceled. This is where the saga pattern comes in. Each step in the process has a corresponding compensating action that undoes that step if a later step fails.
- Flight Service:
  - Action: reserveSeat(flightId, userId)
  - Compensation: cancelFlightReservation(reservationId)
- Hotel Service:
  - Action: bookRoom(hotelId, roomId, userId)
  - Compensation: cancelHotelReservation(reservationId)
- Payment Service:
  - Action: chargeUser(amount, userId)
  - Compensation: refundUser(transactionId)
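The action/compensation pairs above can also be expressed as data. The following is a minimal sketch, not a definitive implementation: `Step`, `RunSaga`, and `demoHotelFailure` are hypothetical names introduced here for illustration, and the closures stand in for real service clients.

```go
package main

import (
	"errors"
	"fmt"
)

// Step pairs a forward action with the compensation that undoes it.
// The type and its field names are illustrative, not from any library.
type Step struct {
	Name       string
	Action     func() error
	Compensate func() error
}

// RunSaga executes steps in order. If one fails, it runs the compensations
// of the steps that already completed, newest first, and returns the
// original failure (annotated with any compensation errors).
func RunSaga(steps []Step) error {
	for i, step := range steps {
		err := step.Action()
		if err == nil {
			continue
		}
		failure := fmt.Errorf("step %q failed: %w", step.Name, err)
		var compErrs []error
		for j := i - 1; j >= 0; j-- {
			if compErr := steps[j].Compensate(); compErr != nil {
				compErrs = append(compErrs, fmt.Errorf("compensating %q: %w", steps[j].Name, compErr))
			}
		}
		if len(compErrs) > 0 {
			return fmt.Errorf("%w (compensation errors: %v)", failure, compErrs)
		}
		return failure
	}
	return nil
}

// demoHotelFailure simulates the scenario described above: the flight is
// reserved, the hotel booking fails, and the flight reservation is canceled.
// It returns the names of the calls that were made, in order.
func demoHotelFailure() (trace []string, err error) {
	record := func(name string, result error) func() error {
		return func() error {
			trace = append(trace, name)
			return result
		}
	}
	steps := []Step{
		{"flight", record("reserveSeat", nil), record("cancelFlightReservation", nil)},
		{"hotel", record("bookRoom", errors.New("hotel is unavailable")), record("cancelHotelReservation", nil)},
		{"payment", record("chargeUser", nil), record("refundUser", nil)},
	}
	err = RunSaga(steps)
	return trace, err
}
```

Only steps that completed are compensated: the failed hotel booking and the payment step that never ran are left alone, so `demoHotelFailure` records `reserveSeat`, `bookRoom`, and `cancelFlightReservation`, in that order.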
Orchestration Flow
Here’s how the Orchestrator would manage this:
- Start Booking: User requests flight FL123 and hotel HOT456.
- Reserve Flight: Orchestrator calls FlightService.reserveSeat("FL123", "user123").
  - Success: Flight reservation FR789 is created.
- Book Hotel: Orchestrator calls HotelService.bookRoom("HOT456", "RM99", "user123").
  - Success: Hotel reservation HR101 is created.
- Charge Payment: Orchestrator calls PaymentService.chargeUser(150.00, "user123").
  - Success: Payment transaction PT112 is created.
- Finalize: All steps succeeded. Return success to user.
Handling Failures (The Saga)
Let’s say the HotelService.bookRoom call fails after FlightService.reserveSeat succeeded.
- Reserve Flight: Orchestrator calls FlightService.reserveSeat("FL123", "user123").
  - Success: Flight reservation FR789 is created.
- Book Hotel: Orchestrator calls HotelService.bookRoom("HOT456", "RM99", "user123").
  - Failure: Hotel is unavailable.
- Compensate Flight: Orchestrator calls FlightService.cancelFlightReservation("FR789").
  - Success: Flight reservation FR789 is canceled.
- Finalize: All steps are compensated. Return failure to user.
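To cancel FR789, the Orchestrator has to remember that ID from the earlier step. One way to picture that bookkeeping is a small saga log, sketched below in Go. `CompletedStep`, `SagaLog`, `Record`, and `CompensationPlan` are hypothetical names, and the in-memory slice is only a stand-in for whatever durable storage a real system would use.

```go
package main

// CompletedStep records what a forward action produced, so the orchestrator
// knows which ID to pass to the matching compensation.
type CompletedStep struct {
	Service string // e.g. "flight"
	RefID   string // e.g. the reservation ID "FR789"
}

// SagaLog is an in-memory stand-in for durable saga state.
type SagaLog struct {
	completed []CompletedStep
}

// Record notes that a forward step succeeded and what ID it returned.
func (l *SagaLog) Record(service, refID string) {
	l.completed = append(l.completed, CompletedStep{Service: service, RefID: refID})
}

// CompensationPlan returns the completed steps newest-first, which is the
// order in which they should be undone.
func (l *SagaLog) CompensationPlan() []CompletedStep {
	plan := make([]CompletedStep, 0, len(l.completed))
	for i := len(l.completed) - 1; i >= 0; i-- {
		plan = append(plan, l.completed[i])
	}
	return plan
}
```

In the failure walkthrough, the log would hold a single entry (`flight`, `FR789`) when the hotel booking fails, so the plan contains exactly one compensation.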
Example Implementation (Conceptual, using a simplified API)
Let’s imagine the Orchestrator is written in Go.
// booking_service.go
package booking

import (
	"fmt"
	"log"
)

// BookingOrchestrator coordinates the saga. FlightServiceClient,
// HotelServiceClient, and PaymentServiceClient are assumed to be defined
// elsewhere in this package.
type BookingOrchestrator struct {
	flightService  *FlightServiceClient
	hotelService   *HotelServiceClient
	paymentService *PaymentServiceClient
}

// BookTrip runs the forward steps in order. When a step fails, it compensates
// the steps that already succeeded, in reverse order, and returns the error.
func (o *BookingOrchestrator) BookTrip(userID, flightID, hotelID, roomID string, amount float64) error {
	// Step 1: Reserve Flight (nothing to compensate if this fails)
	flightReservation, err := o.flightService.ReserveSeat(flightID, userID)
	if err != nil {
		return fmt.Errorf("failed to reserve flight: %w", err)
	}

	// Step 2: Book Hotel
	hotelReservation, err := o.hotelService.BookRoom(hotelID, roomID, userID)
	if err != nil {
		// Compensation: Cancel Flight Reservation
		if compErr := o.flightService.CancelFlightReservation(flightReservation.ID); compErr != nil {
			// Log this critical error: compensation failed!
			log.Printf("CRITICAL: Failed to compensate flight reservation %s: %v", flightReservation.ID, compErr)
		}
		return fmt.Errorf("failed to book hotel: %w", err)
	}

	// Step 3: Charge Payment
	paymentTransaction, err := o.paymentService.ChargeUser(amount, userID)
	if err != nil {
		// Compensation: Cancel Hotel Reservation
		if compErr := o.hotelService.CancelHotelReservation(hotelReservation.ID); compErr != nil {
			log.Printf("CRITICAL: Failed to compensate hotel reservation %s: %v", hotelReservation.ID, compErr)
		}
		// Compensation: Cancel Flight Reservation (already booked)
		if compErr := o.flightService.CancelFlightReservation(flightReservation.ID); compErr != nil {
			log.Printf("CRITICAL: Failed to compensate flight reservation %s: %v", flightReservation.ID, compErr)
		}
		return fmt.Errorf("failed to charge payment: %w", err)
	}

	// Success: All steps completed
	log.Printf("Trip booked successfully: Flight: %s, Hotel: %s, Payment: %s", flightReservation.ID, hotelReservation.ID, paymentTransaction.ID)
	return nil
}
This direct orchestration approach, where the orchestrator explicitly calls each service and then its compensating action on failure, is one way to implement sagas. Another common approach is "choreography," where services communicate via events, and each service reacts to events from others to perform its action or compensation.
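For contrast, here is a rough sketch of what choreography might look like for the same booking. It is an assumption-heavy illustration: the event names (`BookingRequested`, `FlightReserved`, and so on), the `Bus` type, and `wireServices` are all invented here, and the synchronous in-memory bus stands in for a real message broker.

```go
package main

// Event is a message one service publishes for others to react to.
type Event struct {
	Type      string
	BookingID string
}

// Bus is a synchronous in-memory stand-in for a message broker.
// Log keeps the published event types in order, for inspection.
type Bus struct {
	handlers map[string][]func(Event)
	Log      []string
}

func NewBus() *Bus {
	return &Bus{handlers: make(map[string][]func(Event))}
}

func (b *Bus) Subscribe(eventType string, handler func(Event)) {
	b.handlers[eventType] = append(b.handlers[eventType], handler)
}

func (b *Bus) Publish(e Event) {
	b.Log = append(b.Log, e.Type)
	for _, handler := range b.handlers[e.Type] {
		handler(e)
	}
}

// wireServices registers each service's reactions. There is no central
// orchestrator: every service only knows which events it listens for.
func wireServices(bus *Bus, hotelAvailable bool) {
	// Flight service: reserve on request, cancel if the hotel step fails.
	bus.Subscribe("BookingRequested", func(e Event) {
		bus.Publish(Event{"FlightReserved", e.BookingID})
	})
	bus.Subscribe("HotelBookingFailed", func(e Event) {
		bus.Publish(Event{"FlightReservationCanceled", e.BookingID})
	})
	// Hotel service: book once the flight is reserved.
	bus.Subscribe("FlightReserved", func(e Event) {
		if hotelAvailable {
			bus.Publish(Event{"HotelBooked", e.BookingID})
		} else {
			bus.Publish(Event{"HotelBookingFailed", e.BookingID})
		}
	})
	// Payment service: charge once the hotel is booked.
	bus.Subscribe("HotelBooked", func(e Event) {
		bus.Publish(Event{"PaymentCharged", e.BookingID})
	})
}

// demoChoreography runs one booking and returns the event types published.
func demoChoreography(hotelAvailable bool) []string {
	bus := NewBus()
	wireServices(bus, hotelAvailable)
	bus.Publish(Event{Type: "BookingRequested", BookingID: "BK1"})
	return bus.Log
}
```

With an unavailable hotel, the event sequence is `BookingRequested`, `FlightReserved`, `HotelBookingFailed`, `FlightReservationCanceled`: the flight service compensates on its own, without anyone calling it directly. The trade-off is that the overall flow is no longer visible in one place.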
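The core principle, however, remains the same: for every forward action, there must be a corresponding backward action that undoes its effects. This gives the booking all-or-nothing behavior across distributed services, even in the face of partial failures. Unlike a database transaction, though, a saga offers no isolation: intermediate states, such as a seat that is reserved and later canceled, are visible to other users until the saga finishes.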
The real complexity arises when you consider retries, idempotency, and ensuring that compensating actions themselves don’t fail. For instance, if PaymentService.refundUser fails, the system needs a strategy to retry that refund, potentially with exponential backoff, and alert operators if it remains unrecoverable.
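A minimal sketch of that retry strategy follows, assuming the refund is made idempotent by keying it on the transaction ID. `retryWithBackoff`, `refundLedger`, and `demoRefundRetry` are hypothetical names, and the ledger is an in-memory stand-in for the Payment Service with simulated transient failures.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryWithBackoff calls fn up to maxAttempts times (maxAttempts must be at
// least 1), doubling the delay after each failure. It returns nil on the
// first success, or the last error wrapped with the attempt count so the
// caller can alert an operator.
func retryWithBackoff(maxAttempts int, baseDelay time.Duration, fn func() error) error {
	var err error
	delay := baseDelay
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = fn(); err == nil {
			return nil
		}
		if attempt < maxAttempts {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", maxAttempts, err)
}

// refundLedger makes refunds idempotent by remembering which transactions
// were already refunded, so a retried refund is a no-op, not a second payout.
type refundLedger struct {
	refunded     map[string]bool
	payouts      int
	failuresLeft int // number of simulated transient failures remaining
}

func (l *refundLedger) RefundUser(transactionID string) error {
	if l.refunded[transactionID] {
		return nil // already refunded: safe to call again
	}
	if l.failuresLeft > 0 {
		l.failuresLeft--
		return errors.New("payment service temporarily unavailable")
	}
	l.refunded[transactionID] = true
	l.payouts++
	return nil
}

// demoRefundRetry refunds PT112 through two transient failures, then
// repeats the refund to show that a duplicate call does not pay out twice.
func demoRefundRetry() (payouts int, err error) {
	ledger := &refundLedger{refunded: map[string]bool{}, failuresLeft: 2}
	refund := func() error { return ledger.RefundUser("PT112") }
	if err = retryWithBackoff(5, time.Millisecond, refund); err != nil {
		return ledger.payouts, err
	}
	err = refund() // duplicate delivery of the same refund
	return ledger.payouts, err
}
```

Retries are only safe because the refund is idempotent: if a timeout hides a refund that actually succeeded, the retry finds the transaction already marked and does nothing. When `retryWithBackoff` finally returns an error, that is the point at which to alert an operator.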
Understanding how to manage state and reliably execute compensating actions is key to building robust distributed booking systems.