A booking system saga is often misunderstood as just a sequence of independent API calls; in reality, it’s a distributed transaction in which any step can fail and every completed step must be reversible.

Let’s walk through building one for flights, hotels, and payments. Imagine a user wants to book a flight and a hotel, then pay for both.

Here’s a simplified representation of the services involved:

  • Flight Service: Handles flight availability, booking, and cancellation.
  • Hotel Service: Manages hotel room availability, reservations, and cancellations.
  • Payment Service: Processes payments and handles refunds.
  • Orchestrator (Booking Service): Coordinates the entire process, initiating requests to other services and managing their responses.

The Saga Pattern

When a user initiates a booking, the Orchestrator calls the Flight Service to reserve a seat. If successful, it then calls the Hotel Service to book a room. Finally, it calls the Payment Service to charge the user.

This looks straightforward, but what happens if the Hotel Service fails after the Flight Service has already reserved a seat? The flight reservation needs to be canceled. This is where the saga pattern comes in. Each step in the process has a corresponding compensating action that undoes that step if a later step fails.

  • Flight Service:
    • Action: reserveSeat(flightId, userId)
    • Compensation: cancelFlightReservation(reservationId)
  • Hotel Service:
    • Action: bookRoom(hotelId, roomId, userId)
    • Compensation: cancelHotelReservation(reservationId)
  • Payment Service:
    • Action: chargeUser(amount, userId)
    • Compensation: refundUser(transactionId)
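One way to make this pairing explicit is to put each action and its compensation on the same interface, linked by the ID the action returns. The following is a minimal sketch: the `Reservation` type, the `memoryFlights` implementation, and the ID format are assumptions made for illustration, not part of any real service API.

```go
package main

import (
	"errors"
	"fmt"
)

// Reservation is a hypothetical result type; only an ID is assumed.
type Reservation struct{ ID string }

// FlightService pairs the forward action with its compensation.
// The Hotel and Payment services would follow the same shape.
type FlightService interface {
	ReserveSeat(flightID, userID string) (Reservation, error)
	CancelFlightReservation(reservationID string) error
}

// memoryFlights is a toy in-memory implementation used only for illustration.
type memoryFlights struct {
	active map[string]bool // reservation ID -> seat still held
}

func newMemoryFlights() *memoryFlights {
	return &memoryFlights{active: make(map[string]bool)}
}

// ReserveSeat is the forward action. The returned ID is what the
// compensation needs later, so the caller must keep it.
func (m *memoryFlights) ReserveSeat(flightID, userID string) (Reservation, error) {
	if flightID == "" || userID == "" {
		return Reservation{}, errors.New("flightID and userID are required")
	}
	id := "FR-" + flightID + "-" + userID // made-up ID format
	m.active[id] = true
	return Reservation{ID: id}, nil
}

// CancelFlightReservation is the compensation. Canceling an unknown or
// already canceled reservation is treated as success, so it is safe to repeat.
func (m *memoryFlights) CancelFlightReservation(reservationID string) error {
	delete(m.active, reservationID)
	return nil
}

func main() {
	var flights FlightService = newMemoryFlights()
	res, err := flights.ReserveSeat("FL123", "user123")
	if err != nil {
		fmt.Println("reserve failed:", err)
		return
	}
	fmt.Println("reserved", res.ID)
	fmt.Println("cancel error:", flights.CancelFlightReservation(res.ID))
}
```

Keeping both methods on one interface makes it harder to add a forward action without also deciding how it will be undone.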

Orchestration Flow

Here’s how the Orchestrator would manage this:

  1. Start Booking: User requests flight FL123 and hotel HOT456.
  2. Reserve Flight: Orchestrator calls FlightService.reserveSeat("FL123", "user123").
    • Success: Flight reservation FR789 is created.
  3. Book Hotel: Orchestrator calls HotelService.bookRoom("HOT456", "RM99", "user123").
    • Success: Hotel reservation HR101 is created.
  4. Charge Payment: Orchestrator calls PaymentService.chargeUser(150.00, "user123").
    • Success: Payment transaction PT112 is created.
  5. Finalize: All steps succeeded. Return success to user.

Handling Failures (The Saga)

Let’s say the HotelService.bookRoom call fails after FlightService.reserveSeat succeeded.

  1. Reserve Flight: Orchestrator calls FlightService.reserveSeat("FL123", "user123").
    • Success: Flight reservation FR789 is created.
  2. Book Hotel: Orchestrator calls HotelService.bookRoom("HOT456", "RM99", "user123").
    • Failure: Hotel is unavailable.
  3. Compensate Flight: Orchestrator calls FlightService.cancelFlightReservation("FR789").
    • Success: Flight reservation FR789 is canceled.
  4. Finalize: All completed steps have been compensated. Return failure to user.
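The failure flow above generalizes to any number of steps: run the actions in order and, on the first failure, run the compensations of the steps that already succeeded, most recent first. Here is a minimal sketch of such a runner. The `step` type, `runSaga`, and `demoSteps` are hypothetical names introduced for this example, and the closures stand in for real service calls.

```go
package main

import (
	"errors"
	"fmt"
)

// step is a hypothetical pairing of a forward action and its compensation.
type step struct {
	name       string
	action     func() error
	compensate func() error
}

// runSaga executes steps in order. On the first failure it runs the
// compensations of the already completed steps in reverse order and
// returns the error from the failed step.
func runSaga(steps []step) error {
	for i, s := range steps {
		if err := s.action(); err != nil {
			for j := i - 1; j >= 0; j-- {
				if cerr := steps[j].compensate(); cerr != nil {
					// A failed compensation needs follow-up (retry, alerting).
					fmt.Printf("CRITICAL: compensating %s failed: %v\n", steps[j].name, cerr)
				}
			}
			return fmt.Errorf("%s failed: %w", s.name, err)
		}
	}
	return nil
}

// demoSteps builds the flight, hotel, and payment steps. Every call is
// appended to trace so the execution order can be inspected afterwards.
func demoSteps(trace *[]string, hotelFails bool) []step {
	record := func(msg string) func() error {
		return func() error { *trace = append(*trace, msg); return nil }
	}
	bookHotel := record("book hotel")
	if hotelFails {
		bookHotel = func() error {
			*trace = append(*trace, "book hotel")
			return errors.New("hotel is unavailable")
		}
	}
	return []step{
		{"flight", record("reserve flight"), record("cancel flight")},
		{"hotel", bookHotel, record("cancel hotel")},
		{"payment", record("charge user"), record("refund user")},
	}
}

func main() {
	var trace []string
	err := runSaga(demoSteps(&trace, true))
	fmt.Println("calls:", trace)
	fmt.Println("result:", err)
}
```

With the hotel step failing, the trace contains the flight reservation, the hotel attempt, and the flight cancellation; the payment step is never reached.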

Example Implementation (Conceptual - using a simplified API)

Let’s imagine the Orchestrator is written in Go.

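// booking_service.go
package booking

import (
	"fmt"
	"log"
)

// BookingOrchestrator coordinates the saga. The three client types are
// assumed to be defined elsewhere and to wrap calls to the remote services.
type BookingOrchestrator struct {
	flightService  *FlightServiceClient
	hotelService   *HotelServiceClient
	paymentService *PaymentServiceClient
}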

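// BookTrip runs the saga: reserve the flight, book the hotel, then charge the user.
// If a step fails, the steps that already succeeded are compensated in reverse order.
func (o *BookingOrchestrator) BookTrip(userID, flightID, hotelID, roomID string, amount float64) error {
	// Step 1: Reserve Flight
	flightReservation, err := o.flightService.ReserveSeat(flightID, userID)
	if err != nil {
		return fmt.Errorf("failed to reserve flight: %w", err)
	}

	// Step 2: Book Hotel
	hotelReservation, err := o.hotelService.BookRoom(hotelID, roomID, userID)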
	if err != nil {
		// Compensation: Cancel Flight Reservation
		if compErr := o.flightService.CancelFlightReservation(flightReservation.ID); compErr != nil {
			// Log this critical error: compensation failed!
			log.Printf("CRITICAL: Failed to compensate flight reservation %s: %v", flightReservation.ID, compErr)
		}
		return fmt.Errorf("failed to book hotel: %w", err)
	}

	// Step 3: Charge Payment
	paymentTransaction, err := o.paymentService.ChargeUser(amount, userID)
	if err != nil {
		// Compensation: Cancel Hotel Reservation
		if compErr := o.hotelService.CancelHotelReservation(hotelReservation.ID); compErr != nil {
			log.Printf("CRITICAL: Failed to compensate hotel reservation %s: %v", hotelReservation.ID, compErr)
		}
		// Compensation: Cancel Flight Reservation (already booked)
		if compErr := o.flightService.CancelFlightReservation(flightReservation.ID); compErr != nil {
			log.Printf("CRITICAL: Failed to compensate flight reservation %s: %v", flightReservation.ID, compErr)
		}
		return fmt.Errorf("failed to charge payment: %w", err)
	}

	// Success: All steps completed
	log.Printf("Trip booked successfully: Flight: %s, Hotel: %s, Payment: %s", flightReservation.ID, hotelReservation.ID, paymentTransaction.ID)
	return nil
}

This direct orchestration approach, where the orchestrator explicitly calls each service and then its compensating action on failure, is one way to implement sagas. Another common approach is "choreography," where services communicate via events, and each service reacts to events from others to perform its action or compensation.
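To make the contrast concrete, here is a minimal sketch of the choreographed variant using a toy synchronous in-memory event bus. The `Bus` type and the event names (`TripRequested`, `FlightReserved`, `HotelBookingFailed`, and so on) are assumptions made for illustration; a real system would publish events through a message broker and each handler would live in its own service.

```go
package main

import "fmt"

// Event is a hypothetical message exchanged between services.
type Event struct {
	Type string // for example "FlightReserved" or "HotelBookingFailed"
	Trip string // identifies the booking the event belongs to
}

// Bus is a toy synchronous event bus that also records every published
// event type in Log, in publication order.
type Bus struct {
	handlers map[string][]func(Event)
	Log      []string
}

func NewBus() *Bus { return &Bus{handlers: make(map[string][]func(Event))} }

// Subscribe registers h to be called for every event of the given type.
func (b *Bus) Subscribe(eventType string, h func(Event)) {
	b.handlers[eventType] = append(b.handlers[eventType], h)
}

// Publish records the event and delivers it to all subscribers.
func (b *Bus) Publish(e Event) {
	b.Log = append(b.Log, e.Type)
	for _, h := range b.handlers[e.Type] {
		h(e)
	}
}

// wire connects the three services. No component sees the whole flow;
// each one only reacts to the events it cares about.
func wire(b *Bus, hotelAvailable bool) {
	// Flight service: reserves on request, cancels when the hotel step fails.
	b.Subscribe("TripRequested", func(e Event) { b.Publish(Event{"FlightReserved", e.Trip}) })
	b.Subscribe("HotelBookingFailed", func(e Event) { b.Publish(Event{"FlightCanceled", e.Trip}) })
	// Hotel service: reacts to the flight being reserved.
	b.Subscribe("FlightReserved", func(e Event) {
		if hotelAvailable {
			b.Publish(Event{"HotelBooked", e.Trip})
		} else {
			b.Publish(Event{"HotelBookingFailed", e.Trip})
		}
	})
	// Payment service: charges once the hotel is booked.
	b.Subscribe("HotelBooked", func(e Event) { b.Publish(Event{"PaymentCharged", e.Trip}) })
}

func main() {
	bus := NewBus()
	wire(bus, false) // simulate an unavailable hotel
	bus.Publish(Event{"TripRequested", "trip-1"})
	fmt.Println(bus.Log)
}
```

The trade-off is visible in `wire`: there is no central place that describes the whole saga, which removes the orchestrator as a single coordinator but makes the overall flow harder to read and trace.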

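The core principle, however, remains the same: for every forward action, there must be a corresponding backward action that undoes its effects. This lets the system converge on a consistent outcome (everything booked or everything undone) despite partial failures. It is not atomicity in the ACID sense, though: intermediate states, such as a reserved seat with no hotel booked yet, are visible to other requests until the saga completes or compensates.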

The real complexity arises when you consider retries, idempotency, and compensating actions that themselves fail. For instance, if PaymentService.refundUser fails, the system needs a strategy to retry that refund, potentially with exponential backoff, and to alert operators if it still cannot be completed. Retries are only safe when the retried call is idempotent, so that repeating a refund or cancellation does not apply it twice.
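A retry loop with exponential backoff can be sketched in a few lines. The helper below is an assumption for illustration (`retryWithBackoff` and `flakyRefund` are made-up names, and the delays are arbitrary); production code would typically also add jitter, a maximum delay, and context cancellation.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryWithBackoff calls op until it succeeds or maxAttempts is reached,
// doubling the wait between attempts. On failure it returns the last
// error wrapped with the attempt count.
func retryWithBackoff(maxAttempts int, initialDelay time.Duration, op func() error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	delay := initialDelay
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		if attempt < maxAttempts {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", maxAttempts, err)
}

// flakyRefund simulates a refund call that fails the first `failures`
// times it is invoked. calls counts the invocations.
func flakyRefund(failures int, calls *int) func() error {
	return func() error {
		*calls++
		if *calls <= failures {
			return errors.New("payment service unavailable")
		}
		return nil
	}
}

func main() {
	calls := 0
	err := retryWithBackoff(5, 10*time.Millisecond, flakyRefund(2, &calls))
	fmt.Println("refund error:", err, "after", calls, "calls")

	calls = 0
	if err := retryWithBackoff(3, 10*time.Millisecond, flakyRefund(10, &calls)); err != nil {
		// At this point the saga cannot finish on its own: alert an operator.
		fmt.Println("ALERT:", err)
	}
}
```

The second call in `main` shows the unrecoverable case: once the attempts are exhausted, the error should be surfaced to a human or a dead-letter process rather than dropped.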

Understanding how to manage state and reliably execute compensating actions is key to building robust distributed booking systems.
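As a rough illustration of what managing state can mean here, the sketch below keeps a record of which steps have completed and which have been compensated, so that an orchestrator restarted after a crash can work out what is still left to undo. `SagaLog` and its methods are hypothetical names, and the log lives in memory only; a real orchestrator would persist it durably around each service call.

```go
package main

import "fmt"

// entry records one forward step and the ID of the resource it created.
type entry struct {
	Step        string
	ResourceID  string
	Compensated bool
}

// SagaLog is a toy in-memory record of saga progress.
type SagaLog struct {
	entries []entry
}

// MarkCompleted records that a forward step succeeded.
func (l *SagaLog) MarkCompleted(step, resourceID string) {
	l.entries = append(l.entries, entry{Step: step, ResourceID: resourceID})
}

// MarkCompensated records that the compensation for a step succeeded.
func (l *SagaLog) MarkCompensated(step string) {
	for i := range l.entries {
		if l.entries[i].Step == step {
			l.entries[i].Compensated = true
		}
	}
}

// PendingCompensations returns completed steps that still need undoing,
// most recent first, which is the order compensations should run in.
func (l *SagaLog) PendingCompensations() []entry {
	var pending []entry
	for i := len(l.entries) - 1; i >= 0; i-- {
		if !l.entries[i].Compensated {
			pending = append(pending, l.entries[i])
		}
	}
	return pending
}

func main() {
	var sagaLog SagaLog
	sagaLog.MarkCompleted("flight", "FR789")
	sagaLog.MarkCompleted("hotel", "HR101")
	// Suppose the payment step fails and the hotel is canceled, but the
	// process crashes before the flight is canceled.
	sagaLog.MarkCompensated("hotel")
	for _, e := range sagaLog.PendingCompensations() {
		fmt.Println("still to compensate:", e.Step, e.ResourceID)
	}
}
```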
