The most surprising thing about Saga idempotency is that the message broker usually won’t handle it for you. Most brokers only promise at-least-once delivery, so it falls to the consumer to make duplicate deliveries harmless.

Let’s see this in action. Imagine a simple order processing saga. When an order is placed, an "OrderCreated" message is published to the order-events topic. A PaymentService listens for this message, processes the payment, and then publishes a "PaymentProcessed" message to the payment-events topic. If the PaymentService crashes after charging the customer but before acknowledging the message, the broker will deliver the same "OrderCreated" message again once the service restarts. Without idempotency, this would lead to double charging the customer.

Here’s a simplified representation of the flow:

  1. Client: POST /orders with {"item": "widget", "quantity": 1, "customerId": "cust123"}
  2. OrderService: Receives request, generates orderId: "ORD789", publishes {"orderId": "ORD789", "customerId": "cust123", "amount": 10.00} to order-events topic.
  3. PaymentService: Consumes {"orderId": "ORD789", "customerId": "cust123", "amount": 10.00}.
    • Idempotent Check: Has ORD789 been processed? (We’ll get to this).
    • If not, processes payment, publishes {"orderId": "ORD789", "status": "PAID"} to payment-events topic.
    • If yes, it does nothing.

The core problem is that network glitches or service restarts can cause the same message to be delivered multiple times to a consumer. If the consumer performs an action that has side effects (like charging a credit card or decrementing inventory), it could lead to data inconsistency or incorrect business outcomes.

The mental model for handling this is to ensure that processing a message multiple times has the same effect as processing it once. You get there by making the operation itself idempotent, rather than by trying to guarantee that each message is delivered only once.
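To make that mental model concrete, here is a minimal sketch that contrasts a naive handler with an idempotent one. Everything in it is a hypothetical stand-in: the `charges` list plays the payment gateway, and the in-memory `processed_ids` set plays the durable store a real service would need.

```python
# Hypothetical in-memory stand-ins; a real service would call a payment
# gateway and keep processed IDs in durable storage.
charges = []           # every charge that reached the "payment gateway"
processed_ids = set()  # order IDs this consumer has already handled


def handle_naive(message):
    """Charges on every delivery, so a duplicate double-charges."""
    charges.append((message["orderId"], message["amount"]))


def handle_idempotent(message):
    """Charges at most once per orderId, however often it is delivered."""
    if message["orderId"] in processed_ids:
        return False  # duplicate delivery: skip the side effect
    charges.append((message["orderId"], message["amount"]))
    processed_ids.add(message["orderId"])
    return True


msg = {"orderId": "ORD789", "customerId": "cust123", "amount": 10.00}

# Deliver the same message twice to the naive handler.
handle_naive(msg)
handle_naive(msg)
naive_charge_count = len(charges)       # the customer was charged twice

# Deliver the same message twice to the idempotent handler.
charges.clear()
first = handle_idempotent(msg)
second = handle_idempotent(msg)
idempotent_charge_count = len(charges)  # the duplicate was ignored
```

The second delivery reaches both handlers; only the idempotent one turns it into a no-op.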

How does this work internally? Most modern message brokers, like Kafka, RabbitMQ, or Azure Service Bus, let you choose a delivery guarantee, and the usual choice is "at-least-once". A Kafka consumer that commits its offsets only after processing a message, for example, gets at-least-once delivery: if it crashes before the commit, the message is delivered again. The broker can’t guarantee "exactly-once" effects in the application logic without consumer-side cooperation.
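A toy simulation can show why at-least-once delivery produces duplicates. The `ToyBroker` class below is purely illustrative and does not mirror the API of Kafka, RabbitMQ, or any other real broker; it only models the rule that an unacknowledged message is handed out again.

```python
from collections import deque


class ToyBroker:
    """Hypothetical in-memory broker: redelivers until a message is acked."""

    def __init__(self):
        self._pending = deque()

    def publish(self, message):
        self._pending.append(message)

    def poll(self):
        # Return the oldest unacknowledged message without removing it.
        return self._pending[0] if self._pending else None

    def ack(self, message):
        # Only an explicit acknowledgement removes the message.
        self._pending.remove(message)


broker = ToyBroker()
broker.publish({"orderId": "ORD789", "amount": 10.00})

deliveries = []

# First attempt: the consumer receives the message but "crashes" before acking.
deliveries.append(broker.poll())

# After the restart, the broker hands over the same message again.
redelivered = broker.poll()
deliveries.append(redelivered)
broker.ack(redelivered)

delivered_ids = [m["orderId"] for m in deliveries]
remaining = broker.poll()  # nothing is left once the message is acked
```

The consumer sees ORD789 twice even though it was published once, which is exactly the situation the consumer has to tolerate.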

The "cooperation" is the idempotent consumer. In the typical pattern, the consumer records which messages it has successfully handled in a persistent store, often a database or a dedicated state store.

Here’s the common implementation pattern:

  1. Message Arrival: The PaymentService receives the OrderCreated message.
  2. Unique Identifier: The message contains a unique identifier for the business operation it represents. In our example, this is orderId: "ORD789".
  3. Check for Duplicates: The PaymentService queries its state store (e.g., a processed_orders table in a database) to see if an entry for ORD789 already exists.
    • SELECT COUNT(*) FROM processed_orders WHERE order_id = 'ORD789';
  4. Process or Skip:
    • If COUNT is 0: The order hasn’t been processed. The PaymentService proceeds with payment processing, marks the order as processed in its state store (e.g., INSERT INTO processed_orders (order_id, status) VALUES ('ORD789', 'PAID');), and then publishes the PaymentProcessed event.
    • If COUNT is 1 (or more): The order has already been processed. The PaymentService skips the payment and simply acknowledges the message, so the broker stops redelivering it.
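The four steps above can be sketched roughly as follows, using Python’s built-in sqlite3 module as a stand-in for the state store. The `charge_customer` function and the `published` list are hypothetical placeholders for the payment gateway and the payment-events topic. The sketch also assumes only one consumer instance handles a given order at a time; with concurrent consumers the check and the write can race.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_orders (order_id TEXT, status TEXT)")

published = []  # stand-in for the payment-events topic


def charge_customer(customer_id, amount):
    """Hypothetical placeholder for the real payment gateway call."""
    return True


def handle_order_created(message):
    order_id = message["orderId"]

    # Step 3: check for duplicates in the state store.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM processed_orders WHERE order_id = ?",
        (order_id,),
    ).fetchone()
    if count > 0:
        return "skipped"  # already processed: no side effects

    # Step 4: process the payment, record it, then publish the event.
    charge_customer(message["customerId"], message["amount"])
    conn.execute(
        "INSERT INTO processed_orders (order_id, status) VALUES (?, ?)",
        (order_id, "PAID"),
    )
    conn.commit()
    published.append({"orderId": order_id, "status": "PAID"})
    return "processed"


msg = {"orderId": "ORD789", "customerId": "cust123", "amount": 10.00}
first_result = handle_order_created(msg)   # first delivery
second_result = handle_order_created(msg)  # duplicate delivery
```

Only the first delivery charges the customer and publishes an event; the duplicate returns early.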

This state store is crucial. It acts as the "source of truth" for which business operations have been completed. The key is that the check (has this order been processed?) and the write (mark it as processed) must happen atomically, or at least in a way that prevents two consumers from both passing the check. A unique constraint on order_id in the processed_orders table is a common way to get this: if an INSERT fails with a unique constraint violation, another instance (or an earlier delivery) has already recorded that order.
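As a rough illustration of the unique-constraint approach, the sketch below again uses sqlite3 as a stand-in database. The `mark_processed` helper is a hypothetical name, and where exactly it is called relative to the payment is a design decision this sketch does not settle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# PRIMARY KEY gives order_id a unique constraint.
conn.execute(
    "CREATE TABLE processed_orders ("
    "order_id TEXT PRIMARY KEY, status TEXT NOT NULL)"
)


def mark_processed(order_id):
    """Return True if this call recorded the order, False if it already existed."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO processed_orders (order_id, status) VALUES (?, ?)",
                (order_id, "PAID"),
            )
        return True
    except sqlite3.IntegrityError:
        # Unique constraint violation: another delivery or instance got there first.
        return False


first = mark_processed("ORD789")
second = mark_processed("ORD789")
row_count = conn.execute("SELECT COUNT(*) FROM processed_orders").fetchone()[0]
```

Because the database enforces uniqueness, there is no window between a separate check and write in which two consumers could both conclude the order is new.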

The one thing most people don’t realize is that idempotency isn’t about the message broker preventing duplicates from being sent. It’s about the consumer being able to detect and ignore duplicate deliveries of the same logical operation. The broker’s role is often limited to "at-least-once" delivery, which pushes the responsibility for safe processing to the application layer.

The next concept you’ll grapple with is handling failures during the processing phase itself, where a message is received, deemed non-duplicate, but then the service crashes before it can update the state store and publish its own success event.
