Pulsar’s "exactly-once" producer semantics don’t guarantee end-to-end exactly-once delivery; they guarantee that each message a producer publishes is stored by the broker exactly once, even when the producer retries.

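Let’s look at how this works in practice. Imagine a producer named producer-A sending messages to a Pulsar topic named my-topic. Conceptually, each message carries a payload plus the metadata the broker needs for deduplication: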

{
  "payload": {"user_id": 123, "action": "login"},
  "sequence_id": 1,
  "producer_id": "producer-A"
}

If producer-A sends this message and then crashes before receiving an acknowledgment from the broker, a standard producer would likely retry, sending the same message again. This results in duplicate messages being stored in Pulsar.

Pulsar’s "exactly-once" producer mode addresses this by ensuring that even if the producer crashes and retries, the broker only accepts and stores a given message once. It achieves this through a combination of sequence IDs and a mechanism on the broker side to track these IDs.

Here’s the internal flow:

  1. Producer Initialization: The producer connects to the broker under a producer name (the producer_id in the example above). For deduplication to survive a producer restart, that name has to be set explicitly and stay the same across restarts; an auto-generated name would make the restarted producer look like a brand-new one.
  2. Message Sequencing: Every message carries a sequence_id that the producer increments for each send, so the ID is unique for a given producer_id and topic.
  3. Broker Reception and Deduplication: When a broker receives a message, it checks the producer_id and sequence_id against the highest sequence_id it has recorded for that producer on that topic partition. The broker also snapshots this state so that it survives a broker restart.
  4. Duplicate Detection: If the sequence_id is not higher than the last one recorded for that producer_id, the broker treats the message as a duplicate and does not store it again.
  5. Acknowledgment: If the sequence_id is new, the broker stores the message, records the sequence_id, and acknowledges the message to the producer. The acknowledgment carries the sequence_id of the stored message.
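The broker-side check in the steps above can be sketched as a toy model in plain Java. DedupPartition is a hypothetical class, not part of Pulsar, and it makes one simplifying assumption: because each producer’s sequence IDs only increase, remembering the highest ID per producer is enough to spot a resend.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of one topic partition with deduplication. Illustrative only.
class DedupPartition {
    // Highest sequence ID stored so far, keyed by producer name.
    private final Map<String, Long> highestSequenceId = new HashMap<>();
    private int storedCount = 0;

    /** Returns true if the message was stored, false if it was dropped as a duplicate. */
    boolean publish(String producerName, long sequenceId, String payload) {
        long last = highestSequenceId.getOrDefault(producerName, -1L);
        if (sequenceId <= last) {
            return false; // already seen from this producer: do not store again
        }
        highestSequenceId.put(producerName, sequenceId);
        storedCount++; // stands in for appending the payload to the topic
        return true;
    }

    int storedCount() {
        return storedCount;
    }
}
```

Publishing ("producer-A", 1) twice stores one message; the second call returns false. The same sequence ID from a different producer name is stored, because the state is tracked per producer.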

This mechanism ensures that if the producer crashes after sending a message but before receiving the acknowledgment, and then restarts and resends the same message with the same sequence_id, the broker recognizes the sequence_id as already stored and discards the duplicate. The broker still acknowledges the resend, so the producer can treat the message as published without a second copy being written. This only works if the restarted producer reuses the original sequence_id, so the application has to derive it deterministically or recover it, for example through Producer.getLastSequenceId() in the Java client.
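The crash-and-resend case can be played through in a small, self-contained Java sketch. FakeTopic and ReplayingProducer are hypothetical stand-ins, not Pulsar classes. The sketch assumes the producer reads from a replayable source and uses each record’s position as its sequence ID, so a resent record always carries the same ID.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a deduplicating topic with a single producer.
class FakeTopic {
    final List<String> stored = new ArrayList<>();
    long lastSequenceId = -1; // highest sequence ID accepted so far

    void send(long sequenceId, String payload) {
        if (sequenceId > lastSequenceId) { // new message: store it
            stored.add(payload);
            lastSequenceId = sequenceId;
        }
        // otherwise: duplicate, dropped without storing
    }
}

class ReplayingProducer {
    // Sends up to `limit` records starting at index `from`.
    // The index doubles as the sequence ID, so it is stable across restarts.
    static void run(FakeTopic topic, List<String> source, int from, int limit) {
        for (int i = from; i < source.size() && i < from + limit; i++) {
            topic.send(i, source.get(i));
        }
    }
}
```

With a source of "login", "view", "logout", calling run(topic, source, 0, 2) models a producer that sends records 0 and 1 and then crashes before seeing the acknowledgment for record 1. A restart that is unsure about record 1 calls run(topic, source, 1, 10) and resends it; the topic ends up holding each record once, in order.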

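The producer-side configuration is small. Give the producer a stable name with producerName, and set sendTimeout to 0 so the client keeps retrying a pending message instead of failing it after a timeout. Batching (enableBatching) is on by default and can stay on. Note that the Java client’s ProducerBuilder has no enableDeduplication method; deduplication itself is switched on at the broker.

// Assumes an existing PulsarClient named pulsarClient; needs java.util.concurrent.TimeUnit.
Producer<String> producer = pulsarClient.newProducer(Schema.STRING)
    .topic("persistent://my-tenant/my-namespace/my-topic")
    .producerName("producer-A")        // stable name: the broker tracks sequence IDs per producer name
    .sendTimeout(0, TimeUnit.SECONDS)  // no send timeout: keep retrying until the broker acknowledges
    .enableBatching(true)              // already the default, shown for clarity
    .create();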

Deduplication has to be enabled on the broker, where it is off by default. The cluster-wide switch lives in broker.conf or standalone.conf:

brokerDeduplicationEnabled=true

The brokerDeduplicationEnabled setting controls whether the broker performs the deduplication check. Without it, the broker stores every message it receives, no matter how the producer is configured. Deduplication can also be enabled for a single namespace with pulsar-admin namespaces set-deduplication.

The "exactly-once" guarantee is scoped to publishing. For each distinct logical message the application sends, Pulsar stores exactly one copy: the producer keeps retrying until the message is acknowledged, and the broker drops any retry of a message it has already stored.

The nuance here is that Pulsar’s "exactly-once" producer doesn’t magically make the entire system exactly-once. A consumer reading from this topic could still process the same message multiple times if it crashes and restarts without proper acknowledgment handling of its own. The guarantee is at the producer-to-broker ingestion point.

The most surprising thing about Pulsar’s exactly-once producer is how much depends on the producer’s own bookkeeping. The broker can only recognize a resend if it arrives under the same producer name and with the same sequence_id as the original. A restarted producer that comes back under a new name, or that assigns fresh sequence IDs to messages it is resending, bypasses deduplication entirely.

The next concept you’ll likely encounter is how to achieve end-to-end exactly-once processing, which involves careful design on the consumer side as well, often using transactional or idempotent consumers.
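As a small taste of the idempotent-consumer idea, here is a minimal sketch in plain Java. IdempotentHandler is a hypothetical class, not a Pulsar API. It assumes every message has a stable ID and keeps the set of processed IDs in memory; a real consumer would need that set to be durable and updated together with the side effect.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: skips messages whose ID has already been processed.
class IdempotentHandler {
    private final Set<String> processedIds = new HashSet<>();
    private int sideEffects = 0;

    /** Returns true if the message was processed, false if it was a redelivery. */
    boolean handle(String messageId, String payload) {
        if (!processedIds.add(messageId)) {
            return false; // seen before: skip the side effect
        }
        sideEffects++; // stands in for the real work, e.g. a database write
        return true;
    }

    int sideEffects() {
        return sideEffects;
    }
}
```

If the same message is delivered twice, for example after a consumer crash before acknowledgment, the second handle call returns false and the side effect runs only once.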
