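The Redpanda idempotent producer is a feature that guarantees each message is written to a partition's log exactly once, even if the producer client has to retry the same send after a network failure. It deduplicates retries of a single send; it does not deduplicate messages that the application itself sends more than once.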

Let’s see it in action. Imagine a simple producer script that sends a message. Without idempotency enabled, if network issues cause the producer to retry sending the same message, Redpanda might write it twice.

from kafka import KafkaProducer
import json

producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    value_serializer=lambda x: json.dumps(x).encode('utf-8')
)

topic = 'my_topic'
message = {'key': 'value', 'count': 1}

# Three separate send() calls produce three distinct records.
# A network-level retry happens inside the client and cannot be forced from this loop.
for _ in range(3):
    future = producer.send(topic, value=message)
    try:
        record_metadata = future.get(timeout=10)
        print(f"Sent message: {message} to topic {record_metadata.topic} partition {record_metadata.partition} offset {record_metadata.offset}")
    except Exception as e:
        print(f"Error sending message: {e}")

producer.flush()
producer.close()

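If you run this script and then inspect my_topic, you will see three records, one per send() call. If the client is configured to retry and a transient failure forces it to resend one of those records while enable_idempotence is not set, that record can appear an additional time.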

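Now, let’s enable idempotency by setting enable_idempotence=True in the KafkaProducer configuration. This requires a kafka-python release that supports the option; older releases do not recognize it and fail when the producer is constructed.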

from kafka import KafkaProducer
import json

producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    value_serializer=lambda x: json.dumps(x).encode('utf-8'),
    enable_idempotence=True,  # Attach a producer ID and sequence numbers to every send
    acks='all',              # Idempotence requires acks='all'
    retries=5                # A reasonable number of retries
)

topic = 'my_topic'
message = {'key': 'value', 'count': 1}

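# Each send() call is a new record with its own sequence number, so all three are written.
# Only retries of the same send, performed inside the client, are deduplicated.
for _ in range(3):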
    future = producer.send(topic, value=message)
    try:
        record_metadata = future.get(timeout=10)
        print(f"Sent message: {message} to topic {record_metadata.topic} partition {record_metadata.partition} offset {record_metadata.offset}")
    except Exception as e:
        print(f"Error sending message: {e}")

producer.flush()
producer.close()

When enable_idempotence=True, the producer is assigned a unique Producer ID (PID) and attaches a sequence number, counted per partition, to each message it sends. A retry reuses the sequence number of the original attempt. Redpanda tracks the highest sequence number it has successfully processed for a given PID and partition. If it receives a message with a sequence number that is less than or equal to the highest seen, it discards the duplicate but acknowledges the write successfully. This prevents duplicate writes without the application needing to manage unique IDs for each message.

The problem idempotency solves is data duplication in scenarios where producers might retry sending messages due to transient network failures or broker unavailability. Without it, a producer could successfully send a message, not receive an acknowledgment due to a network glitch, and then resend the same message, leading to two identical entries in Redpanda. This can cause significant issues for consumers who expect to process each unique event only once, leading to incorrect aggregations, double charges, or other data integrity problems.
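The failure mode described above can be reproduced with a toy model. The sketch below is not a Redpanda client: `ToyLog`, `send_with_retry`, and the `acks_lost` parameter are hypothetical names invented for illustration, and the "log" is a plain Python list. It only shows how a write that succeeds but whose acknowledgment is lost ends up stored twice when nothing deduplicates the retry.

```python
class ToyLog:
    """Append-only list standing in for one partition (not a real broker)."""

    def __init__(self):
        self.records = []

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1  # offset of the new record


def send_with_retry(log, value, acks_lost):
    """Write `value`, retrying while the acknowledgment is lost.

    `acks_lost` is the number of attempts whose write succeeds but whose
    acknowledgment never reaches the producer (a simulated network glitch).
    Returns the number of attempts made.
    """
    attempts = 0
    while True:
        log.append(value)          # the write itself always succeeds
        attempts += 1
        if attempts > acks_lost:   # this attempt's ack arrived
            return attempts


log = ToyLog()
attempts = send_with_retry(log, {"order_id": 42, "charge": 10}, acks_lost=1)

# A consumer that sums charges now bills the order twice.
total_charged = sum(record["charge"] for record in log.records)
print(attempts, len(log.records), total_charged)  # prints: 2 2 20
```

One lost acknowledgment is enough to double the charge, which is the kind of integrity problem a downstream consumer cannot easily detect on its own.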

Internally, when enable_idempotence=True, the producer must run with acks='all' and with retries enabled. The Java client applies both automatically (its retries default is 2147483647); defaults in kafka-python vary by release, so set acks='all' explicitly and choose retries deliberately. A small value such as retries=5 or retries=10 bounds how long a send can stall, at the cost of giving up sooner during a long outage. When Redpanda receives a produce request carrying a ProducerId and SequenceNumber, it checks its internal state. For each partition, it maintains a mapping of ProducerId to the highest SequenceNumber it has successfully committed. If the incoming message’s SequenceNumber is less than or equal to the stored SequenceNumber for that ProducerId and partition, Redpanda acknowledges the write as successful but does not write the message again. If the SequenceNumber is the next one expected, it writes the message and updates the stored SequenceNumber. This mechanism ensures that even if the producer retries, only the first successful write is actually persisted.
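The per-partition bookkeeping can be sketched as a small in-memory model. This is an illustration under stated assumptions, not Redpanda's implementation: `ToyPartition`, `Outcome`, and `produce` are hypothetical names, and the state is a plain dictionary. The gap check (`OUT_OF_ORDER`) is an added assumption modeled on how Kafka brokers reject out-of-order sequences; the text above describes only the duplicate and append cases.

```python
from enum import Enum


class Outcome(Enum):
    APPENDED = "appended"          # new record written to the log
    DUPLICATE = "duplicate"        # acknowledged, but not written again
    OUT_OF_ORDER = "out_of_order"  # assumption: a gap in sequence is rejected


class ToyPartition:
    """In-memory stand-in for one partition's deduplication state."""

    def __init__(self):
        self.log = []
        self.last_seq = {}  # producer_id -> highest committed sequence number

    def produce(self, producer_id, seq, value):
        last = self.last_seq.get(producer_id, -1)
        if seq <= last:
            # Already committed: report success without appending.
            return Outcome.DUPLICATE
        if seq != last + 1:
            # Assumed behavior: an earlier record is missing, so refuse.
            return Outcome.OUT_OF_ORDER
        self.log.append(value)
        self.last_seq[producer_id] = seq
        return Outcome.APPENDED


partition = ToyPartition()
results = [
    partition.produce(7, 0, "a"),  # first record from producer 7
    partition.produce(7, 1, "b"),  # next in sequence
    partition.produce(7, 1, "b"),  # retry of the previous request
    partition.produce(7, 3, "d"),  # sequence 2 was never seen
    partition.produce(8, 0, "x"),  # a different producer has its own counter
]
print([r.value for r in results])
print(partition.log)  # prints: ['a', 'b', 'x']
```

Keeping only the highest committed sequence number per producer is the simplest state that supports the check; the retry of "b" is acknowledged without being stored a second time.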

A common misconception is that idempotency is handled solely by Redpanda. Redpanda tracks producer state and sequence numbers, and it hands out the unique Producer ID when the client initializes. The client library (such as kafka-python or the Java kafka-clients library) is responsible for requesting that ID and for attaching monotonically increasing sequence numbers to the messages it sends to each partition. The producer client then includes both in every produce request. If the producer client is not configured with enable_idempotence=True, it sends no valid Producer ID or sequence number, and Redpanda appends every request it receives, retries included, because it has nothing to deduplicate on.
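The client's half of the protocol can be sketched in the same spirit. `ToyIdempotentClient`, `build_request`, and `retry` are hypothetical names for illustration and do not correspond to kafka-python internals. The sketch assumes the producer ID has already been obtained and shows only how sequence numbers are assigned: one counter per partition, advanced on every new send and reused unchanged on a retry.

```python
from collections import defaultdict


class ToyIdempotentClient:
    """Assigns per-partition sequence numbers the way an idempotent client would."""

    def __init__(self, producer_id):
        self.producer_id = producer_id    # assumed to be already assigned
        self.next_seq = defaultdict(int)  # partition -> next sequence number

    def build_request(self, partition, value):
        """Create a request for a NEW record and advance the counter."""
        seq = self.next_seq[partition]
        self.next_seq[partition] += 1
        return {
            "producer_id": self.producer_id,
            "partition": partition,
            "seq": seq,
            "value": value,
        }

    @staticmethod
    def retry(request):
        """Resend an existing request: same producer ID, same sequence number."""
        return dict(request)


client = ToyIdempotentClient(producer_id=7)
first = client.build_request(partition=0, value="a")
resent = client.retry(first)                           # network-level retry
second = client.build_request(partition=0, value="a")  # application sends again
other = client.build_request(partition=1, value="a")   # different partition

print(first["seq"], resent["seq"], second["seq"], other["seq"])  # prints: 0 0 1 0
```

The retry carries sequence number 0 again and can be recognized as a duplicate, while the second send() of an identical payload gets sequence number 1 and is stored as a new record. This is why idempotence protects against retries but not against the application sending the same message twice.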

The next step after making retries idempotent is to consider transactional producers, which extend these guarantees to atomic writes across multiple partitions.
