A multi-AZ Redpanda deployment doesn’t guarantee zero downtime during a zone failure. What it does ensure is that acknowledged data survives the loss of a zone and that the cluster resumes serving it after a brief interruption.
Let’s see what that looks like. Imagine a Redpanda cluster spread across three Availability Zones (AZs), say us-east-1a, us-east-1b, and us-east-1c. Each partition in Redpanda is replicated across these zones. A typical replication factor (RF) is 3, meaning every piece of data has three copies, one in each zone.
Here’s a simplified view of a topic’s partitions and their replicas:
```
Topic: my-topic
  Partition 0:
    Leader:   Node in us-east-1a
    Replicas: us-east-1a, us-east-1b, us-east-1c
  Partition 1:
    Leader:   Node in us-east-1b
    Replicas: us-east-1b, us-east-1c, us-east-1a
  Partition 2:
    Leader:   Node in us-east-1c
    Replicas: us-east-1c, us-east-1a, us-east-1b
```
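To make the layout above concrete, here is a minimal sketch of a rotation that produces it. `ReplicaPlacement` and `assign` are hypothetical names for illustration only; this is not Redpanda’s actual placement algorithm, which weighs more than zone order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Redpanda API): rotates replicas across zones so
// that each partition gets at most one replica per zone. The first entry of
// each replica list is treated as the initial leader.
final class ReplicaPlacement {

    static List<List<String>> assign(List<String> zones, int partitions, int replicationFactor) {
        if (replicationFactor > zones.size()) {
            throw new IllegalArgumentException("need at least one zone per replica");
        }
        List<List<String>> layout = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < replicationFactor; r++) {
                // Shift the starting zone by the partition index.
                replicas.add(zones.get((p + r) % zones.size()));
            }
            layout.add(replicas);
        }
        return layout;
    }

    public static void main(String[] args) {
        List<String> zones = List.of("us-east-1a", "us-east-1b", "us-east-1c");
        List<List<String>> layout = assign(zones, 3, 3);
        for (int p = 0; p < layout.size(); p++) {
            System.out.println("Partition " + p + ": leader=" + layout.get(p).get(0)
                    + " replicas=" + layout.get(p));
        }
    }
}
```

Rotating the starting zone spreads leaders evenly, so a single zone outage affects the leader of roughly one third of the partitions rather than all of them.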
When a producer sends data, it writes to the leader replica for that partition. The leader then forwards the write to its follower replicas. A write is considered committed once a majority of replicas (including the leader) acknowledge it. For RF=3, a majority is 2.
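The majority rule is simple integer arithmetic. The sketch below uses a hypothetical `QuorumMath` class (not part of any Redpanda or Kafka library) to show how the quorum size and the number of tolerable replica failures follow from the replication factor.

```java
// Hypothetical helper illustrating majority-quorum arithmetic.
final class QuorumMath {

    // Smallest number of replicas that forms a strict majority.
    static int majority(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Number of replicas that can be lost while a majority still remains.
    static int tolerableFailures(int replicationFactor) {
        return replicationFactor - majority(replicationFactor);
    }

    // A write is committed once at least a majority (leader included) has acknowledged it.
    static boolean isCommitted(int acknowledgements, int replicationFactor) {
        return acknowledgements >= majority(replicationFactor);
    }

    public static void main(String[] args) {
        for (int rf : new int[] {1, 3, 5}) {
            System.out.println("RF=" + rf + " majority=" + majority(rf)
                    + " tolerableFailures=" + tolerableFailures(rf));
        }
    }
}
```

With RF=3 the cluster tolerates one lost replica per partition, which is why one replica per zone lets it survive a single-zone outage. An even replication factor such as 4 raises the majority to 3 without tolerating any more failures than RF=3.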
Now, let’s simulate a failure. Suppose us-east-1a goes offline.
What happens?
Redpanda’s Raft consensus protocol detects that the replicas in us-east-1a have stopped responding. Quorum itself is not lost: a majority of each partition’s replicas (2 out of 3) is still available in us-east-1b and us-east-1c, so the cluster remains operational.
For Partition 0, the leader was in us-east-1a. When that zone fails, the leader is lost. The followers in us-east-1b and us-east-1c stop receiving heartbeats from it, and after an election timeout one of them starts a leader election within the partition’s Raft group. With the other surviving replica’s vote it has a majority and becomes the new leader.
```
Topic: my-topic
  Partition 0 (after us-east-1a failure):
    Leader:   Node in us-east-1b (elected)
    Replicas: us-east-1b, us-east-1c (us-east-1a replica is offline)
```
Producers and consumers of the affected partitions will experience a brief pause while the leader election occurs. This is typically on the order of seconds. Once a new leader is elected, operations resume. Data that was acknowledged by a quorum before the failure (that is, written with acks=all) is safe, because at least one surviving replica holds it. New writes will now go to the new leader in us-east-1b.
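The failover just described can be sketched as a small simulation. This is a deliberate simplification with hypothetical names (`FailoverSketch`, `Replica`, `electLeader`): real Raft elections use randomized timeouts and voting, whereas this sketch simply picks the surviving replica with the most advanced log, provided a majority of the replica set is still alive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Simplified, hypothetical model of leader failover for one partition.
final class FailoverSketch {

    static final class Replica {
        final String zone;
        final long lastOffset; // highest offset this replica has persisted

        Replica(String zone, long lastOffset) {
            this.zone = zone;
            this.lastOffset = lastOffset;
        }
    }

    // Returns the zone of the new leader, or empty if no majority survives.
    static Optional<String> electLeader(List<Replica> replicas, String failedZone) {
        List<Replica> alive = new ArrayList<>();
        for (Replica r : replicas) {
            if (!r.zone.equals(failedZone)) {
                alive.add(r);
            }
        }
        int majority = replicas.size() / 2 + 1;
        if (alive.size() < majority) {
            return Optional.empty(); // partition is unavailable until replicas return
        }
        // Stand-in for the Raft vote: prefer the most up-to-date surviving log.
        Replica best = alive.get(0);
        for (Replica r : alive) {
            if (r.lastOffset > best.lastOffset) {
                best = r;
            }
        }
        return Optional.of(best.zone);
    }

    public static void main(String[] args) {
        List<Replica> partition0 = List.of(
                new Replica("us-east-1a", 100),
                new Replica("us-east-1b", 100),
                new Replica("us-east-1c", 98));
        System.out.println(electLeader(partition0, "us-east-1a"));
    }
}
```

The second branch is the case to avoid: if two of a partition’s three replicas share a zone, losing that zone leaves only one replica, which is below the majority of two, and the partition stays unavailable.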
The key here is that Redpanda is designed for availability during failures, not necessarily zero interruption. The brief "pause" is the time each affected partition needs to notice the missing leader and elect a new one.
How to configure and manage this:
- Cluster Configuration: Ensure your `redpanda.yaml` has `tune_memory_control` enabled and sufficient memory allocated for the Redpanda process. For multi-AZ, you’ll typically deploy Redpanda nodes on separate EC2 instances (or equivalent) within different AZs, set each broker’s `rack` to its AZ, and enable the `enable_rack_awareness` cluster property so that replicas are spread across zones. Treat the key names in the snippet as illustrative and check them against the configuration reference for your Redpanda version.

  ```yaml
  # Example redpanda.yaml snippet
  redpanda:
    # ... other configs
    rack: us-east-1a          # this broker's AZ, used for rack-aware placement
    tune_memory_control:
      enabled: true
      memory_limit_mb: 4096   # Example: 4GB
  ```

- Replication Factor (RF): For multi-AZ deployments, an RF of 3 is standard. This means for every partition, you have three copies of the data spread across three different AZs. You can check this with `rpk topic describe <topic-name>`.

  ```shell
  rpk topic describe my-topic
  ```

  Look for the replica count in the summary, and add `-p` to list each partition’s leader and replicas.

- Acknowledge Settings (Producers): Producers must be configured to wait for an acknowledgment from a majority of replicas. For Redpanda, this is `acks=all`. With `acks=1` only the leader acknowledges the write, which lowers latency but means an acknowledged write can be lost if the leader fails before replicating it; this is generally not recommended for critical data in multi-AZ setups.

  ```java
  // Java Kafka Producer example
  import java.util.Properties;
  import org.apache.kafka.clients.producer.KafkaProducer;

  Properties props = new Properties();
  props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092");
  props.put("acks", "all"); // Crucial for durability
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
  KafkaProducer<String, String> producer = new KafkaProducer<>(props);
  ```

- Consumer Groups: Consumers are part of consumer groups. When a broker fails, consumers refresh their metadata and resume fetching from the newly elected partition leaders after a short pause. If the failed broker was the group’s coordinator, another broker takes over that role and consumers reconnect to it, which may trigger a rebalance. A rebalance reassigns partitions among the consumers in the group, not among brokers. This is standard Kafka consumer behavior.

- Monitoring: Set up alerts for broker health, partition leader elections, and Raft quorum status (Redpanda does not use ZooKeeper). Prometheus and Grafana are common tools for this. Metric names vary by Redpanda version, so confirm them against your brokers’ metrics endpoints; useful signals include a leadership-change counter such as `redpanda_raft_leadership_changes` (a jump indicates leader elections) and an under-replicated replica gauge such as `redpanda_kafka_under_replicated_replicas`.

  ```shell
  # Example of checking partition leader status with rpk
  rpk topic describe my-topic -p
  ```

  This command will show you the leader for each partition. If a zone fails, you’ll see a new broker become the leader for partitions that previously had their leader in the failed zone.
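Since producers see a pause of a few seconds during an election, it can help to configure them to retry through it rather than surface errors immediately. The sketch below builds such a configuration with standard Kafka producer settings (`enable.idempotence`, `retries`, `delivery.timeout.ms`, `retry.backoff.ms`). The class name `ResilientProducerConfig` and the specific values are assumptions for illustration, not recommendations from Redpanda.

```java
import java.util.Properties;

// Hypothetical helper that assembles producer settings intended to ride
// through a leader election. Values are illustrative starting points.
final class ResilientProducerConfig {

    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // Wait for a majority of replicas before treating a write as successful.
        props.put("acks", "all");
        // Avoid duplicate records when a retried send had actually succeeded.
        props.put("enable.idempotence", "true");
        // Keep retrying; the overall bound is delivery.timeout.ms below.
        props.put("retries", String.valueOf(Integer.MAX_VALUE));
        // Total time a record may spend being sent and retried (2 minutes here).
        props.put("delivery.timeout.ms", "120000");
        // Short pause between attempts while a new leader is being elected.
        props.put("retry.backoff.ms", "200");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("broker1:9092,broker2:9092,broker3:9092");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The resulting `Properties` object can be passed to `new KafkaProducer<>(props)` exactly as in the producer example above. Listing brokers from more than one zone in `bootstrap.servers` also matters, so that a client starting up during the outage can still reach the cluster.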
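The most surprising thing about Redpanda’s multi-AZ resilience is how little special machinery it needs. Every partition is its own Raft group, so replication and leader election are handled by one consensus protocol. Kafka, by contrast, has traditionally relied on in-sync replica sets with controller-driven leader election, and uses Raft (KRaft) only for cluster metadata. This means the core principles of quorum and leader election apply directly to every partition, and Redpanda’s implementation is designed to keep that process fast.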
When a zone fails, each affected partition’s Raft group detects the missing leader and elects a replacement on its own. Redpanda’s internal controller, which is itself replicated with Raft, tracks cluster membership and partition metadata, so it also needs a majority of its members to survive the outage. Together, fast failure detection in the partition groups and a healthy controller quorum are what underpin the cluster’s availability.
The next challenge you’ll face is managing network partitions within your cluster, which are more complex than a full zone outage.