Pulsar’s geo-replication asynchronously copies messages published to a topic in one cluster to the same topic in other clusters, typically in different data centers, to support high availability and disaster recovery.

Let’s see it in action. Imagine two Pulsar clusters, cluster-a and cluster-b, located in different geographical regions. We want to replicate a topic named persistent://public/default/my-geo-topic from cluster-a to cluster-b.

First, we need to configure Pulsar’s replication. Each cluster must be registered in the other’s cluster metadata so that its brokers know how to reach the remote side: on cluster-a we register cluster-b, and vice versa. The tenant that owns the namespace (here, public) must also be allowed to use both clusters.

# Example: register cluster-b in cluster-a's metadata (simplified).
# This is normally done with pulsar-admin or the admin REST API, which
# write to the metadata store (ZooKeeper or etcd).
# The command looks like this; hostnames and ports are placeholders:
# pulsar-admin clusters create cluster-b \
#   --broker-url-secure pulsar+ssl://broker-b.example.com:1234 \
#   --url http://admin-b.example.com:8080

# On cluster-b, you'd do the same for cluster-a.

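Now, we need to tell Pulsar which topics to replicate. Replication is configured at the namespace level: you list the clusters across which the namespace’s topics should be replicated. (This is separate from Pulsar’s "peer clusters" setting, which serves a different purpose.)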

# On cluster-a, set the replication clusters for the namespace 'public/default'.
# The list names every cluster that should hold the data, including cluster-a itself.
pulsar-admin namespaces set-clusters public/default --clusters cluster-a,cluster-b

At this point, if cluster-a has a topic persistent://public/default/my-geo-topic, the brokers in cluster-a will start replicating messages published to this topic to cluster-b.

How does this work? Geo-replication in Pulsar is asynchronous. When a message is published to a topic in cluster-a, it is first persisted to the topic’s ledger in cluster-a and acknowledged to the producer. The broker that owns the topic runs a replicator for each remote cluster: it reads the topic through a dedicated cursor and republishes each message to the same topic on the remote cluster, advancing the cursor once the remote cluster confirms the write. Replication does not depend on consumer acknowledgments in cluster-a, and subscription state is synchronized across clusters only if you enable replicated subscriptions.

This means that if a producer sends messages M1, M2, M3 to cluster-a, cluster-b also receives M1, M2, M3 in the same order, whether or not any consumer on cluster-a has acknowledged them. By default, subscriptions are local to each cluster: a consumer on cluster-b tracks its own position, and its acknowledgments are not sent back to cluster-a. With replicated subscriptions enabled (an opt-in consumer setting; in the Java client, replicateSubscriptionState(true) on the consumer builder), the brokers periodically synchronize the subscription’s acknowledged position between clusters, so the state seen by cluster-b trails cluster-a slightly.
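To make the message-copy path concrete, here is a minimal toy model in Python. It is a sketch under stated assumptions, not Pulsar’s implementation: the Topic and Replicator classes are hypothetical, each topic is modelled as an in-memory list, and the replicator keeps a single cursor for one remote cluster. Subscription state is not modelled at all.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy stand-in for one topic's message log in a single cluster."""
    log: list = field(default_factory=list)

    def publish(self, payload):
        self.log.append(payload)
        return len(self.log) - 1  # position of the stored message


@dataclass
class Replicator:
    """Hypothetical replicator: copies a local topic to one remote topic."""
    local: Topic
    remote: Topic
    cursor: int = 0         # next local position that still has to be copied
    remote_up: bool = True  # simulated state of the link to the remote cluster

    def run_once(self):
        """Copy everything pending; return how many messages were sent."""
        sent = 0
        while self.remote_up and self.cursor < len(self.local.log):
            self.remote.publish(self.local.log[self.cursor])
            self.cursor += 1  # advance only after the remote write succeeds
            sent += 1
        return sent

    def backlog(self):
        """Messages stored locally that have not reached the remote yet."""
        return len(self.local.log) - self.cursor


topic_a, topic_b = Topic(), Topic()
repl = Replicator(local=topic_a, remote=topic_b)

for m in ("M1", "M2", "M3"):
    topic_a.publish(m)     # the producer is acknowledged by cluster-a here

print(topic_b.log)         # [] : replication has not run yet
repl.run_once()
print(topic_b.log)         # ['M1', 'M2', 'M3']

repl.remote_up = False     # simulate losing the link to cluster-b
topic_a.publish("M4")
repl.run_once()
print(repl.backlog())      # 1 : M4 waits in cluster-a

repl.remote_up = True      # link restored, replication resumes at the cursor
repl.run_once()
print(topic_b.log)         # ['M1', 'M2', 'M3', 'M4']
```

The point of the sketch is the ordering of events: the local write completes first, the copy happens later, and the cursor is what lets replication resume without losing or skipping messages after an outage.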

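Inside the broker, this work is done by a per-topic replicator (PersistentReplicator in the broker source, as far as the class naming goes) rather than by a central manager. Each partition of a partitioned topic is a topic in its own right and gets its own replicator and replication cursor for every remote cluster. The replication cursor points at the last message successfully copied to that remote cluster, not at the last message acknowledged by consumers. When replicated subscriptions are enabled, a separate mechanism exchanges periodic snapshots between clusters so that an acknowledged position in one cluster can be translated into the matching position in another. Together, these ensure that consumers in any cluster see the same messages from a given producer in the same order and, with replicated subscriptions, can resume close to where consumers elsewhere left off.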

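The configuration for replication isn’t just about which clusters are listed. There is no designated "source" cluster for a namespace: each cluster in the list replicates the messages it receives from its local producers to the others, so replication is bidirectional by default (provided the namespace is configured consistently on every cluster). For example, to get one-way replication from cluster-a to cluster-b, produce only on cluster-a, or restrict replication per message from the producer; the client API lets a message name the clusters it should be replicated to, or disable replication for that message. If you later want cluster-b to take over as the write side, you move your producers to cluster-b rather than reversing a configuration.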
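As a rough illustration of how the destination clusters for a single message could be decided, the following Python sketch encodes three simplified rules: a message goes to the clusters listed for the namespace except the local one, a per-message cluster list narrows that set, and a message that arrived through replication is not replicated again. The function name and parameters are hypothetical and the rules are a simplification, not the broker’s actual code.

```python
def replication_targets(namespace_clusters, local_cluster,
                        message_clusters=None, replication_disabled=False,
                        replicated_from=None):
    """Return the remote clusters a message stored locally would be copied to.

    namespace_clusters:   clusters listed for the namespace
    local_cluster:        cluster where the message was stored
    message_clusters:     optional per-message restriction set by the producer
    replication_disabled: producer asked for no replication of this message
    replicated_from:      set if the message itself arrived via replication
    """
    # Never forward a message that was already replicated, to avoid loops.
    if replication_disabled or replicated_from is not None:
        return []
    allowed = namespace_clusters if message_clusters is None else message_clusters
    return [c for c in namespace_clusters
            if c in allowed and c != local_cluster]


ns = ["cluster-a", "cluster-b", "cluster-c"]

print(replication_targets(ns, "cluster-a"))
# ['cluster-b', 'cluster-c']
print(replication_targets(ns, "cluster-a", message_clusters=["cluster-b"]))
# ['cluster-b']
print(replication_targets(ns, "cluster-a", replication_disabled=True))
# []
print(replication_targets(ns, "cluster-b", replicated_from="cluster-a"))
# []
```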

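The real power comes from how Pulsar handles failover. If cluster-a goes down and cluster-b has a copy of the topic, consumers can switch to cluster-b and continue reading. Two caveats apply. First, because replication is asynchronous, messages that had not yet been copied when cluster-a failed are not available on cluster-b until cluster-a recovers. Second, a consumer resumes near where it left off only if its subscription uses replicated subscriptions; because that state is synchronized periodically rather than on every acknowledgment, some already-processed messages may be redelivered after a switch. Geo-replication on its own therefore does not give you exactly-once processing, so applications that need it should make their processing idempotent or deduplicate on their side.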

A common misconception is that geo-replication is just a simple mirror. It can do more than that: with replicated subscriptions enabled, it also synchronizes subscription positions across clusters. If you have a producer writing to cluster-a and consumers reading from both cluster-a and cluster-b, the consumers on cluster-b will see the same messages as those on cluster-a. Whether their acknowledgments are reflected across the clusters depends on that setting; without it, each cluster keeps its own subscription state. This is distinct from the replication cursor, which the brokers maintain for each replicated partition to track how far messages have been copied to a remote cluster.
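The effect of periodic subscription synchronization on failover can be sketched with a small Python model. It assumes the opt-in replicated subscriptions feature and simplifies it heavily: the ReplicatedSubscription class, the integer positions, and the position_in_b mapping are all hypothetical. The idea it illustrates is that the remote cluster only learns about an acknowledged position once that position passes a snapshot, so the remote side lags behind.

```python
class ReplicatedSubscription:
    """Toy model: one subscription whose position is synced from cluster-a to cluster-b."""

    def __init__(self):
        self.snapshots = []  # (position in cluster-a, matching position in cluster-b)
        self.ack_a = -1      # last position acknowledged on cluster-a
        self.ack_b = -1      # last position cluster-b considers acknowledged

    def take_snapshot(self, pos_a, pos_b):
        """Record that pos_a in cluster-a corresponds to pos_b in cluster-b."""
        self.snapshots.append((pos_a, pos_b))

    def ack_on_a(self, pos_a):
        """Acknowledge on cluster-a; cluster-b advances only to passed snapshots."""
        self.ack_a = max(self.ack_a, pos_a)
        for snap_a, snap_b in self.snapshots:
            if snap_a <= self.ack_a:
                self.ack_b = max(self.ack_b, snap_b)


def position_in_b(pos_a):
    # Hypothetical mapping: the same message sits at a different position
    # in cluster-b because each cluster stores it in its own ledgers.
    return pos_a + 100


sub = ReplicatedSubscription()
for pos_a in (3, 7):                    # snapshots taken every few messages
    sub.take_snapshot(pos_a, position_in_b(pos_a))

sub.ack_on_a(5)                         # consumer on cluster-a has processed 0..5
print(sub.ack_b)                        # 103 : cluster-b only knows about snapshot 3

# Failover: the consumer reconnects to cluster-b and resumes after ack_b.
resume_b = sub.ack_b + 1
redelivered = [p for p in range(sub.ack_a + 1) if position_in_b(p) >= resume_b]
print(redelivered)                      # [4, 5] : processed on cluster-a, delivered again
```

In this model, messages 4 and 5 are delivered twice, which is why consumers that fail over between clusters should tolerate duplicates.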

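The core of geo-replication is the replication cursor, a durable cursor that the broker keeps on each replicated topic partition, one per remote cluster (named after the remote cluster with a pulsar.repl. prefix). Like other cursors, its position is persisted through Pulsar’s managed ledger layer, which stores data in BookKeeper and metadata in ZooKeeper (or etcd), so it survives broker restarts. This cursor tracks the last message that has been successfully replicated to that remote cluster. When cluster B confirms a replicated write for topic T, cluster A advances its cursor for cluster B. If cluster B is unreachable, the cursor stops advancing and messages accumulate as replication backlog in cluster A; once the link recovers, replication resumes from the cursor position. Messages are replicated regardless of whether they have been acknowledged by consumers in cluster A.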

The next challenge you’ll face is managing ownership and failover strategies for your geo-replicated topics.
