Pulsar clusters are not just arbitrary groupings of brokers; they are foundational units of isolation and administration that dictate how data flows and how policies are enforced.
Let’s see this in action. Imagine you have two distinct Pulsar clusters, us-west and eu-central, each with its own set of brokers and bookies.
# List available clusters
pulsar-admin clusters list
# Output might look like:
# us-west
# eu-central
# Get detailed information about a specific cluster
pulsar-admin clusters get us-west
# Output shows the endpoints registered for the cluster, for example:
# {
#   "serviceUrl": "http://broker-admin-1.us-west.pulsar.local:8080",
#   "serviceUrlTls": "https://broker-admin-1.us-west.pulsar.local:8443",
#   "brokerServiceUrl": "pulsar://broker-1.us-west.pulsar.local:6650",
#   "brokerServiceUrlTls": "pulsar+ssl://broker-1.us-west.pulsar.local:6651"
# }
# serviceUrl and serviceUrlTls are the HTTP(S) admin URLs of the brokers;
# brokerServiceUrl and brokerServiceUrlTls are the binary protocol endpoints.
This pulsar-admin clusters get command shows the metadata registered for a cluster. It’s not just about listing names; it tells you where the cluster can be reached: its network endpoints, their TLS variants, and its administrative interfaces. It does not report live health, only how the cluster is configured to be addressed.
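For scripting, it can help to pull a single endpoint out of that output. The following is a minimal, dependency-free sketch: json_field is a hypothetical helper (not part of Pulsar), and it assumes the flat, string-valued JSON that clusters get prints. A real JSON parser is more robust if one is available.

```shell
# json_field NAME: read JSON on stdin and print the string value of NAME.
# Hypothetical helper for illustration; handles only flat "key": "value" pairs.
json_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Example (requires a reachable Pulsar deployment):
#   pulsar-admin clusters get us-west | json_field serviceUrl
```

The helper matches the quoted field name exactly, so asking for serviceUrl does not accidentally return serviceUrlTls.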
The core problem Pulsar clusters solve is providing a manageable, fault-tolerant, and scalable messaging infrastructure. By segmenting your deployment into clusters, you achieve several critical goals:
- Isolation: A failure or performance degradation in one cluster (e.g., us-west) doesn’t impact another (e.g., eu-central). This is paramount for high availability and business continuity.
- Geo-Replication: Clusters are the building blocks for multi-datacenter deployments. You configure replication between clusters to ensure data availability across geographic regions.
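- Administration Granularity: Because each cluster runs its own brokers with their own configuration, you can tailor settings such as authentication and authorization per cluster, while finer-grained policies (quotas, message TTLs) are set on the namespaces hosted there. This lets you match specific application needs or compliance requirements.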
- Scalability: As your messaging load grows, you can add more brokers and bookies to existing clusters or deploy entirely new clusters to distribute the load.
Internally, each Pulsar cluster is a self-contained unit. It has its own ZooKeeper ensemble (or a shared one, though independent is best for isolation) for metadata management and its own fleet of brokers and bookies. The pulsar-admin tool is a client of the admin REST API that the brokers expose, which is why the admin URL registered for each cluster is so critical. The CLI doesn’t talk to a central "cluster manager"; it talks to the brokers themselves, which collectively manage the cluster’s state.
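Because the CLI is a client of the brokers' admin REST API, the same cluster metadata can be fetched with plain HTTP. This is a minimal sketch, assuming the admin API is reachable without authentication at the ADMIN_URL below (the example host used earlier; substitute your own). cluster_endpoint and get_cluster are illustrative helper names, and /admin/v2/clusters is the v2 REST path for cluster metadata.

```shell
# Assumed admin endpoint; replace with the admin URL of one of your brokers.
ADMIN_URL="${ADMIN_URL:-http://broker-admin-1.us-west.pulsar.local:8080}"

# Build the REST URL for one cluster's metadata (strips a trailing slash).
cluster_endpoint() {
  printf '%s/admin/v2/clusters/%s\n' "${ADMIN_URL%/}" "$1"
}

# Fetch the metadata from the broker (needs curl and network access).
get_cluster() {
  curl -fsS "$(cluster_endpoint "$1")"
}

# Example: get_cluster us-west
```

Any broker in the cluster can answer this request, which is the practical meaning of there being no central cluster manager.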
When you create a topic, like persistent://my-tenant/my-namespace/my-topic, the name itself carries no cluster; the cluster assignment comes from the tenant’s and namespace’s configuration. Pulsar’s multi-tenancy model allows tenants to be associated with one or more clusters. If a tenant is allowed only on us-west, any topic created under that tenant will reside in, and be managed by, the brokers of the us-west cluster.
# Register a new cluster in the instance's metadata.
# This only records the cluster's endpoints; standing up the cluster itself
# involves deploying Pulsar components and configuring ZooKeeper.
# pulsar-admin clusters create us-east \
#   --url http://broker-admin-1.us-east.pulsar.local:8080 \
#   --broker-url pulsar://broker-1.us-east.pulsar.local:6650
# Allow a tenant to use specific clusters
pulsar-admin tenants update my-tenant --allowed-clusters us-west,eu-central
# Create a topic in a tenant that is allowed in multiple clusters.
# The topic is served in *both* us-west and eu-central only if its
# namespace is configured to replicate between them.
pulsar-admin topics create persistent://my-tenant/my-namespace/shared-topic
The most surprising thing about how clusters interact for geo-replication is that it’s not a "pull" mechanism driven by a central coordinator, but a "push" managed by the brokers themselves based on configuration. When a message is published to a topic in a source cluster, the broker serving that topic, if replication to a target cluster is configured, forwards the message to the corresponding topic in the target cluster. This process is driven by topic-level or namespace-level replication policies, not by a global cluster orchestrator. The pulsar-admin namespaces set-clusters command is what enables this for a namespace, but the actual data flow is a peer-to-peer (broker-to-broker) sync.
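As a sketch of turning replication on at the namespace level, the commands involved are pulsar-admin namespaces set-clusters and get-clusters. The run and enable_geo_replication helpers are hypothetical wrappers added for illustration; by default they only print the commands, so nothing touches a deployment unless DRY_RUN=0 is set. The sketch assumes the tenant is already allowed on every listed cluster and that each cluster is registered with the others.

```shell
# Print commands by default; set DRY_RUN=0 to execute them for real.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# enable_geo_replication TENANT NAMESPACE CLUSTERS
# CLUSTERS is a comma-separated list, e.g. us-west,eu-central.
enable_geo_replication() {
  # Set the namespace's replication clusters; brokers then push messages
  # for topics in this namespace to the other listed clusters.
  run pulsar-admin namespaces set-clusters "$1/$2" --clusters "$3"
  # Read the setting back to confirm it was stored.
  run pulsar-admin namespaces get-clusters "$1/$2"
}

# Example: enable_geo_replication my-tenant my-namespace us-west,eu-central
```

Keeping the default as a dry run makes it easy to review the exact commands before applying them to a production namespace.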
Understanding clusters is fundamental to managing Pulsar at scale, especially when you move beyond a single-datacenter deployment. The next logical step after mastering cluster administration is diving into namespace management, which builds upon the cluster foundation to provide finer-grained control over topics.