The Pulsar Transaction Coordinator (TC) is failing to start because the ZooKeeper ensemble it relies on for metadata storage is unavailable or misconfigured.

Here are the most common reasons this happens and how to fix them:

ZooKeeper Ensemble Not Running

Diagnosis: Check the status of your ZooKeeper nodes. If you’re using systemd, this would be sudo systemctl status zookeeper. For other init systems, adapt accordingly. You should see all your ZooKeeper instances reporting as active and running.

Fix: If ZooKeeper isn’t running, start it on each node: sudo systemctl start zookeeper. This ensures the distributed coordination service is available for Pulsar components to register and communicate.

Why it works: Pulsar components, including the Transaction Coordinator, use ZooKeeper to discover and coordinate with each other. If ZooKeeper is down, they can’t establish their presence or find other necessary services.
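
A running systemd unit does not by itself prove that a node is serving requests. As a rough complement to the status check above, the sketch below sends ZooKeeper's srvr four-letter command to each node and reads the Mode line of the reply. It assumes srvr is allowed by the server's 4lw.commands.whitelist setting (it normally is by default on ZooKeeper 3.5 and later); the function names and hostnames are illustrative only.

```python
import socket


def query_srvr(host, port=2181, timeout=3.0):
    """Send the 'srvr' four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply):
    """Return the value of the 'Mode:' line (e.g. leader, follower), or None."""
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Mode":
            return value.strip()
    return None


def ensemble_modes(nodes):
    """Map 'host:port' to the reported mode, or None if the node gave no answer."""
    result = {}
    for host, port in nodes:
        try:
            result[f"{host}:{port}"] = parse_mode(query_srvr(host, port))
        except OSError:  # refused, timed out, or unresolvable host
            result[f"{host}:{port}"] = None
    return result
```

Calling ensemble_modes([("zk1.example.com", 2181), ("zk2.example.com", 2181), ("zk3.example.com", 2181)]) on a healthy three-node ensemble should report one leader and two followers; a None entry points at a node that is down, unreachable, or not currently serving.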

Incorrect ZooKeeper Connection String in Pulsar Configuration

Diagnosis: Examine the Pulsar configuration file, typically conf/broker.conf or conf/standalone.conf, for the zookeeperServers parameter. On Pulsar 2.10 and later, zookeeperServers is deprecated in favor of metadataStoreUrl, so check whichever of the two your configuration actually sets. Verify that the list of ZooKeeper hostnames and ports exactly matches your running ZooKeeper ensemble. For example, it should look like zookeeperServers=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181.

Fix: Edit the zookeeperServers line (or metadataStoreUrl, if your configuration uses it) so that it lists the correct ZooKeeper endpoints. Restart the Pulsar broker(s) after making this change.

Why it works: This parameter tells the Pulsar broker (and thus the TC which runs as part of the broker) how to find and connect to ZooKeeper. An incorrect string means the broker can’t locate the essential metadata service.
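
Typos in the connection string (a missing port, a stray character) are easy to overlook by eye. The following is a minimal sketch of a sanity check, not part of Pulsar itself. It assumes the file is plain key=value text, that metadataStoreUrl should be preferred when both keys are set, and that an optional zk: prefix or /chroot suffix may be present; all function names are hypothetical.

```python
import re

# Keys to look for, in the order this sketch prefers them (an assumption).
CONNECTION_KEYS = ("metadataStoreUrl", "zookeeperServers")
HOST_PORT = re.compile(r"^[A-Za-z0-9._-]+:(\d{1,5})$")


def read_connection_string(conf_text):
    """Return the first non-empty value found for one of CONNECTION_KEYS."""
    for key in CONNECTION_KEYS:
        for line in conf_text.splitlines():
            line = line.strip()
            if line.startswith("#") or "=" not in line:
                continue
            name, _, value = line.partition("=")
            if name.strip() == key and value.strip():
                return value.strip()
    return None


def split_endpoints(connection_string):
    """Strip an optional 'zk:' prefix and '/chroot' suffix, then split on commas."""
    value = connection_string
    if value.startswith("zk:"):
        value = value[3:]
    value = value.split("/", 1)[0]
    return [part.strip() for part in value.split(",") if part.strip()]


def invalid_endpoints(endpoints):
    """Return the entries that are not of the form host:port with a valid port."""
    bad = []
    for endpoint in endpoints:
        match = HOST_PORT.match(endpoint)
        if not match or not 0 < int(match.group(1)) < 65536:
            bad.append(endpoint)
    return bad
```

Reading conf/broker.conf into a string and passing it through these three functions gives a list of malformed entries; an empty list only means the string is well formed, not that the hosts it names are the right ones.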

ZooKeeper Ensemble is Unhealthy (e.g., Majority Quorum Lost)

Diagnosis: Check the ZooKeeper logs for errors related to quorum loss, leader election failures, or network partitions between ZooKeeper nodes. You can often find these in /var/log/zookeeper/zookeeper.log or similar paths. Look for messages like "This is not the leader" or "Could not connect to ZooKeeper server" originating from within the ZooKeeper cluster itself.

Fix: This is more complex and depends on the root cause. It might involve restarting a minority of ZooKeeper nodes that are lagging or causing partitions, or troubleshooting network connectivity between the ZooKeeper nodes. Ensure that a majority of ZooKeeper nodes can communicate with each other. For a 3-node ensemble, at least 2 must be up; for a 5-node ensemble, at least 3.

Why it works: ZooKeeper requires a majority of its nodes (a quorum) to be operational and in agreement to function correctly. If this quorum is lost, the entire ZooKeeper service becomes unavailable, preventing Pulsar from initializing.
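
The majority rule is simple arithmetic: an ensemble of n voting nodes needs floor(n/2) + 1 of them reachable. A small sketch of that calculation, with illustrative function names:

```python
def quorum_size(ensemble_size):
    """Smallest number of nodes that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many nodes can be lost while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


def has_quorum(ensemble_size, reachable_nodes):
    """True if the reachable nodes still form a majority."""
    return reachable_nodes >= quorum_size(ensemble_size)
```

Note that tolerated_failures(4) is 1, the same as for a 3-node ensemble, which is why ensembles are usually sized with an odd number of nodes.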

Firewall Blocking ZooKeeper Ports

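Diagnosis: Ensure that no firewalls (e.g., iptables, firewalld, or cloud provider security groups) are blocking the ZooKeeper client port (default 2181) between the Pulsar broker nodes and the ZooKeeper nodes. Brokers only ever connect to the client port. The peer ports (conventionally 2888 for quorum traffic and 3888 for leader election, as set in the server.N entries of zoo.cfg) must be open between the ZooKeeper nodes themselves. Use telnet or nc from the Pulsar broker machine to test the client port on each ZooKeeper node, and from one ZooKeeper node to another to test the peer ports.

Fix: Update firewall rules to allow traffic on port 2181 from your Pulsar brokers to the ZooKeeper servers, and on ports 2888 and 3888 between the ZooKeeper servers. For example, with firewalld on each ZooKeeper node: sudo firewall-cmd --permanent --add-port=2181/tcp followed by sudo firewall-cmd --reload.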

Why it works: Network security can inadvertently isolate components. If Pulsar brokers cannot reach ZooKeeper on its required ports, they cannot establish the necessary connection for coordination.
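
Where telnet or nc are not installed, the same reachability test can be approximated with a few lines of Python run from the broker host. This is a sketch with illustrative names; it only shows whether a TCP connection can be opened, not whether ZooKeeper behind the port is healthy.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, filtered (timeout), or unresolvable host
        return False


def unreachable(hosts, port=2181):
    """Return the hosts that do not accept connections on the given port."""
    return [host for host in hosts if not port_open(host, port)]
```

A call such as unreachable(["zk1.example.com", "zk2.example.com", "zk3.example.com"]) from a broker should return an empty list. A connection that hangs until the timeout usually suggests a firewall silently dropping packets, while an immediate refusal suggests nothing is listening on the port.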

ZooKeeper Data Directory Issues

Diagnosis: Check the ZooKeeper configuration file (e.g., zoo.cfg) for the dataDir parameter. Verify that this directory exists, has correct permissions for the ZooKeeper user, and is not full. Also, check the dataLogDir if it’s configured separately.

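Fix: Ensure the dataDir and dataLogDir exist, are writable by the ZooKeeper process, and have sufficient free disk space. If the disk is full, expand capacity or purge old snapshots and transaction logs. Prefer ZooKeeper’s own cleanup (the autopurge.snapRetainCount and autopurge.purgeInterval settings in zoo.cfg, or the bundled zkCleanup.sh script) over deleting files by hand, since removing the most recent snapshot or log can lose data.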

Why it works: ZooKeeper uses its data directory to store its transaction logs and snapshots, which are crucial for its operation and recovery. If this directory is inaccessible or full, ZooKeeper cannot function, impacting Pulsar.
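
The three conditions above (exists, writable, not full) can be checked in one pass. The sketch below is one possible way to do it, with hypothetical function names and an arbitrary 1 GiB free-space threshold. It tests access for the user running the script, so it should be run as the ZooKeeper user to be meaningful.

```python
import os
import shutil


def parse_zoo_cfg(text):
    """Parse key=value lines from zoo.cfg text, ignoring comments and blanks."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def check_dir(path, min_free_bytes=1024 ** 3):
    """Return a list of problems found with a ZooKeeper data directory."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    problems = []
    # os.access checks the *current* user, hence: run as the ZooKeeper user.
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable by the current user")
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append(f"{path} has only {free} bytes free")
    return problems


def check_zoo_cfg(text):
    """Check dataDir and, if configured, dataLogDir."""
    settings = parse_zoo_cfg(text)
    problems = []
    for key in ("dataDir", "dataLogDir"):
        if key in settings:
            problems.extend(check_dir(settings[key]))
    return problems
```

Passing the contents of zoo.cfg to check_zoo_cfg returns an empty list when both directories look usable.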

Incorrect ZooKeeper Authentication/Authorization

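Diagnosis: If ZooKeeper is configured with authentication (e.g., SASL) or ACL-based authorization, ensure that the Pulsar broker’s client principal or credentials are correctly configured and that the ZooKeeper server allows connections from this principal. The ZooKeeper client inside the broker typically reads its SASL settings from JVM system properties rather than from broker.conf: java.security.auth.login.config (path to the JAAS file), zookeeper.sasl.client (enables or disables SASL), and zookeeper.sasl.clientconfig (name of the JAAS section, Client by default). These are usually passed through PULSAR_EXTRA_OPTS in conf/pulsar_env.sh. Check ZooKeeper server logs for authentication/authorization denial messages.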

Fix: Align the authentication and authorization configurations between Pulsar and ZooKeeper. This often involves correctly setting up JAAS configuration files and ensuring the Pulsar broker’s identity is recognized and permitted by ZooKeeper.

Why it works: Security measures can prevent legitimate connections if not configured in sync. If Pulsar’s attempt to connect to ZooKeeper is rejected due to authentication or authorization failures, the TC will not be able to start.
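
One quick way to catch a missing client-side SASL setting is to inspect the JVM options the broker is started with. The sketch below assumes the options are available as a single string (for example the value of PULSAR_EXTRA_OPTS) and checks only for the standard ZooKeeper client properties java.security.auth.login.config and zookeeper.sasl.client; the function names and the example path are hypothetical.

```python
import shlex


def system_properties(jvm_opts):
    """Extract -Dname=value pairs from a JVM options string."""
    props = {}
    for token in shlex.split(jvm_opts):
        if token.startswith("-D"):
            name, _, value = token[2:].partition("=")
            props[name] = value
    return props


def sasl_client_problems(jvm_opts):
    """Return a list of likely client-side SASL misconfigurations."""
    props = system_properties(jvm_opts)
    problems = []
    if not props.get("java.security.auth.login.config"):
        problems.append("java.security.auth.login.config is not set")
    # The ZooKeeper client treats SASL as enabled unless this is 'false'.
    if props.get("zookeeper.sasl.client", "true").lower() == "false":
        problems.append("zookeeper.sasl.client is set to false")
    return problems
```

For an options string such as "-Djava.security.auth.login.config=/etc/pulsar/jaas.conf" (an illustrative path), sasl_client_problems returns an empty list. That confirms only that the options are present, not that the JAAS file exists or that its principal is accepted by the server.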

Once all of these are resolved, the next error you’re likely to encounter concerns the Transaction Coordinator’s internal state or its ability to access Pulsar’s topic metadata. It often shows up as errors about ledger creation or BookKeeper client initialization.
