The Pulsar broker is failing to perform operations against ZooKeeper because the broker’s ZooKeeper client library is not correctly initialized or configured to connect to the ZooKeeper ensemble.
Common Causes and Fixes:
- Incorrect ZooKeeper Connection String in Pulsar Configuration:
  - Diagnosis: Check the `zookeeperServers` parameter in `conf/broker.conf` (or `conf/standalone.conf` for standalone mode). Ensure it exactly matches your ZooKeeper ensemble's connection string.
  - Example: `zookeeperServers=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181`
  - Fix: Correct the `zookeeperServers` value in `broker.conf` so that it lists every ZooKeeper node with its port.
  - Why it works: This parameter tells the Pulsar broker which ZooKeeper servers to connect to. If the string is wrong, the broker cannot find or reach the ZooKeeper ensemble.
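The diagnosis step can be scripted so that typos (a missing port, a stray separator) fail loudly. This is a minimal Python sketch, assuming `broker.conf` uses simple `key=value` lines; `parse_properties` and `parse_zookeeper_servers` are hypothetical helper names, and the parser deliberately ignores Java properties features such as `:` separators and line continuations, as well as a ZooKeeper-style `/chroot` suffix on the connection string.

```python
from typing import Dict, List, Tuple


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '='
            props[key.strip()] = value.strip()
    return props


def parse_zookeeper_servers(value: str) -> List[Tuple[str, int]]:
    """Split 'host1:port1,host2:port2,...' into (host, port) tuples."""
    servers = []
    for entry in value.split(","):
        entry = entry.strip()
        host, sep, port = entry.rpartition(":")
        # Reject entries with no ':' separator, no host, or a non-numeric port.
        if not sep or not host or not port.isdigit():
            raise ValueError(f"malformed ZooKeeper server entry: {entry!r}")
        servers.append((host, int(port)))
    return servers


conf = """
# conf/broker.conf (excerpt)
zookeeperServers=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
"""
print(parse_zookeeper_servers(parse_properties(conf)["zookeeperServers"]))
# → [('zk1.example.com', 2181), ('zk2.example.com', 2181), ('zk3.example.com', 2181)]
```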
- ZooKeeper Ensemble Not Running or Unreachable:
  - Diagnosis: From the machine running the Pulsar broker, use `telnet` or `nc` to connect to each ZooKeeper server on its configured port (default 2181). Look for "Connected to" or "succeeded" messages.
  - Example: `telnet zk1.example.com 2181` or `nc -vz zk1.example.com 2181`
  - Fix: Start the ZooKeeper ensemble if it is not running, or troubleshoot the network issues (firewalls, routing) that prevent the broker from reaching the ZooKeeper nodes.
  - Why it works: Pulsar relies on ZooKeeper for metadata storage and coordination. If ZooKeeper is down or unreachable, the broker cannot perform any operation that depends on it.
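When the ensemble has several nodes, the `nc -vz` check can be run against all of them at once. This Python sketch uses hypothetical helpers `can_connect` and `check_ensemble`; like `nc -vz`, it only tests whether a TCP connection opens, so a reachable port does not prove that the ZooKeeper server behind it is healthy or part of a quorum.

```python
import socket
from typing import Dict, List, Tuple


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS lookup failures
        return False


def check_ensemble(servers: List[Tuple[str, int]], timeout: float = 3.0) -> Dict[str, bool]:
    """Probe every ZooKeeper server and report reachability keyed by 'host:port'."""
    return {f"{host}:{port}": can_connect(host, port, timeout) for host, port in servers}
```

Run it from the broker machine so it follows the same network path as the broker, passing the pairs from your `zookeeperServers` value, for example `check_ensemble([("zk1.example.com", 2181), ("zk2.example.com", 2181)])`.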
- ZooKeeper Authentication/Authorization Misconfiguration:
  - Diagnosis: If your ZooKeeper ensemble has SASL authentication enabled, check the `zookeeper.sasl.auth.enabled` setting in `conf/broker.conf` and make sure `zookeeperClientCnxn.sasl.login.username` and `zookeeperClientCnxn.sasl.login.password` (or the equivalent properties for your authentication mechanism) are set correctly and that the credentials are valid for ZooKeeper. Also verify that the Pulsar ZooKeeper user has the necessary read/write permissions in ZooKeeper.
  - Fix: Update `broker.conf` with the correct SASL credentials, or adjust the ZooKeeper ACLs to grant the Pulsar user the permissions it needs. Then restart the Pulsar broker.
  - Why it works: ZooKeeper's security features block unauthorized access. If the broker's credentials are wrong, ZooKeeper rejects its connection attempts; if the broker lacks permissions, ZooKeeper rejects the operations it tries to perform.
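On the ACL side, `zkCli.sh getAcl <path>` prints each entry's permissions as letters such as `cdrwa`. The sketch below is a hypothetical helper that maps those letters to ZooKeeper's permission bit values (`ZooDefs.Perms`) and reports which required permissions are missing. It defaults to the read/write permissions named above; if the broker also needs to create and delete znodes under that path, pass `required="cdrw"`.

```python
# Bit values of ZooKeeper's permission flags (org.apache.zookeeper.ZooDefs.Perms).
PERMS = {"r": 1, "w": 2, "c": 4, "d": 8, "a": 16}


def perms_to_mask(letters: str) -> int:
    """Convert a zkCli-style permission string such as 'cdrwa' into a bitmask."""
    mask = 0
    for ch in letters.strip().lower():
        if ch not in PERMS:
            raise ValueError(f"unknown ZooKeeper permission letter: {ch!r}")
        mask |= PERMS[ch]
    return mask


def missing_perms(granted: str, required: str = "rw") -> str:
    """Return the letters in `required` that the `granted` permission string lacks."""
    mask = perms_to_mask(granted)
    return "".join(ch for ch in required if not mask & PERMS[ch])
```

For example, an ACL showing only `r` for the Pulsar user yields `missing_perms("r") == "w"`, pointing at the write permission to add.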
- Incorrect ZooKeeper Client Port in Pulsar Configuration:
  - Diagnosis: Ensure the port specified in `zookeeperServers` matches the `clientPort` configured in the ZooKeeper server's `zoo.cfg` file.
  - Fix: Update `zookeeperServers` in `broker.conf` to use ZooKeeper's actual client port, or change ZooKeeper's `clientPort` if it is set to a non-standard value by mistake.
  - Why it works: The broker must connect to the specific port on which ZooKeeper listens for client connections.
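Each ZooKeeper server reads its own `zoo.cfg`, so the comparison is best done per host. This minimal sketch assumes you have collected each server's `clientPort` into a dict keyed by the same hostnames used in `zookeeperServers`; `read_client_port` and `port_mismatches` are hypothetical names, and the sketch ignores ZooKeeper 3.5+ dynamic configuration, where the client port can instead appear on the `server.N` lines.

```python
from typing import Dict, List, Optional


def read_client_port(zoo_cfg_text: str) -> Optional[int]:
    """Return the clientPort value from zoo.cfg text, or None if it is not set."""
    for raw in zoo_cfg_text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue  # skip commented-out settings
        key, sep, value = line.partition("=")
        if sep and key.strip() == "clientPort":
            return int(value.strip())
    return None


def port_mismatches(zookeeper_servers: str, client_ports: Dict[str, int]) -> List[str]:
    """Compare each host:port in zookeeperServers with that host's zoo.cfg clientPort."""
    problems = []
    for entry in zookeeper_servers.split(","):
        host, _, port = entry.strip().rpartition(":")
        expected = client_ports.get(host)
        if expected is None:
            problems.append(f"{host}: no clientPort known for this host")
        elif int(port) != expected:
            problems.append(f"{host}: broker uses {port}, zoo.cfg clientPort is {expected}")
    return problems
```

An empty list means every broker-side port matches the port ZooKeeper actually listens on.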
- ZooKeeper Ensemble Not Fully Started or Quorum Not Achieved:
  - Diagnosis: Check the ZooKeeper logs for signs that a server is stuck starting up, cannot elect a leader, or has lost quorum. Look for messages such as "recovering leader," "could not find a leader," or "received connection request from a client, but ZooKeeper is not running."
  - Fix: Make sure all ZooKeeper nodes in the ensemble are running and have successfully elected a leader, so that the ensemble maintains quorum. This may involve restarting ZooKeeper nodes in the correct order or resolving underlying network issues between the ZooKeeper nodes.
  - Why it works: ZooKeeper requires a majority of its nodes (a quorum) to be running and communicating in order to function. If quorum is lost, it stops accepting client connections.
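The majority rule translates into simple arithmetic that shows how many node failures an ensemble can survive. This sketch uses hypothetical helper names and assumes every server counts as a voting participant; ZooKeeper observers do not vote and should be left out of `ensemble_size`.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of voting servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many voting servers can be down while a majority still remains."""
    return ensemble_size - quorum_size(ensemble_size)


def has_quorum(running: int, ensemble_size: int) -> bool:
    """True if `running` voting servers are enough to form a quorum."""
    return running >= quorum_size(ensemble_size)
```

For example, `quorum_size(3)` is 2 and `tolerated_failures(3)` is 1. A four-node ensemble also tolerates only one failure (`quorum_size(4)` is 3), which is why ensembles are usually sized with an odd number of nodes.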
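- ZooKeeper Session Timeout Issues:
  - Diagnosis: Examine the Pulsar broker logs for repeated "ZooKeeper session expired" or similar errors. Also check the ZooKeeper server logs for "Connection broken" or "Session 0x… closed" messages.
  - Fix: Increase `zookeeperSessionTimeoutMs` in `broker.conf`. The ZooKeeper server caps every client session timeout at `maxSessionTimeout` in `zoo.cfg` (by default 20 times `tickTime`), so raise `maxSessionTimeout` or `tickTime` if the requested value exceeds that cap. `initLimit` and `syncLimit` govern how long followers may take to connect to and stay in sync with the leader; raise them only if the ZooKeeper servers themselves time out over slow networks.
  - Why it works: If the network is unreliable or the broker is under heavy load, ZooKeeper sessions can time out. Longer timeouts give communication more buffer before a session expires.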
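The `zoo.cfg` side matters because the ZooKeeper server, not the broker, has the final say on session length: it clamps each client's requested timeout between `minSessionTimeout` and `maxSessionTimeout`, which default to 2 and 20 times `tickTime`. A sketch of that negotiation, using a hypothetical function name:

```python
from typing import Optional


def negotiated_session_timeout(
    requested_ms: int,
    tick_time_ms: int,
    min_session_timeout_ms: Optional[int] = None,
    max_session_timeout_ms: Optional[int] = None,
) -> int:
    """Clamp a client's requested session timeout the way a ZooKeeper server does."""
    # Unset bounds fall back to ZooKeeper's defaults of 2x and 20x tickTime.
    lower = min_session_timeout_ms if min_session_timeout_ms is not None else 2 * tick_time_ms
    upper = max_session_timeout_ms if max_session_timeout_ms is not None else 20 * tick_time_ms
    return max(lower, min(requested_ms, upper))
```

With `tickTime=2000`, a broker requesting 60000 ms through `zookeeperSessionTimeoutMs` is granted only 40000 ms unless `maxSessionTimeout` is raised.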
- `zookeeperMetadataStoreUrl` in `broker.conf` Overrides `zookeeperServers`:
  - Diagnosis: If `zookeeperMetadataStoreUrl` is set in `broker.conf`, it takes precedence over `zookeeperServers`. Verify that this URL is correct.
  - Example: `zookeeperMetadataStoreUrl=zk://zk1.example.com:2181,zk2.example.com:2181/path/to/pulsar/root`
  - Fix: Correct the `zookeeperMetadataStoreUrl` value, or remove it if you intend to use `zookeeperServers`.
  - Why it works: This property is an alternative way to specify the ZooKeeper connection, and it also carries the root path for Pulsar's metadata. An incorrect URL here leads to connection failures.
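The precedence rule can be written as a small function that reports which hosts and root path the broker would end up using. This is a hypothetical helper that follows the `zk://hosts/path` form of the example URL above; check the URL syntax your Pulsar release actually accepts against its documentation.

```python
from typing import Dict, List, Optional, Tuple


def effective_zk_target(props: Dict[str, str]) -> Tuple[List[str], Optional[str]]:
    """Return (host:port entries, metadata root path or None) that the broker would use.

    zookeeperMetadataStoreUrl wins when set; otherwise fall back to zookeeperServers.
    """
    url = props.get("zookeeperMetadataStoreUrl", "").strip()
    if url:
        prefix = "zk://"
        if not url.startswith(prefix):
            raise ValueError(f"expected a URL starting with {prefix!r}, got {url!r}")
        hosts, slash, path = url[len(prefix):].partition("/")
        root = ("/" + path) if slash else None  # None: no root path in the URL
        return hosts.split(","), root
    servers = props.get("zookeeperServers", "").strip()
    if not servers:
        raise ValueError("neither zookeeperMetadataStoreUrl nor zookeeperServers is set")
    return servers.split(","), None
```

Running it against the parsed `broker.conf` makes an unexpected override visible: if both properties are set, the returned hosts come from the URL, not from `zookeeperServers`.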
After resolving these issues, you may still encounter `TopicNotFoundException` errors if Pulsar cannot find the necessary topics in ZooKeeper. That indicates either that the metadata itself is missing or corrupted, or that the `zookeeperRoot` path in your configuration is incorrect.