The Pulsar broker is failing ZooKeeper operations because its ZooKeeper client was not initialized correctly or is not configured to reach the ZooKeeper ensemble.

Common Causes and Fixes:

  1. Incorrect ZooKeeper Connection String in Pulsar Configuration:

    • Diagnosis: Check the zookeeperServers parameter in conf/broker.conf (or conf/standalone.conf in standalone mode). It is a comma-separated list of host:port pairs and must exactly match your ZooKeeper ensemble’s connection string.
    • Example:
      zookeeperServers=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
      
    • Fix: Correct the zookeeperServers value in broker.conf to accurately list all ZooKeeper nodes and their ports.
    • Why it works: This parameter tells the Pulsar broker which ZooKeeper servers to attempt to connect to. An incorrect string means the broker cannot find or reach the ZooKeeper ensemble.
  2. ZooKeeper Ensemble Not Running or Unreachable:

    • Diagnosis: From the machine running the Pulsar broker, attempt to telnet or nc to each ZooKeeper server on its configured port (default 2181).
    • Example:
      telnet zk1.example.com 2181
      # or
      nc -vz zk1.example.com 2181
      
      Look for "Connected to" or "succeeded" messages.
    • Fix: Start the ZooKeeper ensemble if it’s not running, or troubleshoot network connectivity issues (firewalls, routing) preventing the broker from reaching the ZooKeeper nodes.
    • Why it works: Pulsar relies on ZooKeeper for metadata storage and coordination. If ZooKeeper is down or unreachable, the broker cannot perform any operations that require it.
  3. ZooKeeper Authentication/Authorization Misconfiguration:

    • Diagnosis: If your ZooKeeper ensemble requires SASL authentication, check how the broker’s ZooKeeper client gets its credentials. This is normally not set through broker.conf keys. Instead, the ZooKeeper client reads a JAAS configuration file (its Client section by default) that is passed to the broker JVM with -Djava.security.auth.login.config, typically through PULSAR_EXTRA_OPTS in conf/pulsar_env.sh. Confirm that the file path is correct, that the Client section exists, and that the credentials or keytab it references are valid for ZooKeeper. Also verify that the identity Pulsar authenticates as has read/write permissions (ACLs) on Pulsar’s znodes.
    • Fix: Correct the JAAS file or the JVM option that points to it, or adjust ZooKeeper ACLs to grant that identity the required permissions. Restart the Pulsar broker so it picks up the change.
    • Why it works: ZooKeeper’s security features block unauthorized access. If the broker’s credentials are wrong, ZooKeeper rejects its authentication; if the identity lacks ACL permissions, individual reads and writes fail with NoAuth errors.
  4. Incorrect ZooKeeper Client Port in Pulsar Configuration:

    • Diagnosis: Ensure the port specified in zookeeperServers matches the clientPort configured in the ZooKeeper server’s zoo.cfg file.
    • Fix: Update zookeeperServers in broker.conf to use the port ZooKeeper actually listens on. Alternatively, if the broker’s port is the one you want, change clientPort in each server’s zoo.cfg and restart ZooKeeper.
    • Why it works: The broker needs to connect to the specific port ZooKeeper is listening on for client connections.
  5. ZooKeeper Ensemble Not Fully Started or Quorum Not Achieved:

    • Diagnosis: Check the ZooKeeper logs for signs that a server is stuck in leader election or has lost quorum, such as a server that keeps reporting the LOOKING state. Running bin/zkServer.sh status on each node shows whether it is a leader or follower. A node that is up but not serving answers the srvr four-letter command with "This ZooKeeper instance is not currently serving requests".
    • Fix: Make sure a majority of the ensemble’s nodes are running and have elected a leader. This may mean starting stopped nodes, or fixing network problems between ZooKeeper nodes on the quorum and leader-election ports (commonly 2888 and 3888, as set in the server.N lines of zoo.cfg).
    • Why it works: ZooKeeper requires a majority of its voting nodes (a quorum) to be running and communicating. A three-node ensemble tolerates one failed node and a five-node ensemble tolerates two. Once quorum is lost, the remaining servers stop serving client requests.
  6. ZooKeeper Session Timeout Issues:

    • Diagnosis: Examine Pulsar broker logs for repeated "ZooKeeper session expired" or similar errors. Also, check ZooKeeper server logs for "Connection broken" or "Session 0x… closed" messages.
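    • Fix: Increase zookeeperSessionTimeoutMs in broker.conf (newer Pulsar releases use metadataStoreSessionTimeoutMillis instead). ZooKeeper clamps the requested value to its maxSessionTimeout, which defaults to 20 × tickTime in zoo.cfg, so raise tickTime or set maxSessionTimeout explicitly if the new value exceeds that limit. initLimit and syncLimit govern how long followers may take to connect to and sync with the leader. They do not affect client session expiry.
    • Why it works: If the network is unreliable or the broker is under heavy load (for example, long garbage-collection pauses), the client’s heartbeats can miss the session deadline and the session expires. A longer timeout gives more slack, at the cost of slower detection of a broker that has genuinely failed.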
  7. metadataStoreUrl in broker.conf Overrides zookeeperServers:

    • Diagnosis: In Pulsar 2.10 and later, metadataStoreUrl takes precedence over the deprecated zookeeperServers when both are set in broker.conf. Verify that this URL is correct.
    • Example:
      metadataStoreUrl=zk:zk1.example.com:2181,zk2.example.com:2181/path/to/pulsar/root
      
    • Fix: Correct the metadataStoreUrl value, or remove it if you intend to use zookeeperServers.
    • Why it works: This property is the newer way to specify the metadata store connection. For ZooKeeper it takes an optional zk: prefix, the host:port list, and an optional chroot path under which Pulsar stores its metadata. An incorrect URL here causes connection failures even if zookeeperServers is correct.
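Causes 1, 4, and 7 all come down to which host:port list the broker actually uses. The Python sketch below is not Pulsar’s own parsing code. It reads broker.conf-style key=value text and resolves the effective ZooKeeper servers and chroot. It assumes the precedence described in cause 7 (a non-empty metadataStoreUrl wins over zookeeperServers). It strips only the optional zk: prefix from metadataStoreUrl, so other metadata store backends are out of scope. Entries without a port get 2181, as the ZooKeeper client assumes. The names load_properties and effective_zk_target are hypothetical.

```python
# Minimal sketch (hypothetical helpers): resolve the ZooKeeper servers and
# chroot a broker would use, given broker.conf-style "key=value" text.
from typing import Dict, List, Tuple

DEFAULT_ZK_PORT = 2181  # port the ZooKeeper client assumes when none is given


def load_properties(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blank lines and # comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def effective_zk_target(props: Dict[str, str]) -> Tuple[List[Tuple[str, int]], str]:
    """Return ([(host, port), ...], chroot).

    Assumption: a non-empty metadataStoreUrl takes precedence over
    zookeeperServers; only its optional "zk:" scheme prefix is handled.
    """
    url = props.get("metadataStoreUrl", "").strip()
    if url.startswith("zk:"):  # scheme prefix used by metadataStoreUrl
        url = url[len("zk:"):]
    if not url:
        url = props.get("zookeeperServers", "").strip()
    if not url:
        raise ValueError("neither metadataStoreUrl nor zookeeperServers is set")

    # Everything after the first "/" is the chroot path.
    hosts_part, slash, chroot_rest = url.partition("/")
    chroot = "/" + chroot_rest if slash else ""

    servers: List[Tuple[str, int]] = []
    for entry in hosts_part.split(","):
        entry = entry.strip()
        if not entry:
            raise ValueError(f"empty server entry in {url!r}")
        host, colon, port_text = entry.rpartition(":")
        if not colon:  # no port given: fall back to the default
            host, port_text = entry, str(DEFAULT_ZK_PORT)
        if not host or not port_text.isdigit() or not 0 < int(port_text) < 65536:
            raise ValueError(f"bad host:port entry {entry!r}")
        servers.append((host, int(port_text)))
    return servers, chroot
```

Comparing each resolved (host, port) pair with the clientPort in every server’s zoo.cfg covers causes 1 and 4 in one pass.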
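For cause 2, the telnet/nc check can be scripted across the whole ensemble. This Python sketch (the function names are hypothetical) only opens and closes a TCP connection, like nc -vz. It does not speak the ZooKeeper protocol.

```python
# Minimal sketch (hypothetical helpers): a TCP reachability check similar to
# `nc -vz host port`, run against every server in the ensemble.
import socket
from typing import Dict, Iterable, Tuple


def check_reachable(host: str, port: int, timeout: float = 3.0) -> Tuple[bool, str]:
    """Return (True, "connected") if a TCP connection opens, else (False, reason)."""
    try:
        # The connection is closed immediately; nothing is sent to ZooKeeper.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:  # DNS failure, refusal, timeout, no route, ...
        return False, f"{type(exc).__name__}: {exc}"


def check_ensemble(
    servers: Iterable[Tuple[str, int]], timeout: float = 3.0
) -> Dict[str, Tuple[bool, str]]:
    """Map "host:port" to the result of check_reachable for each server."""
    return {f"{host}:{port}": check_reachable(host, port, timeout) for host, port in servers}
```

Called as check_ensemble([("zk1.example.com", 2181), ("zk2.example.com", 2181)]), it returns one (reachable, reason) entry per server. A successful connect only proves the port is open. It says nothing about quorum.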
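Cause 5 depends on simple majority arithmetic: an ensemble of n voting servers needs n // 2 + 1 of them serving. The sketch below pairs that arithmetic with the srvr four-letter command, whose reply includes a Mode: line (leader, follower, ...) when the server is serving. It assumes srvr is permitted by 4lw.commands.whitelist. ZooKeeper 3.5+ restricts four-letter commands, though srvr is normally allowed by default. It counts only leader and follower as voting members, ignoring observers and standalone servers. All function names are hypothetical.

```python
# Minimal sketch (hypothetical helpers): majority arithmetic plus the `srvr`
# four-letter command to see which servers are actually serving.
import socket
from typing import Iterable, List, Optional, Tuple

VOTING_MODES = {"leader", "follower"}  # observers/standalone are not counted


def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the voting servers."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def parse_srvr_mode(reply: str) -> Optional[str]:
    """Return the value of the 'Mode:' line, or None if there is none
    (for example when the server is not currently serving requests)."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def query_srvr(host: str, port: int, timeout: float = 3.0) -> str:
    """Send `srvr` and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def ensemble_modes(servers: Iterable[Tuple[str, int]], timeout: float = 3.0) -> List[Optional[str]]:
    """Query every server; unreachable ones are recorded as None."""
    modes: List[Optional[str]] = []
    for host, port in servers:
        try:
            modes.append(parse_srvr_mode(query_srvr(host, port, timeout)))
        except OSError:  # refused, timed out, DNS failure, ...
            modes.append(None)
    return modes


def has_quorum(modes: List[Optional[str]], ensemble_size: int) -> bool:
    """True if enough servers report a voting mode to form a majority."""
    serving = sum(1 for mode in modes if mode in VOTING_MODES)
    return serving >= quorum_size(ensemble_size)
```

For example, has_quorum(ensemble_modes(servers), len(servers)) returns False when fewer than a majority of the listed servers report leader or follower.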
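The fix for cause 6 only works within the server’s limits. ZooKeeper fits each requested session timeout into the range [minSessionTimeout, maxSessionTimeout], which default to 2 and 20 times tickTime. The hypothetical helper below reproduces that clamping. Use it to check whether a larger zookeeperSessionTimeoutMs will actually take effect.

```python
# Minimal sketch (hypothetical helper): how a ZooKeeper server clamps the
# session timeout a client (here, the Pulsar broker) asks for.
from typing import Optional


def negotiated_session_timeout(
    requested_ms: int,
    tick_time_ms: int = 2000,
    min_session_timeout_ms: Optional[int] = None,
    max_session_timeout_ms: Optional[int] = None,
) -> int:
    """Clamp requested_ms into [minSessionTimeout, maxSessionTimeout].

    When not set explicitly in zoo.cfg, minSessionTimeout defaults to
    2 * tickTime and maxSessionTimeout to 20 * tickTime.
    """
    lower = min_session_timeout_ms if min_session_timeout_ms is not None else 2 * tick_time_ms
    upper = max_session_timeout_ms if max_session_timeout_ms is not None else 20 * tick_time_ms
    return max(lower, min(requested_ms, upper))
```

With tickTime=2000, negotiated_session_timeout(60000) returns 40000. Raising the broker setting to 60 seconds therefore changes nothing until tickTime or maxSessionTimeout is raised too.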

After resolving these issues, you may next encounter TopicNotFoundException errors if Pulsar cannot find the necessary topics in ZooKeeper. This indicates that the metadata itself is missing or corrupted, or that the chroot path in zookeeperServers or metadataStoreUrl points somewhere other than where Pulsar’s metadata was written.
