Pulsar’s ZooKeeper ensemble is choking on metadata operations, preventing the system from scaling.

Here’s why and how to fix it.

The Problem: Pulsar relies on ZooKeeper for critical metadata management: broker registration, topic discovery, topic metadata, schema information, and more. When these operations become too frequent or too heavy, ZooKeeper becomes the bottleneck, leading to high latency, missed heartbeats, and ultimately, Pulsar service degradation.

Common Causes and Fixes:

  1. Excessive Topic Creation/Deletion: Rapidly creating and deleting topics, especially in large numbers, bombards ZooKeeper with create and delete operations.

    • Diagnosis: Run ZooKeeper’s mntr four-letter command and watch zk_outstanding_requests, with zk_num_alive_connections and zk_avg_latency as context. A request queue that grows whenever topic churn rises points to this issue. Also check the Pulsar broker logs for failed or timed-out topic creation and deletion.
    • Fix: Implement a topic lifecycle management strategy. Avoid creating topics for short-lived tasks. Use the pulsar-admin CLI (for example, pulsar-admin topics list <tenant>/<namespace>) to list topics and track how quickly they are created and deleted, and reduce the frequency of topic lifecycle operations where you can. If many similar logical topics are needed, consider consolidating them into a single partitioned topic whose partition count you control.
    • Why it works: Fewer distinct metadata entries mean fewer ZooKeeper operations.
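The mntr check from the diagnosis can be scripted. This is a minimal Python sketch, assuming the mntr four-letter command is enabled on the server (ZooKeeper 3.5 and later only answer commands listed in 4lw.commands.whitelist). The threshold of 10 outstanding requests and the function names are illustrative assumptions, not recommended values.

```python
import socket


def fetch_mntr(host, port=2181, timeout=5.0):
    """Send the 'mntr' four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"mntr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(text):
    """Turn 'key<TAB>value' lines into a dict of strings."""
    metrics = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:
            metrics[key.strip()] = value.strip()
    return metrics


def is_backlogged(metrics, max_outstanding=10):
    """True when the request queue exceeds an (assumed) threshold."""
    return int(metrics.get("zk_outstanding_requests", 0)) > max_outstanding
```

Calling is_backlogged(parse_mntr(fetch_mntr("zk-1.example.internal"))) on a schedule, with a hostname of your own, lets you line up queue depth against topic creation and deletion activity.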
  2. Large Numbers of Topics/Partitions: Even static, large numbers of topics and partitions can strain ZooKeeper. Each topic and partition is a node in ZooKeeper’s hierarchy, and Pulsar needs to be aware of them.

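    • Diagnosis: Use zkCli.sh to count topic nodes where Pulsar stores them, for example ls /managed-ledgers/<tenant>/<namespace>/persistent for a namespace’s topics and ls /admin/partitioned-topics/<tenant>/<namespace>/persistent for partitioned-topic definitions (substitute your tenant and namespace). A very large number (tens of thousands or more) is a red flag.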
    • Fix: Review your topic naming conventions and partitioning strategy. Consolidate where possible. For very large numbers of topics, consider if all are actively used and necessary. For partitioned topics, ensure you’re not over-partitioning.
    • Why it works: Reduces the total number of nodes ZooKeeper must manage and traverse.
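Counting nodes by hand in zkCli.sh gets tedious for deep trees. Below is a small Python sketch of a recursive count. It takes a get_children callable as a parameter, which is a hypothetical seam introduced here so the logic stays independent of any client library; with the kazoo library, for instance, KazooClient.get_children has a compatible shape.

```python
from typing import Callable, List


def count_descendants(get_children: Callable[[str], List[str]], path: str) -> int:
    """Count every znode below `path` (not counting `path` itself)."""
    total = 0
    stack = [path]
    while stack:
        current = stack.pop()
        for child in get_children(current):
            total += 1
            # Build the child's absolute path; rstrip handles the root "/".
            stack.append(current.rstrip("/") + "/" + child)
    return total
```

Each call to get_children is a read against ZooKeeper, so on a large tree it is kinder to run this against a single namespace subtree than against the root.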
  3. ZooKeeper Network Latency/Instability: High latency or packet loss between Pulsar brokers/clients and ZooKeeper, or among ZooKeeper nodes themselves, drastically increases operation times.

    • Diagnosis: Run ping and mtr from brokers to ZooKeeper nodes. In mntr, use zk_server_state to confirm each node’s role, and on the leader compare zk_synced_followers with zk_followers (and watch zk_pending_syncs) to detect lagging followers. Check network interface error counters (ifconfig or ip -s link).
    • Fix: Ensure ZooKeeper nodes and Pulsar brokers are in the same low-latency network segment. Optimize network configurations (e.g., MTU sizes). If using cloud providers, ensure your VPC/subnet routing is efficient.
    • Why it works: Faster network communication means ZooKeeper operations complete quicker, reducing queue build-up.
  4. ZooKeeper Configuration Issues (Java Heap, File Descriptors): Insufficient Java heap space or too few file descriptors can cause ZooKeeper to slow down or crash.

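    • Diagnosis: Check ZooKeeper’s garbage collection logs for frequent or long pauses. Use jstat -gcutil <pid> to observe heap usage and GC activity. Check the open-file limit that applies to the running ZooKeeper process (cat /proc/<pid>/limits, or compare zk_open_file_descriptor_count with zk_max_file_descriptor_count in mntr).
    • Fix: Increase the ZooKeeper Java heap size, for example JVMFLAGS="-Xms4g -Xmx8g" in conf/java.env for a standalone ZooKeeper install, or via PULSAR_MEM in conf/pulsar_env.sh when ZooKeeper is started through bin/pulsar. Keep the heap well within physical memory so the JVM never swaps. Raise the open file descriptor limit for the ZooKeeper user (ulimit -n 65536 in the launching shell, or a persistent setting such as limits.conf or systemd’s LimitNOFILE).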
    • Why it works: ZooKeeper keeps its entire data tree in memory, so adequate heap headroom lets it run with fewer and shorter GC pauses. More file descriptors allow it to handle more concurrent connections and open files (like transaction logs).
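To track heap pressure over time rather than eyeballing jstat, its output can be parsed. This Python sketch reads the column names from the header row, so it does not depend on a fixed column order. The 90% old-generation threshold is an illustrative assumption, and the function names are made up for this example.

```python
def parse_jstat_gcutil(output):
    """Map jstat -gcutil column names to floats, using the last sample row."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    header, row = lines[0].split(), lines[-1].split()
    stats = {}
    for name, value in zip(header, row):
        # jstat prints "-" for columns that do not apply; some locales use
        # a comma as the decimal separator.
        stats[name] = None if value == "-" else float(value.replace(",", "."))
    return stats


def old_gen_under_pressure(stats, threshold_pct=90.0):
    """True when old-generation occupancy (column 'O') is at or above the threshold."""
    old = stats.get("O")
    return old is not None and old >= threshold_pct
```

Feeding it the text captured from jstat -gcutil <pid> at regular intervals shows whether old-generation occupancy keeps climbing between full collections.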
  5. ZooKeeper Transaction Log Disk I/O: Slow disk I/O on the device hosting ZooKeeper’s transaction logs (dataLogDir, or dataDir when no separate log directory is configured) directly impacts write performance, because ZooKeeper persists each update to the log before acknowledging it.

    • Diagnosis: Monitor disk I/O metrics (IOPS, throughput, latency) on the ZooKeeper data directories using tools like iostat or cloud provider monitoring. High latency or low IOPS indicate a problem.
    • Fix: Use fast SSDs or NVMe drives for the device that holds the transaction log. Ensure the filesystem is configured for optimal performance (e.g., noatime mount option). Separate dataDir (snapshots) and dataLogDir (transaction logs) onto different physical devices if possible.
    • Why it works: Faster writes to the transaction log allow ZooKeeper to acknowledge operations more quickly.
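Because the transaction log is synced to disk on the write path, fsync latency on that device is a useful number to have alongside iostat. The Python sketch below is a rough probe, not a benchmark: it writes small blocks to a temporary file in the given directory and times each os.fsync call. The sample count and 4 KiB payload are arbitrary assumptions.

```python
import os
import tempfile
import time


def fsync_latencies_ms(directory, samples=20, payload=b"x" * 4096):
    """Return the time in milliseconds of each fsync on a temp file in `directory`."""
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(samples):
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)  # force the written block to stable storage
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies
```

Run it against the transaction log directory as the ZooKeeper user during a quiet period and again under load. Occasional spikes matter more than the average, since a single slow sync stalls every write queued behind it.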
  6. Too Many Watchers/Subscriptions: While less common than the above, if clients or Pulsar components are setting an excessive number of watches on ZooKeeper nodes, it can consume ZooKeeper resources.

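    • Diagnosis: Monitor zk_watch_count alongside zk_num_alive_connections and zk_avg_latency in mntr. A watch count that is very large relative to the number of connections, combined with high latency, could indicate watcher overhead. ZooKeeper’s wchs command gives a brief summary of watches on the server.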
    • Fix: Review Pulsar’s configuration and any custom clients interacting with ZooKeeper. Ensure watches are only used where absolutely necessary and are properly cleaned up.
    • Why it works: Reduces the overhead on ZooKeeper for tracking and notifying clients about changes.

The Next Hurdle: Once ZooKeeper is stable, the bottleneck may move to the brokers themselves. With metadata operations no longer holding them back, they take on more requests and can start contending for their own CPU, memory, and network resources.
