The Pulsar broker is timing out while waiting for acknowledgment from a Pulsar client, meaning the client isn’t confirming message consumption fast enough.

This usually happens because the client is overwhelmed, the network between client and broker is congested, or the broker itself is struggling to keep up with message processing.

Common Causes and Fixes

  1. Client Consumer Lag: The consumer application is processing messages slower than the broker is sending them.

    • Diagnosis: Check consumer lag using pulsar-admin topics stats <topic-name>. In each subscription’s stats, look for a steadily increasing msgBacklog or unackedMessages, or a msgRateOut (delivery rate) that stays above messageAckRate.
    • Fix: Increase the number of consumer instances, optimize the consumer’s message processing logic (e.g., batching, parallel processing), or scale up the resources of the consumer instances.
    • Why it works: More processing power or a more efficient processing loop allows the consumer to acknowledge messages faster, preventing the broker from timing out.
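The lag check can be scripted against the JSON that pulsar-admin topics stats prints. The following is a minimal sketch, assuming the output contains a subscriptions map whose entries carry msgBacklog and unackedMessages; the sample document, the subscription names, and the 10,000-message threshold are all hypothetical.

```python
import json

# Hypothetical, trimmed sample of `pulsar-admin topics stats` output.
# The numbers are invented for illustration only.
SAMPLE_STATS = """
{
  "msgRateIn": 1200.0,
  "subscriptions": {
    "orders-sub": {"msgBacklog": 48210, "unackedMessages": 950, "msgRateOut": 400.0},
    "audit-sub": {"msgBacklog": 12, "unackedMessages": 3, "msgRateOut": 1198.5}
  }
}
"""


def lagging_subscriptions(stats, backlog_threshold=10_000):
    """Return (name, backlog, unacked) for subscriptions above the threshold,
    largest backlog first."""
    lagging = []
    for name, sub in stats.get("subscriptions", {}).items():
        backlog = sub.get("msgBacklog", 0)
        if backlog > backlog_threshold:
            lagging.append((name, backlog, sub.get("unackedMessages", 0)))
    return sorted(lagging, key=lambda row: row[1], reverse=True)


stats = json.loads(SAMPLE_STATS)
for name, backlog, unacked in lagging_subscriptions(stats):
    print(f"{name}: backlog={backlog} unacked={unacked}")
# prints: orders-sub: backlog=48210 unacked=950
```

Running the check on a schedule and comparing successive results shows whether the backlog is growing or merely large; a single snapshot cannot tell the two apart.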
  2. Network Latency/Packet Loss: High latency or packet loss between the broker and the consumer prevents acknowledgment packets from reaching the broker in time.

    • Diagnosis: Use ping and traceroute from the broker to the consumer (and vice-versa) to check latency and identify network hops with high latency or packet loss. Monitor network interface statistics on both broker and consumer for errors or dropped packets.
    • Fix: Optimize network routes, upgrade network hardware, or ensure sufficient bandwidth. If consumers are in a different region or availability zone, consider co-locating them closer to the broker.
    • Why it works: Reliable and fast network communication ensures acknowledgments arrive at the broker within the configured timeout period.
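Where ICMP is filtered and ping is not usable, timing TCP handshakes to the broker's service port gives a rough substitute. This is a sketch under stated assumptions, not a replacement for proper network monitoring: the function names are illustrative, and a failed attempt is simply recorded as None.

```python
import socket
import statistics
import time


def tcp_connect_times_ms(host, port, attempts=5, timeout=2.0):
    """Time TCP handshakes to host:port in milliseconds.
    Attempts that fail or time out are recorded as None."""
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                samples.append((time.perf_counter() - start) * 1000.0)
        except OSError:
            samples.append(None)
    return samples


def summarize(samples):
    """Summarize handshake samples: failure count plus median and max latency."""
    ok = [s for s in samples if s is not None]
    return {
        "attempts": len(samples),
        "failures": len(samples) - len(ok),
        "median_ms": statistics.median(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
    }
```

A call such as summarize(tcp_connect_times_ms("broker.example.internal", 6650)) would be run from the consumer host; the hostname is a placeholder, and 6650 is Pulsar's default binary protocol port. Repeated failures or a max far above the median point to loss or jitter on the path.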
  3. Broker Overload: The broker is too busy with other tasks (e.g., serving many topics, high write load, other consumers) to respond to acknowledgment requests promptly.

    • Diagnosis: Monitor broker metrics like CPU usage, memory usage, and request latency. Check the DispatchingMessagesRate and AckReceivedRate in broker-stats.json. High CPU or sustained high dispatch rates can indicate overload.
    • Fix: Scale up the number of brokers in the cluster, offload some topics to different brokers, or reduce the write load on the brokers.
    • Why it works: Distributing the load across more broker instances or reducing the overall workload frees up resources for each broker to handle acknowledgments more efficiently.
  4. Large Acknowledgment Batches on Client: If the client is configured to acknowledge messages in large batches, and a single message within that batch is problematic (e.g., causes the consumer to hang), the acknowledgment for the entire batch can be delayed.

    • Diagnosis: Examine the consumer application’s acknowledgment logic. If it’s batching acknowledgments, check for any message content or processing logic that might cause a single message to stall processing for an extended period.
    • Fix: Reduce the receiverQueueSize and maxAckBatchingMessages on the consumer. Consider processing and acknowledging messages individually if a single message can cause downstream issues.
    • Why it works: Smaller acknowledgment batches mean that a stalled message only affects a small group of messages, allowing other messages to be acknowledged promptly.
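The effect of batch size on acknowledgment delay can be seen in a toy model. The sketch below is not Pulsar client code: it assumes all messages are delivered at time 0, processed one after another, and acknowledged only when their whole batch has finished. The processing times and the 10-second timeout are invented.

```python
def ack_times(processing_ms, batch_size):
    """Return the time (ms) at which each message is acknowledged, assuming
    sequential processing and one acknowledgment per completed batch."""
    finish = []
    clock = 0
    for ms in processing_ms:
        clock += ms
        finish.append(clock)
    acked = []
    for start in range(0, len(finish), batch_size):
        batch = finish[start:start + batch_size]
        # Every message in the batch waits for the slowest (last) one.
        acked.extend([batch[-1]] * len(batch))
    return acked


def acked_within(processing_ms, batch_size, timeout_ms):
    """Count messages acknowledged no later than timeout_ms after delivery."""
    return sum(1 for t in ack_times(processing_ms, batch_size) if t <= timeout_ms)


# The third message stalls for 30 seconds; the others take 100 ms each.
work = [100, 100, 30_000, 100]
print(acked_within(work, batch_size=4, timeout_ms=10_000))  # prints 0
print(acked_within(work, batch_size=1, timeout_ms=10_000))  # prints 2
```

With one batch of four, the stalled message drags every acknowledgment past the timeout. With individual acknowledgments, the two messages ahead of the stall are confirmed in time.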
  5. Broker Configuration - Acknowledgment Timeout: The brokerClient.acknowledgement-timeout-ms configuration parameter on the broker is set too low for the observed network conditions and consumer processing speed.

    • Diagnosis: Review broker.conf (or standalone.conf for a standalone deployment) for the brokerClient.acknowledgement-timeout-ms setting.
    • Fix: Increase brokerClient.acknowledgement-timeout-ms in broker.conf on all brokers, for example to 60000 (60 seconds), then restart the brokers so the change takes effect.
    • Why it works: A longer timeout allows the broker to wait for acknowledgments from slower consumers or over slower networks without prematurely marking them as timed out.
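A quick way to compare the configured timeout with observed acknowledgment latency is to parse the broker configuration file, which uses key=value lines. This sketch takes the key name exactly as given in the text above; confirm that it exists in your Pulsar version before relying on it. The sample file contents and the latency figures are hypothetical.

```python
# Key name as given in the text above; verify it against your Pulsar version.
ACK_TIMEOUT_KEY = "brokerClient.acknowledgement-timeout-ms"

# Hypothetical excerpt of a broker configuration file.
SAMPLE_CONF = """
# Broker settings (illustrative excerpt)
clusterName=demo
brokerClient.acknowledgement-timeout-ms=30000
"""


def read_conf(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def timeout_too_low(settings, observed_ack_ms, key=ACK_TIMEOUT_KEY):
    """True when the configured timeout is below the observed ack latency."""
    return int(settings[key]) < observed_ack_ms


conf = read_conf(SAMPLE_CONF)
print(timeout_too_low(conf, observed_ack_ms=45_000))  # prints True
print(timeout_too_low(conf, observed_ack_ms=10_000))  # prints False
```

The observed latency would come from your own measurements, for example a high percentile of consumer processing time plus network round-trip time.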
  6. ZooKeeper/BookKeeper Issues: If the underlying metadata store (ZooKeeper) or the ledger storage (BookKeeper) is experiencing performance issues or connectivity problems, it can indirectly impact the broker’s ability to process requests, including acknowledgments.

    • Diagnosis: Check the health and performance metrics of your ZooKeeper and BookKeeper clusters. Look for high latency, low throughput, or connection errors.
    • Fix: Address any performance bottlenecks or connectivity issues in ZooKeeper or BookKeeper. This might involve scaling up the ZooKeeper/BookKeeper clusters or optimizing their configurations.
    • Why it works: A healthy and responsive ZooKeeper and BookKeeper are fundamental for Pulsar’s operation, ensuring that internal operations, including acknowledgment processing, can complete successfully.

After resolving these issues, you might encounter “Topic is partitioned” errors if your topic was previously non-partitioned and you are now working with a partitioned topic for the first time after scaling up.
