The Pulsar broker failed to locate a subscription on a topic because the subscription’s metadata was lost or never properly registered.
This usually happens because of a race condition during subscription creation, a broker restart before metadata was persisted, or a problem with the ZooKeeper ensemble that stores this metadata.
Cause 1: Subscription Creation Race Condition
- Diagnosis: Check your Pulsar client logs for `SubscriptionNotFoundException` or similar errors occurring immediately after you believe you’ve created a subscription. Concurrently, check the Pulsar broker logs for messages indicating issues with `Topic` or `Subscription` object creation.
- Fix: Implement retry logic in your Pulsar client’s subscription creation process. A common pattern is to wait a short, exponential backoff period (e.g., 100ms, 200ms, 400ms) and retry the `subscribe` operation up to 3-5 times.
- Why it works: This gives the broker enough time to fully initialize the topic and subscription metadata in ZooKeeper before the client attempts to attach to it, preventing the client from trying to subscribe to something that doesn’t quite exist yet from its perspective.
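The retry pattern described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: `retry_with_backoff` and its parameters are hypothetical names, and the operation being retried is passed in as a plain callable so the sketch does not depend on any particular client library.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 0.1,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(); after each failure wait base_delay_s * 2**n, then retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Delays grow as 0.1s, 0.2s, 0.4s, ... with the default base delay.
            sleep(base_delay_s * (2 ** attempt))
    raise AssertionError("unreachable")
```

With the Pulsar Python client, and assuming `client` is an already connected `pulsar.Client`, the call might look like `retry_with_backoff(lambda: client.subscribe(topic, "my-sub"))`. Narrowing `retry_on` to the specific exception your client raises avoids retrying errors that will never succeed, such as authorization failures.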
Cause 2: Broker Restart Before Metadata Persistence
- Diagnosis: If you’ve recently restarted a Pulsar broker and immediately started seeing `SubscriptionNotFoundException` for subscriptions that were known to exist, this is a strong indicator. Check broker logs for messages showing whether the broker shut down cleanly or lost its connection to ZooKeeper during shutdown.
- Fix: Ensure your Pulsar brokers are configured to persist metadata to ZooKeeper with a high level of durability. Specifically, check `bookkeeper.conf` (if using BookKeeper for ledger storage, which is common) for settings like `journalSyncData=true`, and the ZooKeeper configuration for `syncEnabled=true`. In the broker configuration (`broker.conf`), ensure `zookeeperServers` (or `metadataStoreUrl` on newer Pulsar releases) is correctly set and that brokers have stable connectivity.
- Why it works: By forcing synchronous writes to persistent storage for metadata operations (like subscription creation), you guarantee that the subscription’s existence is recorded in ZooKeeper before the broker shuts down or loses its in-memory state.
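Checking these settings by hand across several hosts is error-prone, so a small script can help. The sketch below assumes the configuration files use simple `key=value` lines with `#` comments; `parse_properties` and `find_mismatches` are hypothetical helper names, and the expected values are the ones named in the fix above.

```python
from typing import Dict, List


def parse_properties(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blank lines and # comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def find_mismatches(settings: Dict[str, str], expected: Dict[str, str]) -> List[str]:
    """Describe every expected setting that is absent or has another value."""
    problems: List[str] = []
    for key, want in expected.items():
        got = settings.get(key)
        if got is None:
            # An absent key means the component's built-in default applies.
            problems.append(f"{key} is not set explicitly (expected {want})")
        elif got.lower() != want.lower():
            problems.append(f"{key}={got} (expected {want})")
    return problems
```

To use it, read the file contents yourself, for example `find_mismatches(parse_properties(open(path).read()), {"journalSyncData": "true"})`, where `path` points at your `bookkeeper.conf`. An empty list means every expected setting was found with the expected value.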
Cause 3: ZooKeeper Ensemble Issues
- Diagnosis: Monitor the health of your ZooKeeper ensemble. Look for leader election disruptions, high latency between ZooKeeper nodes, or dropped connections in the ZooKeeper logs. Run `zkServer.sh status` on each ZooKeeper node to see whether it is serving as leader or follower, and use `zkCli.sh` to confirm that the ensemble accepts client connections.
- Fix: Address the underlying ZooKeeper instability. This might involve restarting ZooKeeper nodes, increasing timeouts, ensuring network connectivity between nodes, or migrating to a more robust ZooKeeper setup (e.g., more nodes, dedicated hardware). Ensure Pulsar’s `zookeeperServers` configuration points to healthy, reachable ZooKeeper instances.
- Why it works: Pulsar relies heavily on ZooKeeper for all its metadata, including topic partitioning, broker discovery, and subscription information. If ZooKeeper is unstable, these metadata operations will fail, leading to subscription lookup errors.
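For a quick scripted health check, ZooKeeper's `srvr` four-letter command reports each node's mode and request latency. The sketch below assumes `srvr` is enabled on the server (it is governed by the `4lw.commands.whitelist` setting) and that the reply uses the usual `Key: value` line layout; the function names and the 2181 default port are illustrative choices, not part of any Pulsar tooling.

```python
import socket
from typing import Dict, Optional


def fetch_srvr(host: str, port: int = 2181, timeout_s: float = 3.0) -> str:
    """Send the 'srvr' four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout_s) as conn:
        conn.sendall(b"srvr")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_srvr(text: str) -> Dict[str, str]:
    """Turn 'Key: value' lines into a dict, splitting on the first colon."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def average_latency(fields: Dict[str, str]) -> Optional[float]:
    """Extract the avg value from the 'Latency min/avg/max' field, if present."""
    raw = fields.get("Latency min/avg/max")
    if raw is None:
        return None
    parts = raw.split("/")
    return float(parts[1]) if len(parts) == 3 else None
```

Calling `parse_srvr(fetch_srvr(host))` for every node in the ensemble and comparing the `Mode` fields shows at a glance whether exactly one node reports itself as leader. A node that refuses the connection or returns no `Mode` line is a good place to start digging.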
Cause 4: Topic Deletion/Recreation Race
- Diagnosis: If you’re seeing `SubscriptionNotFoundException` shortly after a topic was potentially deleted and then recreated, this is the culprit. Check broker logs for topic deletion events and subsequent topic creation events.
- Fix: Implement a grace period or a more robust topic lifecycle management strategy. Avoid rapid delete-then-create operations on topics that have active subscriptions. If recreation is necessary, ensure all subscriptions are explicitly detached or deleted before the topic itself is deleted.
- Why it works: When a topic is deleted, its associated metadata, including subscription information, is also removed from ZooKeeper. If the topic is recreated too quickly, the new topic instance won’t have the old subscription metadata, and clients will see it as not found.
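The ordering described in the fix can be sketched as below. `TopicAdmin` is a hypothetical interface invented for this example, not a real Pulsar class; adapt its four methods to whatever admin client or REST wrapper you use. The five-second grace period is likewise an arbitrary placeholder.

```python
import time
from typing import Callable, List, Protocol


class TopicAdmin(Protocol):
    """Hypothetical admin interface; map these onto your own admin client."""

    def list_subscriptions(self, topic: str) -> List[str]: ...
    def delete_subscription(self, topic: str, subscription: str) -> None: ...
    def delete_topic(self, topic: str) -> None: ...
    def create_topic(self, topic: str) -> None: ...


def recreate_topic(
    admin: TopicAdmin,
    topic: str,
    grace_period_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Delete subscriptions, then the topic, wait, then create it again.

    Returns the removed subscription names so the caller can re-create them.
    """
    removed = list(admin.list_subscriptions(topic))
    for subscription in removed:
        admin.delete_subscription(topic, subscription)
    admin.delete_topic(topic)
    sleep(grace_period_s)  # let the deletion settle before re-creating
    admin.create_topic(topic)
    return removed
```

Returning the removed names makes the loss of subscription state explicit: the caller decides which subscriptions to create again on the new topic instead of discovering later that they are gone.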
Cause 5: Incorrect Topic Name or Namespace
- Diagnosis: Double-check the exact topic name and namespace being used by the client attempting to subscribe. Pay close attention to case sensitivity, leading/trailing spaces, and correct tenant/namespace structure (e.g., `persistent://public/default/my-topic`).
- Fix: Ensure consistency. The topic name and namespace used in the client’s `subscribe` call must exactly match the name and namespace under which the topic was created and is managed by Pulsar.
- Why it works: Pulsar uses fully qualified topic names (including tenant and namespace) as keys for its metadata. A mismatch means the broker is looking for metadata that simply doesn’t exist under the provided identifier.
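Normalizing topic names before comparing them catches most of these mismatches early. The sketch below is a simplified illustration with hypothetical names (`TopicName`, `parse_topic_name`): it expands a bare name such as `my-topic` to `persistent://public/default/my-topic`, accepts fully qualified names, and rejects anything else, which is stricter than what Pulsar itself accepts.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str
    tenant: str
    namespace: str
    topic: str

    def __str__(self) -> str:
        return f"{self.domain}://{self.tenant}/{self.namespace}/{self.topic}"


def parse_topic_name(name: str) -> TopicName:
    """Split a topic name into its parts, rejecting stray whitespace."""
    if name != name.strip():
        raise ValueError(f"topic name has leading/trailing whitespace: {name!r}")
    if "://" not in name:
        if "/" in name:
            # Simplification of this sketch: partially qualified names are refused.
            raise ValueError(f"expected a bare or fully qualified name: {name!r}")
        return TopicName("persistent", "public", "default", name)
    domain, rest = name.split("://", 1)
    if domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unknown topic domain: {domain!r}")
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic after '://': {name!r}")
    return TopicName(domain, parts[0], parts[1], parts[2])
```

Comparing `parse_topic_name(a) == parse_topic_name(b)` for the producer-side and consumer-side names makes a case difference or a missing namespace visible immediately. The comparison is deliberately case-sensitive, since `My-Topic` and `my-topic` are different topics.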
Cause 6: Pulsar Tenant/Namespace Configuration Issues
- Diagnosis: Verify that the tenant and namespace for the topic exist and are correctly configured in Pulsar. You can check this using `pulsar-admin` commands like `pulsar-admin tenants list` and `pulsar-admin namespaces list <tenant_name>`.
- Fix: Create the necessary tenant and namespace using `pulsar-admin tenants create <tenant_name>` and `pulsar-admin namespaces create <tenant_name>/<namespace_name>`. Ensure any authentication/authorization policies are also correctly set if applicable.
- Why it works: Topics are organized within tenants and namespaces. If a namespace doesn’t exist, Pulsar cannot create or manage topics within it, and subsequently, any subscriptions associated with such a topic will not be found.
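Once you have the output of the two list commands, working out what is missing is mechanical. The sketch below uses a hypothetical helper, `missing_setup_commands`, and assumes you have already collected the existing tenants as plain names and the existing namespaces in `tenant/namespace` form; it only builds the command strings and does not run them.

```python
from typing import Iterable, List


def missing_setup_commands(
    tenant: str,
    namespace: str,
    existing_tenants: Iterable[str],
    existing_namespaces: Iterable[str],
) -> List[str]:
    """Return the pulsar-admin create commands still needed, in order."""
    commands: List[str] = []
    if tenant not in set(existing_tenants):
        commands.append(f"pulsar-admin tenants create {tenant}")
    full_namespace = f"{tenant}/{namespace}"
    if full_namespace not in set(existing_namespaces):
        commands.append(f"pulsar-admin namespaces create {full_namespace}")
    return commands
```

The tenant command always comes first because a namespace cannot be created under a tenant that does not exist. An empty result means both already exist and the problem lies elsewhere.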
If all subscriptions are found but messages still aren’t being delivered, the next error you’ll likely encounter is `No brokers available for topic`.