The Pulsar HTTP lookup service is how a client that knows only one stable HTTP endpoint discovers which broker actually serves a given topic. That makes it central to running Pulsar in dynamic environments or complex network setups.

Imagine a producer needing to send a message. It doesn’t know the broker’s IP address directly. It does know the Pulsar service URL. When that URL uses the http:// or https:// scheme (e.g., http://localhost:8080), the client library resolves topics through the HTTP lookup service to find the correct broker endpoint for the topic it wants to publish to. With a pulsar:// URL (e.g., pulsar://localhost:6650), the same lookup happens over Pulsar’s binary protocol instead.
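
Which lookup path a client takes depends on the scheme of its service URL: http and https select HTTP lookup, while pulsar and pulsar+ssl select lookup over the binary protocol. A minimal sketch of that rule in Python; lookup_mode is a hypothetical helper for illustration, not part of any Pulsar client API:

```python
from urllib.parse import urlparse

# Service URL schemes and the lookup transport each one selects.
_HTTP_SCHEMES = {"http", "https"}
_BINARY_SCHEMES = {"pulsar", "pulsar+ssl"}


def lookup_mode(service_url: str) -> str:
    """Return 'http' or 'binary' depending on the service URL scheme."""
    scheme = urlparse(service_url).scheme.lower()
    if scheme in _HTTP_SCHEMES:
        return "http"
    if scheme in _BINARY_SCHEMES:
        return "binary"
    raise ValueError(f"unsupported service URL scheme: {scheme!r}")
```

Under this rule, http://localhost:8080 selects HTTP lookup and pulsar://localhost:6650 selects binary lookup.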

Let’s trace a simplified producer scenario:

  1. Client Initiates Connection: A Pulsar producer is configured with an HTTP service URL, say http://pulsar.example.com:8080. The client library needs to find a broker to connect to.
  2. HTTP Lookup Request: The client library sends HTTP GET requests to that endpoint, typically on port 8080. It fetches the topic’s partition count with GET /admin/v2/persistent/public/default/my-topic/partitions and resolves the owning broker with GET /lookup/v2/topic/persistent/public/default/my-topic. The host and port (pulsar.example.com:8080) come straight from the service URL, so that address must be reachable from the client.
  3. ZooKeeper/Metadata Lookup: The lookup service, which is usually a Pulsar broker itself acting in a lookup role, queries ZooKeeper (or the configured metadata store) to find out which broker is currently serving the partition for public/default/my-topic.
  4. Lookup Response: The lookup service returns a JSON body whose brokerUrl field holds the binary-protocol address of the broker responsible for that topic’s partition (e.g., pulsar://broker-0.pulsar.example.com:6650 or pulsar://192.168.1.10:6650), alongside TLS and HTTP variants of that address.
  5. Direct Broker Connection: The client library then uses this returned address to establish a direct connection (on the broker’s native protocol port, typically 6650) to the assigned broker for actual message publishing or consumption.
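
The request and response steps above can be sketched in Python without a running cluster. The helpers lookup_url and broker_address are hypothetical names, and SAMPLE_RESPONSE is an illustrative body rather than captured broker output; only the URL path and the brokerUrl / brokerUrlTls field names are taken from Pulsar’s HTTP lookup API:

```python
import json
from urllib.parse import quote, urlparse


def lookup_url(service_url: str, tenant: str, namespace: str, topic: str) -> str:
    """Build the HTTP lookup URL for a persistent topic."""
    base = service_url.rstrip("/")
    return (
        f"{base}/lookup/v2/topic/persistent/"
        f"{tenant}/{namespace}/{quote(topic, safe='')}"
    )


def broker_address(lookup_body: str, use_tls: bool = False):
    """Extract (host, port) of the owning broker from a lookup response body."""
    data = json.loads(lookup_body)
    key = "brokerUrlTls" if use_tls else "brokerUrl"
    url = data.get(key)
    if not url:
        raise ValueError(f"lookup response has no {key}")
    parsed = urlparse(url)
    return parsed.hostname, parsed.port


# Illustrative response body (an assumption, not real broker output).
SAMPLE_RESPONSE = json.dumps({
    "brokerUrl": "pulsar://broker-0.pulsar.example.com:6650",
    "httpUrl": "http://broker-0.pulsar.example.com:8080",
})

url = lookup_url("http://pulsar.example.com:8080", "public", "default", "my-topic")
host, port = broker_address(SAMPLE_RESPONSE)
```

In a real client, the body would come from an HTTP GET on url, and the returned host and port would be the target of the direct connection in step 5.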

This abstraction is powerful because it decouples the client from the underlying broker topology. Brokers can be added, removed, or scaled without clients needing to reconfigure their service URLs, as long as the lookup service endpoint remains stable.

The Core Problem This Solves:

Pulsar is a distributed system. Topics can be partitioned, and each partition is served by exactly one broker at a time. When a client wants to interact with a topic (publish, subscribe, consume), it needs to know which broker currently owns the partition it’s interested in. The HTTP lookup service bridges this gap: it provides a stable, well-known endpoint that clients can query for the current location of the broker serving a specific topic partition. This is especially vital in cloud-native environments where broker IPs and hostnames can change frequently due to auto-scaling, rolling updates, or ephemeral infrastructure.
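
Partitioned topics make this concrete: each partition is an internal topic named <topic>-partition-<N>, and each of those names is resolved to a broker separately. A small sketch, assuming the partition count has already been fetched (a non-partitioned topic reports 0); partition_topic_names is a hypothetical helper:

```python
from typing import List


def partition_topic_names(topic: str, partitions: int) -> List[str]:
    """Expand a topic into the per-partition names that are looked up individually."""
    if partitions < 0:
        raise ValueError("partition count cannot be negative")
    if partitions == 0:
        # Non-partitioned topic: the topic name itself is looked up.
        return [topic]
    return [f"{topic}-partition-{i}" for i in range(partitions)]


names = partition_topic_names("persistent://public/default/my-topic", 3)
```

Each name in the resulting list may be owned by a different broker, which is why the client cannot assume a single broker address for the whole topic.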

Internal Mechanics and Control Levers:

The HTTP lookup functionality is built into every Pulsar broker: the same embedded web server that serves the admin REST API also answers lookup requests, so no separate lookup role has to be enabled. The settings that determine where the lookup service listens and what it returns live in the broker.conf or standalone.conf file. On the client side, the matching setting is the HTTP service URL (serviceHttpUrl on the admin client builder), which must point at the broker’s web service port.

  • webServicePort (and webServicePortTls): The port on which the broker serves HTTP, including lookup requests; the default is 8080. Clients reach the lookup service at a URL such as http://broker-0.pulsar.example.com:8080. It’s crucial that this address is resolvable and accessible from where your clients are running.

  • bindAddress, advertisedAddress, and advertisedListeners: bindAddress is the interface the broker actually binds to, while advertisedAddress is the hostname or IP the broker tells other components (including clients via lookup) to use. If advertisedAddress is unset, the broker falls back to the host’s canonical hostname, which may not be resolvable from outside the cluster. advertisedListeners, with entries of the form <listener_name>:pulsar://<host>:<port>, lets a broker advertise different addresses to different networks. For successful lookups, whatever the broker advertises must match what clients can actually connect to.

  • brokerServicePort: The binary protocol port (6650 by default). Together with the advertised address, it forms the pulsar:// URL returned in lookup responses.

  • ZooKeeper/Metadata Store Configuration: The lookup service relies on the metadata store (usually ZooKeeper) to map topics to brokers. The zookeeperServers setting (metadataStoreUrl in newer releases) is essential for the broker to connect to the metadata store and retrieve this information.

Example Configuration Snippet (broker.conf):

# The interface the broker binds to.
bindAddress=0.0.0.0

# The address the broker advertises. This is what clients and other brokers will use.
# It should be resolvable from where clients are running.
advertisedAddress=broker-0.pulsar.example.com

# The binary protocol port. Lookup responses direct clients here.
brokerServicePort=6650

# The HTTP port for the broker's REST API and lookup service.
# It MUST be reachable by clients performing HTTP lookups.
webServicePort=8080

# ZooKeeper connection string (newer releases use metadataStoreUrl instead)
zookeeperServers=zk-0:2181,zk-1:2181,zk-2:2181

When the advertised address and web service port are properly configured and accessible, Pulsar clients can reliably find the correct broker for any topic partition. If you’re using a load balancer in front of your brokers, the client’s HTTP service URL should point to the load balancer’s address and web service port. The broker addresses returned by lookups must still be reachable from the client.
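
To see how broker settings translate into the addresses a client ends up using, here is a rough Python sketch. It assumes the stock broker.conf keys advertisedAddress, brokerServicePort, and webServicePort (with defaults 6650 and 8080); parse_conf and advertised_endpoints are hypothetical helpers, and SAMPLE_CONF is an illustrative fragment:

```python
def parse_conf(text: str) -> dict:
    """Parse key=value lines, skipping blanks and '#' comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def advertised_endpoints(conf: dict) -> dict:
    """Derive the binary and HTTP URLs a broker would advertise (assumed key names)."""
    host = conf.get("advertisedAddress")
    if not host:
        raise ValueError("advertisedAddress is not set")
    broker_port = conf.get("brokerServicePort") or "6650"
    web_port = conf.get("webServicePort") or "8080"
    return {
        "brokerUrl": f"pulsar://{host}:{broker_port}",
        "httpUrl": f"http://{host}:{web_port}",
    }


# Illustrative configuration fragment (an assumption for this sketch).
SAMPLE_CONF = """
# Address advertised to clients
advertisedAddress=broker-0.pulsar.example.com
brokerServicePort=6650
webServicePort=8080
"""

endpoints = advertised_endpoints(parse_conf(SAMPLE_CONF))
```

Checking that both derived URLs resolve from a client machine is a quick way to catch lookup failures caused by an unreachable advertised address.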

The most surprising thing about Pulsar’s HTTP lookup is that any broker can act as a lookup service. It doesn’t have to be a dedicated "lookup broker." When a client queries for a topic, the lookup service broker consults the metadata store and returns the address of the broker currently owning that partition, even if that owning broker is different from the one that handled the lookup request. This distributed responsibility is key to Pulsar’s resilience and scalability.
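
A toy model of that property, in Python: every broker answers lookups from the same shared ownership data, so the answer does not depend on which broker received the request. The OWNERSHIP table and broker names are hypothetical stand-ins for the metadata store and a real cluster:

```python
# Hypothetical ownership table standing in for the metadata store.
OWNERSHIP = {
    "persistent://public/default/my-topic": "pulsar://broker-1.pulsar.example.com:6650",
}

# Hypothetical brokers, any of which may receive the lookup request.
BROKERS = ["broker-0", "broker-1", "broker-2"]


def handle_lookup(receiving_broker: str, topic: str) -> dict:
    """Answer a lookup on any broker by consulting the shared ownership table."""
    if receiving_broker not in BROKERS:
        raise ValueError(f"unknown broker: {receiving_broker}")
    return {"brokerUrl": OWNERSHIP[topic]}


# Ask every broker about the same topic and collect the answers.
answers = {
    broker: handle_lookup(broker, "persistent://public/default/my-topic")["brokerUrl"]
    for broker in BROKERS
}
```

All three answers name broker-1 as the owner, even when broker-0 or broker-2 handled the request.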

The next hurdle you’ll likely encounter is understanding how Pulsar handles topic discovery and partition ownership changes in response to broker failures or scaling events.
