Pulsar namespaces don’t just group topics; they’re the primary mechanism for enforcing granular operational policies across sets of topics.
Let’s see this in action. Imagine we have a finance tenant with two namespaces: trading and reporting. We want unconsumed messages in trading to expire quickly and duplicates to be rejected, while reporting can give consumers much longer and doesn’t need deduplication.
Here’s how we’d configure that using the Pulsar admin CLI:
First, let’s set policies for the trading namespace. We want messages that nobody has consumed within 15 minutes to expire, and deduplication enabled to reject duplicate writes:
pulsar-admin namespaces set-message-ttl finance/trading --messageTTL 15m
pulsar-admin namespaces set-deduplication finance/trading --enable
The --messageTTL 15m option tells Pulsar to expire any message in the trading namespace that has not been acknowledged within 15 minutes: the broker acknowledges it automatically for every subscription that still has it outstanding, after which the data can be deleted according to the namespace’s retention policy. (Recent pulsar-admin versions accept unit suffixes such as 15m; older versions expect the value in seconds, e.g. 900.) The set-deduplication --enable command turns on broker-side deduplication for every topic in the namespace: the broker remembers, for each producer, the highest sequence ID it has accepted and discards any message from that producer whose sequence ID is not higher.
Now, for the reporting namespace, we want a much longer TTL and no deduplication overhead:
pulsar-admin namespaces set-message-ttl finance/reporting --messageTTL 168h # 7 days
pulsar-admin namespaces set-deduplication finance/reporting --disable
Here, --messageTTL 168h gives consumers a full week before unacknowledged messages expire, which is useful for batch reporting jobs that may not run every day. --disable turns deduplication off, so the broker accepts every message without the overhead of tracking producer sequence IDs. Keeping messages around after they have been acknowledged is a separate namespace policy, configured with pulsar-admin namespaces set-retention.
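To make the TTL semantics concrete, here is a minimal shell sketch, not Pulsar’s broker code: it converts duration strings such as 15m and 168h into seconds and treats a still-unacknowledged message as expired once more than the TTL has passed since it was published. The function names ttl_to_seconds and is_expired are hypothetical, the parser handles only plain seconds and the m/h/d suffixes, and timestamps are plain epoch seconds.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of message-TTL expiry; not Pulsar's broker implementation.

# ttl_to_seconds VALUE: converts 900, 15m, 168h or 7d to seconds.
# Simplified parser for this sketch: only plain seconds and m/h/d suffixes.
ttl_to_seconds() {
  local v=$1
  case $v in
    *m) echo $(( ${v%m} * 60 )) ;;
    *h) echo $(( ${v%h} * 3600 )) ;;
    *d) echo $(( ${v%d} * 86400 )) ;;
    *)  echo $(( v )) ;;
  esac
}

# is_expired PUBLISH_TS NOW TTL: succeeds if a message published at PUBLISH_TS
# (epoch seconds) is past its TTL at NOW. Only meaningful for messages that are
# still unacknowledged; "expiring" them means auto-acknowledging them.
is_expired() {
  local publish_ts=$1 now=$2 ttl
  ttl=$(ttl_to_seconds "$3")
  (( now - publish_ts > ttl ))
}
```

For example, is_expired 1000 2000 15m succeeds (1000 seconds have elapsed, more than 900), while is_expired 1000 1500 15m fails.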
The core problem Pulsar namespaces solve is managing complexity at scale. Without them, you’d be setting policies on a per-topic basis, which quickly becomes unmanageable. Namespaces provide a hierarchical way to group related topics and apply consistent operational rules. They act as a policy boundary: when a broker serves a topic, it applies the policies associated with that topic’s namespace. These policies can include message TTL, deduplication, replication settings, and even authorization permissions.
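Because the namespace is part of every fully qualified topic name (persistent://tenant/namespace/topic), finding the policy that applies to a topic comes down to extracting that prefix. The sketch below is a hypothetical illustration: namespace_of, ttl_for_topic, the ns_ttl table, and topic names such as orders are made up for the example, and short topic names (which Pulsar resolves to the public/default namespace) are not handled.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: resolve a topic to the policy of its namespace.
# Assumes fully qualified names: {persistent|non-persistent}://tenant/namespace/topic

# Illustrative policy table keyed by namespace (values taken from the examples above).
declare -A ns_ttl=( [finance/trading]=15m [finance/reporting]=168h )

namespace_of() {
  local rest=${1#*://}          # drop the "persistent://" scheme
  local tenant=${rest%%/*}      # e.g. "finance"
  rest=${rest#*/}               # e.g. "trading/orders"
  echo "$tenant/${rest%%/*}"    # e.g. "finance/trading"
}

ttl_for_topic() {
  local ns
  ns=$(namespace_of "$1")
  echo "${ns_ttl[$ns]:-unset}"  # every topic in a namespace shares one policy
}
```

Any topic under finance/trading, whatever its name, resolves to the same 15m TTL; that shared lookup is what makes the namespace a policy boundary.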
Internally, Pulsar stores these namespace policies in ZooKeeper (or the configured metadata store). When you execute an admin command like set-message-ttl, the pulsar-admin client sends a REST request to a broker, which writes the updated policy for that namespace to the metadata store. When a broker needs to enforce a policy (for example, checking an incoming write for duplicates, or deciding whether messages have outlived their TTL), it reads the policy from a local cache backed by the metadata store. Brokers can therefore enforce policies without a metadata round trip for every message, and a change takes effect across the cluster as brokers refresh their cached copies.
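The cache-and-refresh pattern described above can be modeled in a few lines. This is a hypothetical sketch, not Pulsar code: associative arrays stand in for the metadata store and for one broker’s cache, and on_policy_changed plays the role of a change notification (for example, a ZooKeeper watch firing). Reads go to the metadata store only when the cached entry is missing, which the store_reads counter makes visible.

```bash
#!/usr/bin/env bash
# Hypothetical model of a broker-side policy cache; not Pulsar's implementation.
declare -A metadata_store   # namespace -> TTL (stands in for ZooKeeper)
declare -A broker_cache     # one broker's cached copy
store_reads=0

# Admin update: write the store, then notify brokers that the entry changed.
admin_set_ttl() {
  metadata_store[$1]=$2
  on_policy_changed "$1"
}

# Change-notification handler: drop the stale cached entry.
on_policy_changed() {
  local ns=$1
  unset 'broker_cache[$ns]'
}

# Policy lookup; result in REPLY so no subshell discards the cache update.
broker_get_ttl() {
  local ns=$1
  if [[ -z ${broker_cache[$ns]+set} ]]; then
    broker_cache[$ns]=${metadata_store[$ns]}   # cache miss: read the store
    store_reads=$(( store_reads + 1 ))
  fi
  REPLY=${broker_cache[$ns]}
}
```

Two lookups after one update cost a single store read; a further update invalidates the entry, and the next lookup sees the new value.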
Replication is also controlled at the namespace level, in two distinct ways. Persistence policies decide how many copies of each message BookKeeper stores within a cluster, and the namespace’s cluster list decides which clusters (for example, separate data centers or regions) its topics are geo-replicated to. For example, to give the trading namespace a replication factor of 3 within each cluster and replicate it across two regions, you might configure:
pulsar-admin namespaces set-persistence finance/trading --bookkeeper-ensemble 3 --bookkeeper-write-quorum 3 --bookkeeper-ack-quorum 2 --ml-mark-delete-max-rate 0
pulsar-admin namespaces set-clusters finance/trading --clusters us-west-1,us-east-1
The first command tells Pulsar to write each message of a topic in finance/trading to 3 bookies and to acknowledge the write once 2 of them have confirmed it, so an acknowledged message survives the loss of any single bookie. The second command replicates the namespace’s topics asynchronously between the us-west-1 and us-east-1 clusters (both must be among the finance tenant’s allowed clusters), so if an entire region goes down, the other cluster still holds the data, apart from any messages that had not yet been replicated when the outage began.
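BookKeeper persistence is described by three numbers: ensemble size, write quorum, and ack quorum. The small helper below is hypothetical (check_persistence is not a Pulsar or BookKeeper command); it validates that ensemble >= write quorum >= ack quorum >= 1 and spells out what the numbers imply, assuming the standard reading that each entry goes to write-quorum bookies and is acknowledged once ack-quorum of them confirm.

```bash
#!/usr/bin/env bash
# Hypothetical helper: validate a persistence setting and derive its guarantees.
check_persistence() {
  local ensemble=$1 write_q=$2 ack_q=$3
  if (( ensemble < write_q || write_q < ack_q || ack_q < 1 )); then
    echo "invalid: need ensemble >= write quorum >= ack quorum >= 1"
    return 1
  fi
  # An acknowledged entry is on at least ack_q bookies, so it survives ack_q - 1 losses.
  # A write still completes if up to write_q - ack_q bookies in its write set do not respond.
  echo "copies=$write_q ack_after=$ack_q survives_bookie_losses=$(( ack_q - 1 )) write_tolerates_unavailable=$(( write_q - ack_q ))"
}
```

For a 3/3/2 setting, check_persistence 3 3 2 prints copies=3 ack_after=2 survives_bookie_losses=1 write_tolerates_unavailable=1, whereas an ensemble smaller than the write quorum, such as 2 3 2, is rejected.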
The surprise is how deeply these policies are integrated. They are not just metadata tags; they actively influence broker behavior for every single message. For instance, when a broker receives a message on a topic with deduplication enabled, it compares the message’s sequence ID with the highest sequence ID it has already accepted from that producer. If the new ID is not higher, the message is a duplicate: the broker does not write it to storage, but it still acknowledges the send so the producer does not keep retrying. Otherwise, the broker records the new sequence ID for that producer and the message proceeds to the write path. Because this check happens before anything is written to storage, deduplication is cheap per message, but the broker must keep an entry in memory (and periodically snapshot it) for every producer on every deduplicated topic.
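The per-producer check can be modeled in a few lines. This is a simplified, hypothetical sketch (handle_send and highest_seq are illustrative names, not Pulsar internals): it keeps one entry per producer holding the highest sequence ID accepted so far, and it ignores details such as in-flight versus persisted IDs and snapshotting. The result is returned in REPLY rather than printed, so calls do not run in a subshell and the map keeps its state between calls.

```bash
#!/usr/bin/env bash
# Simplified, hypothetical model of broker-side deduplication (not Pulsar source).
# One map entry per producer: memory grows with producers, not with messages.
declare -A highest_seq

# handle_send PRODUCER SEQ: sets REPLY to "persisted" or "duplicate".
handle_send() {
  local producer=$1 seq=$2
  local last=${highest_seq[$producer]:--1}   # -1 means nothing accepted yet
  if (( seq <= last )); then
    REPLY=duplicate      # skip the storage write; the producer still gets a receipt
  else
    highest_seq[$producer]=$seq
    REPLY=persisted      # continue to the write path
  fi
}
```

Sending sequence IDs 0, 1, 1 from one producer and 0 from another yields persisted, persisted, duplicate, persisted, and leaves just two entries in highest_seq.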
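Beyond TTL, deduplication, and replication, you can also set a backlog quota on a namespace (pulsar-admin namespaces set-backlog-quota). This is a crucial mechanism to keep runaway topic growth from consuming all of the cluster’s storage. The quota is defined once for the namespace and applied to each topic in it, capping the size of that topic’s unacknowledged backlog. What happens when a topic reaches the limit depends on the quota’s policy: producer_request_hold holds producers’ send requests until consumers catch up, producer_exception disconnects producers with an error, and consumer_backlog_eviction discards the oldest unacknowledged messages.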
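A minimal sketch of the decision a broker makes when a topic’s backlog is checked against the quota. The function backlog_action and its output labels are hypothetical; the three policy names are the ones pulsar-admin accepts for a backlog quota. The sketch assumes the backlog is measured in bytes per topic and that reaching the limit exactly counts as exceeding it, which may not match the broker’s precise boundary.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of backlog-quota enforcement for a single topic.
# backlog_action BACKLOG_BYTES LIMIT_BYTES POLICY: prints the resulting action.
backlog_action() {
  local backlog_bytes=$1 limit_bytes=$2 policy=$3
  if (( backlog_bytes < limit_bytes )); then
    echo accept
    return
  fi
  case $policy in
    producer_request_hold)     echo hold-producer ;;        # park sends until consumers catch up
    producer_exception)        echo reject-producer ;;      # fail the producer with an error
    consumer_backlog_eviction) echo evict-oldest-backlog ;; # drop oldest unacked messages
    *) echo "unknown policy: $policy" >&2; return 1 ;;
  esac
}
```

Under producer_exception, a topic at 100 bytes of a 1000-byte quota keeps accepting writes, while one at 1000 bytes starts rejecting producers.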
The next step in managing Pulsar at scale involves understanding how these namespace policies interact with tenant-level configurations and topic-specific overrides.