By default, Pulsar doesn’t accept messages larger than 5MB (the broker’s maxMessageSize defaults to 5242880 bytes), which can be a surprise given how easily real-world payloads exceed that limit.
Let’s see Pulsar chunking in action. Imagine you have a Pulsar topic, persistent://public/default/large-messages, and you want to send a 15MB message to it.
# First, create a producer that's configured to handle large messages:
# chunking must be enabled and batching disabled (see the client examples below)
broker_url="pulsar://localhost:6650"
topic="persistent://public/default/large-messages"
message_size_mb=15
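# base64 expands data by 4/3, so generate 3/4 of the target size in random bytes
message_content=$(head -c $((message_size_mb * 1024 * 1024 * 3 / 4)) /dev/urandom | base64 | tr -d '\n')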
# This script only simulates the send; it doesn't call a Pulsar CLI.
# In a real application, you configure chunking in your producer code:
# For Java:
# Producer<byte[]> producer = pulsarClient.newProducer()
#     .topic(topic)
#     .enableChunking(true)      // Chunking is off by default; enable it explicitly
#     .enableBatching(false)     // Chunking cannot be combined with batching
#     .maxPendingMessages(10000) // Can be tuned
#     .create();
# producer.send(messageContent.getBytes(StandardCharsets.UTF_8));
# For Go:
# producer, err := client.CreateProducer(pulsar.ProducerOptions{
#     Topic:           topic,
#     EnableChunking:  true, // Off by default; must be enabled explicitly
#     DisableBatching: true, // Chunking cannot be combined with batching
# })
# _, err = producer.Send(ctx, &pulsar.ProducerMessage{Payload: messageContentBytes})
# For Python:
# producer = client.create_producer(topic, chunking_enabled=True, batching_enabled=False)
# producer.send(message_content.encode('utf-8'))
echo "Simulating sending a ${message_size_mb}MB message to ${topic}"
echo "Actual sending would happen via a client library with chunking enabled."
echo "The producer client splits the payload into chunks; the consumer client reassembles them."
# To verify, we can subscribe and observe
# In a real scenario, the consumer would receive a single logical message
# regardless of the number of chunks.
# For demonstration, we'll just show a placeholder consumer.
echo "Consumer side will receive a single message object, unaware of the internal chunking."
The problem Pulsar chunking solves is that brokers can’t efficiently handle arbitrarily large individual messages. Network frames, memory buffers, and serialization/deserialization all impose practical limits, which is why the broker enforces maxMessageSize in the first place. If you try to send a 15MB message to a broker with the default limit and chunking disabled, the send fails on the client before the message ever reaches the broker. The Java client, for example, reports a PulsarClientException.InvalidMessageException saying the message exceeds the maximum size.
Chunking splits a large message into smaller, manageable pieces (chunks) on the producer side. Each chunk is published as a regular Pulsar message, with extra metadata marking it as one part of a larger message. The broker does not reassemble anything: it stores and dispatches the chunks like any other messages. Reassembly happens in the consumer client library, which buffers incoming chunks until every chunk of the original message has arrived, joins them in order, and then delivers the single, complete message. Because this happens transparently, your application code sees only one large message.
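The split-and-reassemble flow described above can be sketched in plain Python. This is a toy model, not the Pulsar client API. Chunk, split_into_chunks, and reassemble are hypothetical names. The tiny chunk size stands in for a real limit such as the broker’s maxMessageSize.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Chunk:
    # Hypothetical chunk record; field names only loosely mirror Pulsar's metadata
    uuid: str        # shared by every chunk of the same original message
    chunk_id: int    # position of this chunk within the original message
    num_chunks: int  # total number of chunks for the original message
    total_size: int  # size in bytes of the original payload
    payload: bytes


def split_into_chunks(uuid: str, payload: bytes, chunk_size: int) -> list[Chunk]:
    """Producer side: cut the payload into pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    pieces = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)] or [b""]
    return [Chunk(uuid, i, len(pieces), len(payload), p) for i, p in enumerate(pieces)]


def reassemble(chunks: list[Chunk]) -> bytes:
    """Consumer side: once every chunk is present, order by chunk_id and concatenate."""
    if not chunks:
        raise ValueError("no chunks received")
    ordered = sorted(chunks, key=lambda c: c.chunk_id)
    if [c.chunk_id for c in ordered] != list(range(ordered[0].num_chunks)):
        raise ValueError("missing or duplicate chunks")
    data = b"".join(c.payload for c in ordered)
    if len(data) != ordered[0].total_size:
        raise ValueError("reassembled size does not match the original")
    return data


original = bytes(range(256)) * 100  # 25,600-byte stand-in for a large payload
chunks = split_into_chunks("producer-1-42", original, chunk_size=10_000)
print(len(chunks), [len(c.payload) for c in chunks])  # 3 [10000, 10000, 5600]
# Chunks may be handed over in any order; reassembly restores the original bytes
assert reassemble(list(reversed(chunks))) == original
```

The consumer only needs the per-chunk metadata to put the message back together, which is why the broker can treat each chunk as an ordinary message.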
The key levers you control are primarily on the producer side:
enableChunking: This is the master switch. Chunking is off by default, so you must enable it explicitly in your producer configuration (enableChunking in Java, EnableChunking in Go, chunking_enabled in Python).
enableBatching: Chunking cannot be combined with batching, so a chunking producer must disable batching (enableBatching(false) in Java, DisableBatching in Go). Chunking is also supported only on persistent topics.
maxMessageSize (broker): There is no separate threshold setting that switches chunking on. A message is chunked when its payload, after compression, exceeds the broker’s maxMessageSize, which the client learns when it connects. The same value bounds the size of each chunk. The default is 5MB (5242880 bytes).
chunkMaxMessageSize: Recent Java and Go clients also let you set a smaller per-chunk limit on the producer. When it is set, any message larger than this value is chunked at that size, even if it would fit under the broker limit. When it is unset, the broker’s maxMessageSize is used.
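The rules in this list can be summarized as a small decision function. This is a hypothetical sketch, not a Pulsar API:

- plan_send and its parameters are illustrative names.
- payload_size stands for the size after compression.
- The ValueError raised for chunking plus batching stands in for the client refusing that producer configuration.

```python
MiB = 1024 * 1024


def plan_send(
    payload_size: int,
    broker_max_message_size: int = 5 * MiB,  # broker default maxMessageSize
    chunking_enabled: bool = False,
    batching_enabled: bool = True,  # batching is on by default in the Java client
    persistent_topic: bool = True,
    chunk_max_message_size: int = -1,  # <= 0 means "use the broker limit"
) -> str:
    """Hypothetical model of what happens to one message; payload_size is post-compression."""
    if chunking_enabled and batching_enabled:
        # Stand-in for the client refusing to build a producer with both enabled
        raise ValueError("chunking and batching cannot be enabled together")
    # In this sketch, chunking simply does not apply on non-persistent topics
    can_chunk = chunking_enabled and persistent_topic
    limit = broker_max_message_size
    if can_chunk and chunk_max_message_size > 0:
        limit = min(chunk_max_message_size, broker_max_message_size)
    if payload_size <= limit:
        return "send as a single message"
    if can_chunk:
        return "split into chunks"
    raise ValueError(f"{payload_size} bytes exceeds the {broker_max_message_size}-byte limit")


print(plan_send(1 * MiB))  # send as a single message
print(plan_send(15 * MiB, chunking_enabled=True, batching_enabled=False))  # split into chunks
```

Calling plan_send(15 * MiB) with the defaults raises ValueError. That is the failure the 15MB example would hit without chunking.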
When chunking is enabled, the chunk size is not tuned adaptively. The producer cuts the payload into consecutive pieces no larger than the broker’s maxMessageSize (or chunkMaxMessageSize, if set). A payload of S bytes with a chunk size of C therefore becomes roughly ceil(S / C) chunks. The broker’s maxMessageSize setting (for example, maxMessageSize=104857600 for 100MB in broker.conf) matters twice: it decides whether a message needs chunking at all, and it caps the size of every individual chunk.
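To make the chunk arithmetic concrete, the sketch below lays out the byte ranges for fixed-size chunks. chunk_layout is a hypothetical helper. It assumes every chunk except the last is exactly chunk_size bytes and ignores per-chunk metadata. A real client may reserve part of each chunk for metadata, so the actual count can be higher.

```python
from __future__ import annotations


def chunk_layout(total_size: int, chunk_size: int) -> list[tuple[int, int, int]]:
    """Return (chunk_id, start, end) byte ranges, assuming fixed-size chunks and no metadata."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    num_chunks = max(1, (total_size + chunk_size - 1) // chunk_size)  # ceil(S / C)
    return [(i, i * chunk_size, min((i + 1) * chunk_size, total_size)) for i in range(num_chunks)]


MiB = 1024 * 1024
for row in chunk_layout(15 * MiB, 5 * MiB):  # 15 MB payload, 5 MB default limit
    print(row)
# (0, 0, 5242880)
# (1, 5242880, 10485760)
# (2, 10485760, 15728640)
print(len(chunk_layout(15 * MiB, 100 * MiB)))  # 1: fits under a 100 MB limit, no chunking needed
```

In this model, the 15MB message from the earlier example splits into three chunks under the 5MB default. With maxMessageSize raised to 100MB, it fits in a single message and chunking never kicks in.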
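The most surprising aspect of Pulsar’s chunking implementation is how it handles message IDs. Message IDs are assigned by the broker when an entry is stored, not by the producer, so every chunk is persisted as its own entry with its own message ID. What ties the chunks together is metadata the producer attaches to each one:

- uuid: a value shared by all chunks of the original message, built from the producer name and the message’s sequence ID.
- chunk_id: the chunk’s position within the message.
- num_chunks_from_msg: the total number of chunks.
- total_chunk_msg_size: the size of the original payload.

The consumer client groups chunks by uuid, orders them by chunk_id, and hands the message to the application only once all of them have arrived. So even though several entries were written, the consumer application sees a single message. Recent Java clients represent its ID as a chunk message ID that references the first and last chunks.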
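The ID bookkeeping described above can be modeled with a small consumer-side assembler. This is a hypothetical sketch:

- ChunkEntry, LogicalMessage, and ChunkAssembler are illustrative names.
- (ledger_id, entry_id) tuples stand in for broker-assigned message IDs.
- The fields only loosely follow Pulsar’s uuid, chunk_id, and num_chunks_from_msg metadata.

In the demo, chunks from two messages arrive interleaved, as can happen when several producers publish to the same topic.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkEntry:
    # Hypothetical stored chunk; not the Pulsar wire format
    message_id: tuple[int, int]  # (ledger_id, entry_id) assigned by the broker on storage
    uuid: str                    # shared by all chunks of one original message
    chunk_id: int                # position within the original message
    num_chunks: int              # how many chunks the original message was split into
    payload: bytes


@dataclass
class LogicalMessage:
    data: bytes
    first_chunk_id: tuple[int, int]
    last_chunk_id: tuple[int, int]


class ChunkAssembler:
    """Buffers interleaved chunks by uuid and emits a message once all of its chunks arrived."""

    def __init__(self) -> None:
        self._pending: dict[str, dict[int, ChunkEntry]] = {}

    def add(self, chunk: ChunkEntry) -> LogicalMessage | None:
        parts = self._pending.setdefault(chunk.uuid, {})
        parts[chunk.chunk_id] = chunk
        if any(i not in parts for i in range(chunk.num_chunks)):
            return None  # still waiting for other chunks of this message
        del self._pending[chunk.uuid]
        ordered = [parts[i] for i in range(chunk.num_chunks)]
        return LogicalMessage(
            data=b"".join(c.payload for c in ordered),
            first_chunk_id=ordered[0].message_id,
            last_chunk_id=ordered[-1].message_id,
        )


assembler = ChunkAssembler()
arrivals = [  # chunks of two messages, interleaved on the topic
    ChunkEntry((7, 0), "producer-a-1", 0, 2, b"Hel"),
    ChunkEntry((7, 1), "producer-b-5", 0, 2, b"Wor"),
    ChunkEntry((7, 2), "producer-a-1", 1, 2, b"lo"),
    ChunkEntry((7, 3), "producer-b-5", 1, 2, b"ld"),
]
for entry in arrivals:
    message = assembler.add(entry)
    if message is not None:
        print(message.data, message.first_chunk_id, message.last_chunk_id)
# b'Hello' (7, 0) (7, 2)
# b'World' (7, 1) (7, 3)
```

Keying the buffer by uuid rather than by arrival order is what lets the consumer reassemble interleaved messages. A real consumer must also cap how many incomplete messages it buffers.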
The next hurdle you’ll likely encounter after successfully implementing large message chunking is understanding how Pulsar handles message acknowledgments (acks) in relation to chunked messages.