Protobuf isn’t just faster and smaller than MessagePack; it actually forces you to define your data structure upfront, which is its secret weapon.

Let’s see it in action. Imagine we’re building a simple chat application and need to send messages.

First, we define our message structure using Protobuf:

// chat.proto
syntax = "proto3";

message ChatMessage {
  string sender_id = 1;
  string recipient_id = 2;
  string message_text = 3;
  int64 timestamp = 4;
}

We compile this .proto file into code for our chosen language (say, Python):

protoc --python_out=. chat.proto

This generates a chat_pb2.py file. Now, we can create and serialize a message:

import chat_pb2
import time

message = chat_pb2.ChatMessage()
message.sender_id = "user123"
message.recipient_id = "user456"
message.message_text = "Hello there!"
message.timestamp = int(time.time())

serialized_message = message.SerializeToString()
print(f"Serialized Protobuf message (bytes): {serialized_message}")
print(f"Size: {len(serialized_message)} bytes")

Output might look like this (shown for a timestamp of 1678886400; the last few bytes change with the clock):

Serialized Protobuf message (bytes): b'\n\x07user123\x12\x07user456\x1a\x0cHello there! \x80\x8c\xc7\xa0\x06'
Size: 38 bytes

Now, let’s do the same with MessagePack. MessagePack is schema-less, meaning we don’t define a structure beforehand. We just pack data.

import msgpack
import time

data = {
    "sender_id": "user123",
    "recipient_id": "user456",
    "message_text": "Hello there!",
    "timestamp": int(time.time())
}

serialized_message_mp = msgpack.packb(data, use_bin_type=True)
print(f"Serialized MessagePack message (bytes): {serialized_message_mp}")
print(f"Size: {len(serialized_message_mp)} bytes")

Output might look like this (for a timestamp of 1678886400):

Serialized MessagePack message (bytes): b'\x84\xa9sender_id\xa7user123\xacrecipient_id\xa7user456\xacmessage_text\xacHello there!\xa9timestamp\xced\x11\xc6\x00'
Size: 81 bytes

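Notice the size difference? The Protobuf encoding is roughly half the size. This is because Protobuf never writes field names: each field is identified by the small numeric tag from the .proto file, packed into a single key byte together with a 3-bit wire type. MessagePack, being schema-less, has to repeat every key string ("sender_id", "recipient_id", and so on) in every message.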
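To see where the bytes go, here is a rough tally of both encodings for this message, written as plain Python arithmetic rather than library calls. It assumes the fixed timestamp 1678886400, that every string is short enough for a one-byte Protobuf length prefix and a MessagePack fixstr header, and that the timestamp needs MessagePack's uint32 format; varint_len is a hypothetical helper for illustration.

```python
# Rough byte accounting for the ChatMessage example (sketch, not a library API).
# Assumes all strings are < 32 bytes, so Protobuf needs a 1-byte length prefix
# and MessagePack can use the 1-byte fixstr header.

def varint_len(n: int) -> int:
    """Number of bytes a non-negative integer occupies as a Protobuf varint."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length

fields = {
    "sender_id": "user123",
    "recipient_id": "user456",
    "message_text": "Hello there!",
}
timestamp = 1678886400  # fixed example value

# Protobuf: each string field costs 1 key byte + 1 length byte + UTF-8 payload;
# the int64 field costs 1 key byte + its varint. Field names never appear.
proto_size = sum(1 + 1 + len(v.encode("utf-8")) for v in fields.values())
proto_size += 1 + varint_len(timestamp)

# MessagePack: 1-byte fixmap header, then each key as a fixstr (1 + name)
# and each value (fixstr, or 0xce + 4 bytes for a uint32).
msgpack_size = 1
for name, value in fields.items():
    msgpack_size += (1 + len(name)) + (1 + len(value.encode("utf-8")))
msgpack_size += (1 + len("timestamp")) + 5

print(proto_size, msgpack_size)  # 38 81
```

Of the 81 MessagePack bytes, 42 are the four key names themselves; Protobuf replaces them with four one-byte keys.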

The core problem Protobuf solves is reliable, efficient data interchange between systems that might be written in different languages. It provides a contract for your data. MessagePack, on the other hand, is excellent for situations where flexibility and ease of use are paramount, and you don’t need strict schema enforcement or extreme efficiency.

Internally, Protobuf serializes data as a sequence of key-value pairs. Each field in your .proto file has a unique field number, and when you serialize, the encoder writes a key for each field followed by its encoded value. The key is itself a varint holding (field_number << 3) | wire_type. Integers are encoded as varints: a variable-length encoding that stores 7 bits per byte, least significant group first, with the high bit of each byte signalling that more bytes follow, so smaller numbers use fewer bytes. For example, the timestamp 1678886400 (0x6411C600 in hex) needs 31 bits, so its varint takes five bytes: \x80\x8c\xc7\xa0\x06. The timestamp is field 4 with wire type 0 (varint), so its key is (4 << 3) | 0 = 0x20. The whole field therefore becomes \x20\x80\x8c\xc7\xa0\x06: one key byte plus five value bytes, with no field name anywhere. This is why it’s so compact.
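The varint and key arithmetic described above fits in a few lines of pure Python. This is a minimal sketch that handles non-negative integers only; encode_varint and field_key are hypothetical helper names, not functions from the protobuf package.

```python
# Sketch of Protobuf's base-128 varint and field-key encoding.
# Non-negative integers only: negative int64 values become 10-byte varints,
# and sint64 uses ZigZag encoding; both are out of scope here.

def encode_varint(n: int) -> bytes:
    """7 bits per byte, low bits first, high bit set on every byte but the last."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)         # final byte has the high bit clear
            return bytes(out)

def field_key(field_number: int, wire_type: int) -> bytes:
    """The key written before each field: (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)

WIRE_VARINT = 0
timestamp = 1678886400

encoded = field_key(4, WIRE_VARINT) + encode_varint(timestamp)
print(hex(timestamp), encoded)  # 0x6411c600 b' \x80\x8c\xc7\xa0\x06'
```

Python prints the key byte 0x20 as a space, which is why it shows up as a blank in the serialized output.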

MessagePack, conversely, is conceptually similar to JSON (a self-describing encoding of maps, arrays, strings and numbers), but binary rather than text. Each value is preceded by a type byte, and small maps and strings fold their size into that byte (e.g., 0x84 is a map with 4 entries, 0xa9 a string of 9 bytes). The encoder writes these type codes and lengths alongside the raw data. For our example, the key sender_id becomes \xa9sender_id, and the timestamp becomes \xce (uint32) followed by four big-endian bytes.
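For comparison, here is a toy MessagePack encoder covering only the formats the chat message needs: fixmap, fixstr and the unsigned integer family. It is a sketch under those assumptions (pack is a hypothetical name; the real msgpack package supports many more types), but for this particular dictionary it should produce the same bytes as msgpack.packb.

```python
import struct

def pack(obj) -> bytes:
    """Encode a small subset of MessagePack (sketch only)."""
    if isinstance(obj, dict):
        if len(obj) > 15:
            raise ValueError("sketch only supports fixmap (at most 15 entries)")
        out = bytes([0x80 | len(obj)])         # 0x8N: map with N entries
        for key, value in obj.items():
            out += pack(key) + pack(value)      # every key is written in full
        return out
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) > 31:
            raise ValueError("sketch only supports fixstr (at most 31 bytes)")
        return bytes([0xA0 | len(raw)]) + raw   # 0xaN: string of N bytes
    if isinstance(obj, int) and not isinstance(obj, bool) and obj >= 0:
        if obj < 0x80:
            return bytes([obj])                 # positive fixint: the byte is the value
        for marker, fmt, limit in ((0xCC, ">B", 0xFF), (0xCD, ">H", 0xFFFF),
                                   (0xCE, ">I", 0xFFFFFFFF),
                                   (0xCF, ">Q", 0xFFFFFFFFFFFFFFFF)):
            if obj <= limit:
                return bytes([marker]) + struct.pack(fmt, obj)  # big-endian
    raise TypeError(f"unsupported value for this sketch: {obj!r}")

message = {
    "sender_id": "user123",
    "recipient_id": "user456",
    "message_text": "Hello there!",
    "timestamp": 1678886400,  # fixed example value; needs uint32 (0xce)
}
packed = pack(message)
print(len(packed))   # 81
print(packed[:11])   # b'\x84\xa9sender_id'
```

Note how every type byte travels with its value: the decoder never needs a schema, but it also never gets to drop the key names.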

A key aspect of Protobuf’s efficiency is its use of wire types. The wire type, stored in the low three bits of each key, tells the decoder how to find the end of the value that follows: 0 is a varint, 1 is a fixed 64-bit value (fixed64, sfixed64, double), 2 is length-delimited (strings, bytes, nested messages, packed repeated fields), 3 and 4 mark the start and end of a group (a deprecated feature), and 5 is a fixed 32-bit value (fixed32, sfixed32, float). The encoder derives each field’s wire type from its declaration in the .proto file and writes the key, (field_number << 3) | wire_type, followed by the data. This explicit encoding scheme allows for very fast parsing and keeps per-field overhead to a byte or two.
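Because the wire type alone says how long each value is, a decoder can walk a Protobuf message without the .proto file. The sketch below does that over the serialized ChatMessage, assuming the timestamp 1678886400; read_varint and iter_fields are illustrative names, and the deprecated group wire types 3 and 4 are deliberately left out.

```python
def read_varint(buf: bytes, pos: int):
    """Decode a varint starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:          # high bit clear: last byte of the varint
            return result, pos
        shift += 7

def iter_fields(buf: bytes):
    """Yield (field_number, wire_type, raw_value) for each field, schema-free."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:                      # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:                    # fixed 64-bit: always 8 bytes
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:                    # length-delimited: length, then bytes
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:                    # fixed 32-bit: always 4 bytes
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field_number, wire_type, value

# Serialized ChatMessage with timestamp 1678886400 (38 bytes).
data = (b"\n\x07user123\x12\x07user456\x1a\x0cHello there!"
        b" \x80\x8c\xc7\xa0\x06")
for field in iter_fields(data):
    print(field)
# (1, 2, b'user123')
# (2, 2, b'user456')
# (3, 2, b'Hello there!')
# (4, 0, 1678886400)
```

Without the schema, the walker recovers field numbers and raw values but not names or exact types: a wire-type-2 field could equally be a string, raw bytes, or a nested message.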

The one thing most developers miss when comparing these two is that Protobuf’s schema definition isn’t just for validation; it’s what makes the compact encoding possible. Because both sides already know each field’s name and type from the .proto file, only a small field number has to travel on the wire. And because every key carries a wire type, the deserializer can skip fields it doesn’t recognize or doesn’t need without decoding them, and parse the rest with type-specific generated code, rather than inspecting each value’s type marker and length on the fly, as a MessagePack decoder must.
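The skipping behaviour can be illustrated with a reader that only knows about two of the four fields, say a hypothetical notification service that needs just the text and the timestamp. Its SCHEMA dictionary stands in for generated code; fields it doesn't list are jumped over using their wire type, and length-delimited payloads are skipped in one step via their length prefix. All names here are assumptions for illustration, not protobuf APIs.

```python
# Hypothetical partial schema: field number -> (name, expected wire type, decoder).
SCHEMA = {
    3: ("message_text", 2, lambda raw: raw.decode("utf-8")),
    4: ("timestamp", 0, int),
}

def read_varint(buf: bytes, pos: int):
    """Decode a varint starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def skip_field(buf: bytes, pos: int, wire_type: int) -> int:
    """Return the position just past a field's value, without decoding it."""
    if wire_type == 0:
        while buf[pos] & 0x80:   # continuation bit set: keep scanning
            pos += 1
        return pos + 1
    if wire_type == 1:
        return pos + 8
    if wire_type == 2:
        length, pos = read_varint(buf, pos)
        return pos + length      # jump over the whole payload at once
    if wire_type == 5:
        return pos + 4
    raise ValueError(f"unsupported wire type {wire_type}")

def parse_selected(buf: bytes, schema: dict) -> dict:
    """Decode only the fields listed in schema; skip everything else."""
    out, pos = {}, 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        entry = schema.get(field_number)
        if entry is None:
            pos = skip_field(buf, pos, wire_type)   # not in our schema
            continue
        name, expected, decode = entry
        if wire_type != expected:
            raise ValueError(f"field {field_number}: wire type {wire_type}, expected {expected}")
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError("this sketch only decodes varint and length-delimited fields")
        out[name] = decode(value)
    return out

data = (b"\n\x07user123\x12\x07user456\x1a\x0cHello there!"
        b" \x80\x8c\xc7\xa0\x06")
print(parse_selected(data, SCHEMA))
# {'message_text': 'Hello there!', 'timestamp': 1678886400}
```

The official Protobuf runtimes typically keep unrecognized fields so they survive re-serialization; this sketch simply discards them.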

The next step after mastering binary serialization is often exploring how to handle schema evolution and versioning for Protobuf messages.