Protobuf and Thrift, despite their similar goals, approach IDL serialization with fundamentally different philosophies, leading to surprising trade-offs.
Let’s see how this plays out in practice. Imagine you have a simple User message with an id (integer) and name (string).
Protobuf Definition (user.proto):
syntax = "proto3";
message User {
  int32 id = 1;
  string name = 2;
}
When you compile this with protoc, you get generated code for the languages you target. A serialized Protobuf message for User{id: 123, name: "Alice"} is the following nine bytes, shown with non-printable bytes as hex escapes:
\x08\x7b\x12\x05Alice
\x08: Tag for field 1 (wire type varint).
\x7b: The varint representation of 123.
\x12: Tag for field 2 (wire type length-delimited).
\x05: Length of the string "Alice" (5 bytes).
Alice: The UTF-8 encoded string.
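To make the byte layout concrete, here is a minimal Python sketch that builds the same bytes by hand. It is not the code protoc generates: the helper names (encode_varint, encode_tag, encode_user) are made up for illustration, and the sketch only handles non-negative integers.

```python
WIRE_VARINT = 0  # wire type for int32, int64, bool, enum
WIRE_LEN = 2     # wire type for strings, bytes, nested messages


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A tag is the field number shifted left 3 bits, OR'd with the wire type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_user(user_id: int, name: str) -> bytes:
    """Hand-rolled encoding of the User message (hypothetical helper)."""
    name_bytes = name.encode("utf-8")
    return (
        encode_tag(1, WIRE_VARINT) + encode_varint(user_id)
        + encode_tag(2, WIRE_LEN) + encode_varint(len(name_bytes)) + name_bytes
    )


encoded = encode_user(123, "Alice")
print(encoded)       # b'\x08{\x12\x05Alice'  (0x7b prints as '{')
print(len(encoded))  # 9
```

Values of 128 or more spill into a second varint byte, which is why small field numbers and small integers are the cheapest to send.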
Thrift Definition (user.thrift):
struct User {
  1: i32 id;
  2: string name;
}
A serialized Thrift message for the same User{id: 123, name: "Alice"} (using Compact Protocol) would be:
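\x15\xf6\x01\x18\x05Alice\x00
\x15: Field header for field 1: an ID delta of 1 in the high four bits, compact type 5 (i32) in the low four bits.
\xf6\x01: The integer 123, zigzag-encoded to 246 and written as a two-byte varint.
\x18: Field header for field 2: an ID delta of 1 from the previous field, compact type 8 (binary, which is also used for strings).
\x05: Length of the string "Alice" (5 bytes), as a varint.
Alice: The UTF-8 encoded string.
\x00: Stop byte marking the end of the struct.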
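The Compact Protocol layout for this struct can be sketched by hand in Python as well. This is an illustration of the published encoding rules (zigzag varints, a one-byte header holding the field-ID delta and type, and a trailing stop byte), not the Thrift library's API; the helper names are hypothetical, and only the short header form (delta 1 to 15) is handled.

```python
CT_I32 = 5     # compact-protocol type code for i32
CT_BINARY = 8  # compact-protocol type code for binary/string
STOP = b"\x00" # marks the end of a struct


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def zigzag32(n: int) -> int:
    """Map a signed 32-bit int to unsigned so small magnitudes stay small."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def field_header(delta: int, compact_type: int) -> bytes:
    """Short-form header: field-ID delta in the high nibble, type in the low."""
    if not 1 <= delta <= 15:
        raise ValueError("this sketch only handles the short header form")
    return bytes([(delta << 4) | compact_type])


def encode_user(user_id: int, name: str) -> bytes:
    """Hand-rolled Compact Protocol encoding of User (hypothetical helper)."""
    name_bytes = name.encode("utf-8")
    return (
        field_header(1, CT_I32) + encode_varint(zigzag32(user_id))       # field 1
        + field_header(1, CT_BINARY) + encode_varint(len(name_bytes))    # field 2
        + name_bytes
        + STOP
    )


encoded = encode_user(123, "Alice")
print(encoded)       # b'\x15\xf6\x01\x18\x05Alice\x00'
print(len(encoded))  # 11
```

Zigzag encoding doubles non-negative values (123 becomes 246), so the i32 here needs two varint bytes where Protobuf's plain int32 varint needs one. In exchange, small negative numbers stay short.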
The core problem both solve is efficient data exchange between services without relying on text-based formats like JSON or XML. They achieve this by using an Interface Definition Language (IDL) to define data structures and then generating code for serialization and deserialization.
Protobuf, developed by Google, prioritizes simplicity and speed. Its core idea is to encode data with minimal overhead. Fields are identified by numeric tags, and the wire type (e.g., varint, fixed 32-bit, length-delimited) dictates how the value is laid out on the wire. Field names never appear in the payload, which keeps messages very compact.
Thrift, originating from Facebook, offers more flexibility and a wider range of data types and protocols. It also uses numeric IDs but has a more structured approach, often involving more metadata within the serialized data itself, especially with protocols like Binary Protocol. Compact Protocol is Thrift’s answer to Protobuf’s efficiency.
Let’s dig into the internal workings. Protobuf’s proto3 syntax is designed for forward and backward compatibility. When you add a new field, existing code can still parse newer messages, and new code can still parse older ones. The numeric tags are crucial here. If a field is unset, or holds its default value, it is simply not present in the serialized data, and the reader falls back to the type’s default (0, empty string, and so on).
Thrift’s struct definition is similar, but its protocols can vary. The BinaryProtocol is more verbose: every field carries a one-byte type and a two-byte field ID, and integers and string lengths are written at fixed width. The CompactProtocol closes most of the gap with Protobuf by packing the field type and the delta from the previous field ID into a single header byte, and by writing integers and lengths as varints (zigzag-encoded for signed values).
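For comparison, the same User struct under Thrift's Binary Protocol can be sketched with Python's struct module. This is a hand-written illustration of the wire layout (one-byte type, big-endian two-byte field ID, fixed four-byte integers and lengths, and a stop byte), not the Thrift library's API; encode_user_binary is a hypothetical name.

```python
import struct

# Binary Protocol type codes
T_STOP = 0
T_I32 = 8
T_STRING = 11


def encode_user_binary(user_id: int, name: str) -> bytes:
    """Hand-rolled Binary Protocol encoding of User (hypothetical helper)."""
    name_bytes = name.encode("utf-8")
    return (
        # field 1: type byte, i16 field ID, i32 value (all big-endian)
        struct.pack(">bhi", T_I32, 1, user_id)
        # field 2: type byte, i16 field ID, i32 byte length, then the bytes
        + struct.pack(">bhi", T_STRING, 2, len(name_bytes))
        + name_bytes
        # end of struct
        + struct.pack(">b", T_STOP)
    )


encoded = encode_user_binary(123, "Alice")
print(encoded.hex(" "))
print(len(encoded))  # 20
```

The result is 20 bytes, against 9 for the Protobuf encoding of the same data. Nearly all of the difference comes from the fixed-width field IDs, integers, and length prefixes.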
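The key differentiator is how they handle schema evolution. Protobuf’s tag-based system is inherently robust. If you remove a field, readers that still know about it simply see it as unset, provided the old tag number is never reused for a different field (mark it reserved). If you add a field with a new tag, old clients will skip over it. If you rename a field, as long as you keep the same tag number, compatibility is maintained, because names never appear on the wire.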
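The skipping behavior that makes this work can be sketched with a small hand-written decoder. This is an illustration of the wire format, not generated Protobuf code: decode_user and read_varint are hypothetical helpers, the "old" reader knows only fields 1 and 2, and field 3 stands in for a field added later. The sketch discards unknown fields, whereas real implementations may retain them.

```python
def read_varint(buf: bytes, pos: int):
    """Read a base-128 varint at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # no continuation bit: this was the last byte
            return result, pos
        shift += 7


def decode_user(buf: bytes) -> dict:
    """Decode a User, skipping any field number this reader does not know."""
    user = {"id": 0, "name": ""}  # proto3 defaults for absent fields
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        # The wire type alone tells us how far to advance, known field or not.
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if field_number == 1 and wire_type == 0:
            user["id"] = value
        elif field_number == 2 and wire_type == 2:
            user["name"] = value.decode("utf-8")
        # Anything else is unknown to this reader and is dropped.
    return user


old_message = b"\x08\x7b\x12\x05Alice"
new_message = old_message + b"\x1a\x03a@b"  # field 3, a later addition

print(decode_user(old_message))  # {'id': 123, 'name': 'Alice'}
print(decode_user(new_message))  # {'id': 123, 'name': 'Alice'}
print(decode_user(b""))          # {'id': 0, 'name': ''}
```

Nothing in the decoder refers to a field name, which is why a rename is invisible on the wire while a change of tag number is not.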
Thrift supports schema evolution on the same principle, though the details depend on the protocol. With the Binary and Compact protocols only field IDs appear on the wire, so renaming a field is safe as long as its ID is preserved; changing or reusing an ID is what breaks compatibility. Thrift’s type system also offers constructs such as unions and enums, each with its own behavior during evolution.
The most surprising aspect is how little type information either format puts on the wire. Protobuf carries only a three-bit wire type next to each field number, which tells a parser how many bytes to skip but not whether a varint is an int32, a bool, or an enum; the schema supplies the rest. Thrift’s Compact Protocol reaches similar density by a slightly different route. Every field still gets its own header with a four-bit type (nothing is inferred from the previous field), but the field ID is stored as a small delta from the previous one, so the header usually fits in a single byte. Protobuf, conversely, always encodes the absolute field number in the tag, regardless of what came before. For small messages the two end up within a few bytes of each other.
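The per-field overhead of the two schemes can be compared with a short calculation. This is a sketch under the encoding rules described in the public format documentation; the function names are hypothetical. It assumes Protobuf tags are varints of (field number << 3 | wire type), and that Compact Protocol headers fall back to a type byte plus a zigzag-varint field ID when the delta is outside 1 to 15.

```python
def varint_len(value: int) -> int:
    """Number of bytes a non-negative integer occupies as a base-128 varint."""
    length = 1
    while value >= 0x80:
        value >>= 7
        length += 1
    return length


def protobuf_tag_len(field_number: int) -> int:
    """Tag size depends only on the absolute field number."""
    # The wire type sits in the low 3 bits and never changes the byte count.
    return varint_len(field_number << 3)


def thrift_compact_header_len(field_id: int, previous_id: int) -> int:
    """Header size depends on the gap from the previously written field ID."""
    delta = field_id - previous_id
    if 1 <= delta <= 15:
        return 1  # delta and type share a single byte
    # Long form: one type byte, then the field ID as a zigzag varint.
    zigzag = ((field_id << 1) ^ (field_id >> 15)) & 0xFFFF
    return 1 + varint_len(zigzag)


print(protobuf_tag_len(1), protobuf_tag_len(16))    # 1 2
print(thrift_compact_header_len(100, 99))           # 1
print(thrift_compact_header_len(100, 0))            # 3
```

Under these assumptions, Protobuf is cheapest when field numbers stay below 16, while the Compact Protocol is cheapest when the fields that are present have IDs close together, whatever their absolute values.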
The next concept you’ll grapple with is choosing the right protocol for Thrift, as the serialization efficiency and overhead can vary dramatically.