Protobuf can be 6x smaller and 10x faster than JSON, but the biggest wins come when you’re serializing structured, repetitive data.
Let’s see it in action. Imagine we have a simple User message:
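```protobuf
// user.proto
syntax = "proto3";

// protoc-gen-go refuses to generate code without a Go import path.
// The module path here is a placeholder; ";main" puts the generated code
// in package main so the example program can use User directly.
option go_package = "example.com/userdemo;main";

message User {
  string name = 1;
  int32 id = 2;
  repeated string emails = 3;
}
```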
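We can compile this into Go code with the protoc-gen-go plugin, which must be on your PATH:

```sh
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
protoc --go_out=. --go_opt=paths=source_relative user.proto
```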
Now, let’s serialize and deserialize some user data in Go, comparing Protobuf and JSON.
```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"time"

	"google.golang.org/protobuf/proto"
)

// Assume user.pb.go (generated by protoc-gen-go) defines User, roughly:
//
// type User struct {
//     state         protoimpl.MessageState
//     sizeCache     protoimpl.SizeCache
//     unknownFields protoimpl.UnknownFields
//
//     Name   string   `protobuf:"bytes,1,opt,name=name,proto3" json:"name,omitempty"`
//     Id     int32    `protobuf:"varint,2,opt,name=id,proto3" json:"id,omitempty"`
//     Emails []string `protobuf:"bytes,3,rep,name=emails,proto3" json:"emails,omitempty"`
// }

func main() {
	user := &User{
		Name:   "Alice",
		Id:     12345,
		Emails: []string{"alice@example.com", "alice.work@example.com"},
	}

	// Timing a single call is only a rough illustration; use a testing.B
	// benchmark (go test -bench) for numbers you can rely on.

	// --- Protobuf Serialization ---
	startProtoMarshal := time.Now()
	protoData, err := proto.Marshal(user)
	if err != nil {
		log.Fatalf("Protobuf marshal error: %v", err)
	}
	protoMarshalDuration := time.Since(startProtoMarshal)
	fmt.Printf("Protobuf data size: %d bytes\n", len(protoData))
	fmt.Printf("Protobuf marshal duration: %v\n", protoMarshalDuration)

	// --- Protobuf Deserialization ---
	decodedUserProto := &User{}
	startProtoUnmarshal := time.Now()
	if err := proto.Unmarshal(protoData, decodedUserProto); err != nil {
		log.Fatalf("Protobuf unmarshal error: %v", err)
	}
	protoUnmarshalDuration := time.Since(startProtoUnmarshal)
	fmt.Printf("Protobuf unmarshal duration: %v\n", protoUnmarshalDuration)
	fmt.Printf("Decoded Protobuf User: %+v\n", decodedUserProto)

	fmt.Println("\n--- JSON Serialization ---")
	// encoding/json works here through the generated json tags; the canonical
	// proto JSON mapping lives in google.golang.org/protobuf/encoding/protojson.
	startJsonMarshal := time.Now()
	jsonData, err := json.Marshal(user)
	if err != nil {
		log.Fatalf("JSON marshal error: %v", err)
	}
	jsonMarshalDuration := time.Since(startJsonMarshal)
	fmt.Printf("JSON data size: %d bytes\n", len(jsonData))
	fmt.Printf("JSON marshal duration: %v\n", jsonMarshalDuration)

	// --- JSON Deserialization ---
	decodedUserJson := &User{}
	startJsonUnmarshal := time.Now()
	if err := json.Unmarshal(jsonData, decodedUserJson); err != nil {
		log.Fatalf("JSON unmarshal error: %v", err)
	}
	jsonUnmarshalDuration := time.Since(startJsonUnmarshal)
	fmt.Printf("JSON unmarshal duration: %v\n", jsonUnmarshalDuration)
	fmt.Printf("Decoded JSON User: %+v\n", decodedUserJson)
}
```
When you run this, Protobuf produces a noticeably smaller payload: for this User it comes to 53 bytes, versus 83 bytes of JSON. The timings from a single call are mostly noise at this scale (a few microseconds, with first-call setup costs mixed in), so measure many iterations, for example with a `go test -bench` benchmark, before drawing conclusions. The gap tends to grow as messages get larger and carry more numeric data.
Protobuf’s core job is efficient data serialization. JSON is human-readable text and self-describing, so the keys are repeated in every record. Protobuf instead uses a binary format backed by a predefined schema, which acts as a blueprint both sides share and allows for compact encoding. In the serialized data, a field is identified not by its name but by a small key that combines its field number (the 1, 2, 3 in the .proto file) with a wire type describing how the value is laid out. Integer types such as int32 and int64 are encoded with a variable-length scheme called varint, which uses fewer bytes for smaller numbers: any value from 0 to 127 fits in a single byte.
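To see where the size difference comes from, here is a stdlib-only sketch that writes the same User by hand in the protobuf wire format and compares it with encoding/json. It does not use the protobuf library; appendVarint, appendTag, appendString, encodeUser, compareSizes and the plainUser struct are hypothetical helpers written for illustration, and only the two wire types this message needs are handled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Wire types from the protobuf encoding spec (only the two this sketch needs).
const (
	wireVarint = 0 // int32, int64, uint32, uint64, bool, enum
	wireLen    = 2 // string, bytes, nested messages
)

// appendVarint writes v as a base-128 varint: 7 bits per byte, lowest
// group first, with the high bit set on every byte except the last.
func appendVarint(buf []byte, v uint64) []byte {
	for v >= 0x80 {
		buf = append(buf, byte(v)|0x80)
		v >>= 7
	}
	return append(buf, byte(v))
}

// appendTag writes a field key: (field number << 3) | wire type.
func appendTag(buf []byte, field, wireType int) []byte {
	return appendVarint(buf, uint64(field)<<3|uint64(wireType))
}

// appendString writes a length-delimited field: key, byte length, bytes.
func appendString(buf []byte, field int, s string) []byte {
	buf = appendTag(buf, field, wireLen)
	buf = appendVarint(buf, uint64(len(s)))
	return append(buf, s...)
}

// plainUser is a hypothetical stand-in for the generated User struct,
// carrying the same JSON tags protoc-gen-go emits.
type plainUser struct {
	Name   string   `json:"name,omitempty"`
	Id     int32    `json:"id,omitempty"`
	Emails []string `json:"emails,omitempty"`
}

// encodeUser writes fields in field-number order and, like proto3,
// skips fields that hold their zero value.
func encodeUser(u plainUser) []byte {
	var buf []byte
	if u.Name != "" {
		buf = appendString(buf, 1, u.Name)
	}
	if u.Id != 0 {
		buf = appendTag(buf, 2, wireVarint)
		buf = appendVarint(buf, uint64(int64(u.Id))) // int32 is sign-extended to 64 bits
	}
	for _, e := range u.Emails {
		buf = appendString(buf, 3, e)
	}
	return buf
}

// compareSizes returns the hand-encoded protobuf size and the encoding/json size.
func compareSizes(u plainUser) (protoBytes, jsonBytes int, err error) {
	js, err := json.Marshal(u)
	if err != nil {
		return 0, 0, err
	}
	return len(encodeUser(u)), len(js), nil
}

func main() {
	u := plainUser{
		Name:   "Alice",
		Id:     12345,
		Emails: []string{"alice@example.com", "alice.work@example.com"},
	}
	p, j, err := compareSizes(u)
	if err != nil {
		panic(err)
	}
	fmt.Printf("protobuf: %d bytes\n% x\n", p, encodeUser(u))
	fmt.Printf("JSON:     %d bytes\n", j)
}
```

For the article’s User this reports 53 bytes against 83 bytes of JSON. The hex dump starts with `0a 05` (field 1, five bytes of "Alice"), then `10 b9 60` (field 2, the varint for 12345), and each email is prefixed with `1a` and its length: one-byte keys stand in for the quoted JSON names.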
The system works by defining your data structures in .proto files. The protoc compiler turns these into code for your target language: classes or structs that can be serialized (packed) into, and deserialized (unpacked) from, the binary Protobuf format. In Go, you pass the generated struct to proto.Marshal and proto.Unmarshal. When you send Protobuf data over a network or store it, you’re moving these compact binary blobs. The receiving end decodes them with code generated from the same (or a compatible) .proto definition; without that schema, the bytes are just field numbers, wire types, and raw values.
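Here is a minimal, stdlib-only sketch of what decoding amounts to conceptually; it is not how protoc-gen-go’s generated code is actually implemented. The hypothetical decodeUser plays the role of the schema: it knows field 1 is the name, 2 the id and 3 the emails, and it skips any field number it doesn’t recognize, using only the wire type to know how many bytes to skip. binary.Uvarint from Go’s standard library reads the same base-128 varints Protobuf uses. Fixed-width wire types are left out to keep it short.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// decodedUser holds the fields this (hypothetical) reader's schema knows about.
type decodedUser struct {
	Name    string
	Id      int32
	Emails  []string
	Skipped []int // field numbers present on the wire but unknown to this schema
}

// decodeUser reads a key, splits it into field number and wire type, and
// uses the field number to decide what the payload means.
func decodeUser(b []byte) (decodedUser, error) {
	var u decodedUser
	for len(b) > 0 {
		key, n := binary.Uvarint(b)
		if n <= 0 {
			return u, errors.New("bad key varint")
		}
		b = b[n:]
		field, wireType := int(key>>3), key&7
		switch wireType {
		case 0: // varint
			v, n := binary.Uvarint(b)
			if n <= 0 {
				return u, errors.New("bad varint")
			}
			b = b[n:]
			if field == 2 {
				u.Id = int32(v) // int32 keeps the low 32 bits
			} else {
				u.Skipped = append(u.Skipped, field)
			}
		case 2: // length-delimited
			l, n := binary.Uvarint(b)
			if n <= 0 || uint64(len(b)-n) < l {
				return u, errors.New("bad length")
			}
			payload := string(b[n : n+int(l)])
			b = b[n+int(l):]
			switch field {
			case 1:
				u.Name = payload
			case 3:
				u.Emails = append(u.Emails, payload)
			default:
				u.Skipped = append(u.Skipped, field)
			}
		default:
			// fixed64 (wire type 1) and fixed32 (wire type 5) are omitted here.
			return u, fmt.Errorf("unsupported wire type %d", wireType)
		}
	}
	return u, nil
}

func main() {
	// Hand-built bytes for Name "Alice" and Id 12345, plus a field 4
	// (varint 7) that a newer writer might send; this reader doesn't know it.
	msg := []byte{0x0a, 0x05, 'A', 'l', 'i', 'c', 'e', 0x10, 0xb9, 0x60, 0x20, 0x07}
	u, err := decodeUser(msg)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", u)
}
```

Because every field carries its wire type, an older reader can step over data it has no schema for instead of failing, which is what lets old and new versions of a message coexist.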
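The levers you control live mainly in the .proto definition. You define your messages, fields, and their types: scalars like int32, string, and bool, or richer types like enums, repeated fields, and nested messages. The syntax = "proto3"; declaration matters because proto2 differs in subtle ways, notably in how defaults and field presence work. Field numbers are crucial: they are the stable identifiers written on the wire and must never be reused once a message has been released (list retired numbers under reserved so nobody reuses them by accident). Changing a field’s type after release is also risky, because only a handful of type changes are wire-compatible. Field labels are the other lever. A plain proto3 scalar field has implicit presence: if it holds its zero value (0, "", false) it is not written at all, so the receiver cannot tell "unset" from "set to zero". Marking it optional, which proto3 supports since protoc 3.15, gives it explicit presence (in Go the field becomes a pointer, such as *int32). required exists only in proto2, and proto3 dropped it for good reason: readers reject any message that lacks a required field, so such a field can never safely be removed. Finally, repeated declares a list.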
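The difference between implicit and explicit presence shows up directly in the bytes. This sketch models field 2 of User both ways with hypothetical helpers: encodeImplicit behaves like a plain `int32 id = 2;`, and encodeExplicit like `optional int32 id = 2;`, which protoc-gen-go represents as an *int32. It writes bytes by hand rather than calling the protobuf library.

```go
package main

import "fmt"

// appendVarint writes v as a base-128 varint (low 7 bits first, high bit = "more follows").
func appendVarint(buf []byte, v uint64) []byte {
	for v >= 0x80 {
		buf = append(buf, byte(v)|0x80)
		v >>= 7
	}
	return append(buf, byte(v))
}

// idKey is the key byte for field 2 with wire type 0 (varint): 2<<3 | 0.
const idKey = 0x10

// encodeImplicit models a plain proto3 `int32 id = 2;`: the zero value
// is indistinguishable from "unset", so nothing is written for it.
func encodeImplicit(id int32) []byte {
	if id == 0 {
		return nil
	}
	return appendVarint([]byte{idKey}, uint64(int64(id)))
}

// encodeExplicit models `optional int32 id = 2;` exposed as *int32:
// nil means unset, while a non-nil 0 is still written.
func encodeExplicit(id *int32) []byte {
	if id == nil {
		return nil
	}
	return appendVarint([]byte{idKey}, uint64(int64(*id)))
}

func main() {
	zero := int32(0)
	fmt.Printf("implicit, id = 0:      [% x]\n", encodeImplicit(0))
	fmt.Printf("explicit, id unset:    [% x]\n", encodeExplicit(nil))
	fmt.Printf("explicit, id set to 0: [% x]\n", encodeExplicit(&zero))
}
```

The first two lines print empty brackets, while the third prints `10 00`: with explicit presence the receiver can see that id was deliberately set to zero, at the cost of two bytes.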
One of the most surprising aspects of Protobuf’s integer handling is what happens with negative numbers. You might expect a fixed number of bytes per integer (4 for int32, 8 for int64); Protobuf does offer that through fixed32 and fixed64, but its ordinary integer types are varints. The catch is that int32 and int64 encode a negative value as its 64-bit two’s-complement bit pattern, which looks like a huge unsigned number, so -1 or -5 always takes 10 bytes. For values that are often negative, the sint32 and sint64 types apply ZigZag encoding first. ZigZag maps signed integers to unsigned ones so that small absolute values, positive or negative, become small numbers: 0 is encoded as 0x00, -1 as 0x01, 1 as 0x02, -2 as 0x03, and 2 as 0x04. So with sint64, a value like -5 maps to 9 and takes a single byte, whereas in JSON it would be the two characters - and 5, and as a plain int64 it would take ten bytes.
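The sketch below compares the varint length of a few values when the field is declared int64 (the raw two’s-complement bits, sign-extended to 64) versus sint64 (ZigZag first). The counts cover the value only, not the one-byte field key. zigzag64 and unzigzag64 are hypothetical helper names; the mapping itself, (n << 1) ^ (n >> 63), is the one the protobuf encoding documentation gives for sint64.

```go
package main

import "fmt"

// appendVarint writes v as a base-128 varint (low 7 bits first, high bit = "more follows").
func appendVarint(buf []byte, v uint64) []byte {
	for v >= 0x80 {
		buf = append(buf, byte(v)|0x80)
		v >>= 7
	}
	return append(buf, byte(v))
}

// zigzag64 applies the sint64 mapping (n << 1) ^ (n >> 63):
// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, ...
// n >> 63 is an arithmetic shift, so it is all ones for negative n.
func zigzag64(n int64) uint64 {
	return uint64((n << 1) ^ (n >> 63))
}

// unzigzag64 reverses zigzag64.
func unzigzag64(u uint64) int64 {
	return int64(u>>1) ^ -int64(u&1)
}

func main() {
	for _, n := range []int64{0, -1, 1, -2, 2, -5, 64} {
		asInt64 := appendVarint(nil, uint64(n))    // int64: two's-complement bits as-is
		asSint64 := appendVarint(nil, zigzag64(n)) // sint64: ZigZag, then varint
		fmt.Printf("%4d  int64: %2d bytes   sint64: %d bytes (zigzag value %d)\n",
			n, len(asInt64), len(asSint64), zigzag64(n))
	}
}
```

For -5 this reports 10 bytes as int64 and 1 byte as sint64. The trade-off runs the other way for non-negative values, which can cost one extra byte as sint64 (64 needs two bytes instead of one), so sint32 and sint64 pay off only for fields that are frequently negative.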
The next hurdle you’ll likely encounter is managing schema evolution.