Protobuf serialization isn’t always faster than JSON. For small, structured messages it can even be noticeably slower, depending on the language and the Protobuf runtime you use.

Let’s see this in action. We’ll serialize a simple User message with a few fields.

syntax = "proto3";

message User {
  int32 id = 1;
  string username = 2;
  repeated string tags = 3;
}

Here’s a quick Python benchmark.

import timeit
import json
import user_pb2  # Generated from user.proto with: protoc --python_out=. user.proto

# Sample data
user_data = {
    "id": 12345,
    "username": "testuser",
    "tags": ["python", "protobuf", "benchmark"]
}

# Create Protobuf message
user_message = user_pb2.User()
user_message.id = user_data["id"]
user_message.username = user_data["username"]
user_message.tags.extend(user_data["tags"])

# JSON serialization
json_string = json.dumps(user_data)

# Protobuf serialization
protobuf_bytes = user_message.SerializeToString()

# Benchmarking
num_iterations = 100000

json_time = timeit.timeit(lambda: json.dumps(user_data), number=num_iterations)
protobuf_time = timeit.timeit(lambda: user_message.SerializeToString(), number=num_iterations)

print(f"JSON Serialization Time: {json_time:.6f} seconds for {num_iterations} iterations")
print(f"Protobuf Serialization Time: {protobuf_time:.6f} seconds for {num_iterations} iterations")

# Sanity-check that both formats round-trip before timing deserialization
json_obj = json.loads(json_string)
user_message_deserialized = user_pb2.User()
user_message_deserialized.ParseFromString(protobuf_bytes)
assert json_obj == user_data
assert user_message_deserialized == user_message

json_deserialize_time = timeit.timeit(lambda: json.loads(json_string), number=num_iterations)
protobuf_deserialize_time = timeit.timeit(lambda: user_pb2.User().ParseFromString(protobuf_bytes), number=num_iterations)

print(f"JSON Deserialization Time: {json_deserialize_time:.6f} seconds for {num_iterations} iterations")
print(f"Protobuf Deserialization Time: {protobuf_deserialize_time:.6f} seconds for {num_iterations} iterations")

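When you run this, you’ll likely see JSON serialization and deserialization come out faster for this small message, especially if the pure-Python Protobuf backend is active; the C++ and upb backends can narrow the gap or reverse it. Either way, the result runs against Protobuf’s reputation for speed and compact size.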
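Because the outcome depends so much on which Protobuf runtime Python is using, it’s worth checking the backend before trusting the numbers. The sketch below reads it through `google.protobuf.internal.api_implementation.Type()`. That module is internal to the protobuf package, so treat this as a diagnostic whose location may change between releases. If protobuf isn’t installed, the snippet just reports `None`.

```python
# Report which protobuf runtime backend is active, if protobuf is installed.
try:
    from google.protobuf.internal import api_implementation

    backend = api_implementation.Type()  # typically "python", "cpp", or "upb"
except ImportError:
    backend = None  # protobuf package not installed

print(f"protobuf backend: {backend}")
```

Setting the `PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION` environment variable to `python` before protobuf is first imported forces the pure-Python backend, which can be handy for reproducing the slow case on purpose.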

The core problem Protobuf solves is efficient data interchange between different services or languages. It achieves this through a binary, schema-defined format. You define your data structure once in a .proto file, and the Protobuf compiler generates code for various languages to serialize and deserialize your data. This schema ensures that both sender and receiver understand the data’s structure without needing a human-readable intermediate.

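Internally, Protobuf uses a tag-length-value style of encoding. Each field in your message is assigned a unique field number. On the wire, every field starts with a key (often called the tag) that combines the field number with a wire type, computed as (field_number << 3) | wire_type. Most integer types (such as int32), booleans, and enums use a compact variable-length encoding called "varint", which spends fewer bytes on smaller numbers. Strings, byte arrays, and embedded messages are length-delimited: the length is written first, followed by the data. Repeated string and message fields are encoded by writing the key and value once per element, while in proto3, repeated numeric fields are packed by default into a single length-delimited record. In proto3, singular scalar fields left at their default value (0 or an empty string) are not written at all.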
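To make the wire format concrete, here is a minimal hand-rolled encoder for the User message, written purely for illustration. The function names are hypothetical and not part of the protobuf library, and the sketch covers only what this schema needs (no zigzag-encoded sint fields, no fixed-width types). Since fields are written in field-number order, its output should line up with `SerializeToString()` for this message, but verify that against the generated `user_pb2` rather than taking it on faith.

```python
# Minimal hand-rolled encoder for the User message (illustration only;
# real code should use the protoc-generated user_pb2 module).

WIRETYPE_VARINT = 0            # int32, int64, uint32, bool, enum, ...
WIRETYPE_LENGTH_DELIMITED = 2  # string, bytes, embedded messages, packed repeated


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit set on all but the last."""
    if value < 0:
        # Negative int32/int64 values are sign-extended to 64 bits (always 10 bytes).
        value += 1 << 64
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)         # final byte
            return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    # The key packs the field number and the wire type into one varint.
    return encode_varint((field_number << 3) | wire_type)


def encode_string(field_number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    return encode_key(field_number, WIRETYPE_LENGTH_DELIMITED) + encode_varint(len(data)) + data


def encode_user(user_id: int, username: str, tags: list[str]) -> bytes:
    out = bytearray()
    if user_id != 0:  # proto3 omits singular fields holding their default value
        out += encode_key(1, WIRETYPE_VARINT) + encode_varint(user_id)
    if username:
        out += encode_string(2, username)
    for tag in tags:  # repeated string: one key/length/value record per element
        out += encode_string(3, tag)
    return bytes(out)


encoded = encode_user(12345, "testuser", ["python", "protobuf", "benchmark"])
print(encoded.hex(" "))
print(len(encoded))  # 42
```

The 42 bytes break down as 3 for `id` (a 1-byte key plus the 2-byte varint for 12345), 10 for `username`, and 8 + 10 + 11 for the three tags. For comparison, `json.dumps(user_data)` produces 82 characters for the same data.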

JSON can be faster for small, simple messages because fixed per-call overhead dominates when there is little data to encode. In CPython, `json.dumps` hands a plain dict to a C-accelerated encoder, whereas Protobuf has to walk a message object, emit a key for each field, apply that field’s wire type (varint or length-delimited), and encode each value; how much of that work runs in native code depends on which backend is installed. JSON output is text and usually larger, but producing it is a fairly direct process. For small payloads, the cost of Protobuf’s structured encoding can outweigh the benefits of its binary representation, especially against a highly optimized JSON encoder. Separately, the schema definition and code generation step adds development and build complexity that plain JSON doesn’t have, although that is a one-time cost rather than a per-message one.

What most people miss is that Protobuf’s performance advantage depends heavily on message size and shape, and on the specific Protobuf implementation for your language. For large messages, or messages with deeply nested structures, Protobuf’s binary, schema-driven encoding usually wins on size and often on speed as well. Because fields are identified by number, field names such as "id" and "username" are never repeated in the payload the way they are in JSON, which saves significant space and parsing work for larger datasets. Varint encoding also helps numeric data: an integer below 128 takes a single byte, and the parser never has to convert decimal text back into numbers.
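A rough way to see the size effect is to encode a batch of users both ways. This sketch uses a minimal hand-rolled encoder (hypothetical helper names, not the protobuf API) and wraps the users in a hypothetical `UserList` message with a single repeated `User` field. It measures payload size only, not speed, and builds the JSON with compact separators so it isn’t penalized for whitespace.

```python
import json

# Sketch: compare payload sizes for a batch of users. Sizes only, not speed.


def encode_varint(value: int) -> bytes:
    """Base-128 varint for non-negative integers (enough for this sketch)."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_len_delimited(field_number: int, payload: bytes) -> bytes:
    # Key (field_number, wire type 2), then the payload length, then the payload.
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload


def encode_user(user: dict) -> bytes:
    # Mirrors `message User` from the schema above (fields 1, 2, 3).
    out = bytearray()
    if user["id"]:
        out += encode_varint((1 << 3) | 0) + encode_varint(user["id"])
    if user["username"]:
        out += encode_len_delimited(2, user["username"].encode("utf-8"))
    for tag in user["tags"]:
        out += encode_len_delimited(3, tag.encode("utf-8"))
    return bytes(out)


def encode_user_list(users: list) -> bytes:
    # Hypothetical wrapper: message UserList { repeated User users = 1; }
    return b"".join(encode_len_delimited(1, encode_user(u)) for u in users)


users = [
    {"id": 10_000 + i, "username": f"user{i}", "tags": ["python", "protobuf"]}
    for i in range(1000)
]

# Compact separators so JSON isn't penalized for optional whitespace.
json_size = len(json.dumps({"users": users}, separators=(",", ":")).encode("utf-8"))
proto_size = len(encode_user_list(users))
print(f"JSON: {json_size} bytes, Protobuf: {proto_size} bytes")

# Varint vs. decimal text for a few integers: (value, JSON chars, varint bytes).
for n in (1, 127, 128, 12345, 2**31 - 1):
    print(n, len(str(n)), len(encode_varint(n)))
```

The varint loop also shows the limits of the size win: a one-digit number costs one byte either way, while 2**31 - 1 drops from 10 characters to 5 bytes.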

The next hurdle you’ll face is understanding how to manage schema evolution and compatibility when your Protobuf definitions change over time.
