Protobuf is designed to be forward and backward compatible, meaning you can evolve your schemas over time without breaking existing clients or servers. Backward compatibility (newer code reading older data) relies on default values for missing fields. The key to forward compatibility (older code reading newer data) is how Protobuf handles unknown fields.

Let’s see this in action.

Imagine we have a simple Person message:

syntax = "proto3";

message Person {
  string name = 1;
  int32 age = 2;
}

And we have some old data serialized with this schema:

package main

import (
	"fmt"
	"log"

	"google.golang.org/protobuf/proto"
	pb "your_module/person" // Assuming your proto file is in a package named 'person'
)

func main() {
	oldPerson := &pb.Person{
		Name: "Alice",
		Age:  30,
	}

	data, err := proto.Marshal(oldPerson)
	if err != nil {
		log.Fatalf("Failed to marshal: %v", err)
	}

	fmt.Printf("Serialized old data: %x\n", data)
}

Running this might produce something like:

Serialized old data: 0a05416c696365101e

Now, let’s say we evolve our schema by adding a new field, city, to Person:

syntax = "proto3";

message Person {
  string name = 1;
  int32 age = 2;
  string city = 3; // New field
}

And we have new code that expects this city field. If this new code receives the old data (serialized without city), it won’t crash. Instead, it leaves city at its default value.

package main

import (
	"fmt"
	"log"

	"google.golang.org/protobuf/proto"
	pb "your_module/person"
)

func main() {
	// This is the *old* serialized data from the previous example
	oldData := []byte{0x0a, 0x05, 0x41, 0x6c, 0x69, 0x63, 0x65, 0x10, 0x1e}

	newPerson := &pb.Person{} // Using the *new* schema definition

	err := proto.Unmarshal(oldData, newPerson)
	if err != nil {
		log.Fatalf("Failed to unmarshal: %v", err)
	}

	fmt.Printf("Deserialized new schema with old data: %+v\n", newPerson)
}

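The output will look something like this (generated Go messages print in a text format whose exact spacing is not guaranteed to be stable):

Deserialized new schema with old data: name:"Alice" age:30

City does not appear because it holds its zero value ("" for a string); calling newPerson.GetCity() returns the empty string. The city field is not an unknown field here. It was simply absent from the old data, so nothing was populated. This is backward compatibility: new code can parse old data because missing fields fall back to their defaults.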

The real magic happens in the other direction, forward compatibility, when old code tries to read data serialized by new code. Let’s set city on our Person and serialize it with the new schema:

package main

import (
	"fmt"
	"log"

	"google.golang.org/protobuf/proto"
	pb "your_module/person"
)

func main() {
	newPersonWithCity := &pb.Person{
		Name: "Bob",
		Age:  25,
		City: "New York", // New field
	}

	data, err := proto.Marshal(newPersonWithCity)
	if err != nil {
		log.Fatalf("Failed to marshal: %v", err)
	}

	fmt.Printf("Serialized new data: %x\n", data)
}

This might produce:

Serialized new data: 0a03426f6210191a084e657720596f726b

Now, if we try to deserialize this data using the old Person schema (which doesn’t know about city):

package main

import (
	"fmt"
	"log"

	"google.golang.org/protobuf/proto"
	pb_old "your_module/person_old" // Assuming you have a separate proto for the old schema
)

func main() {
	// This is the *new* serialized data containing 'city'
	newData := []byte{0x0a, 0x03, 0x42, 0x6f, 0x62, 0x10, 0x19, 0x1a, 0x08, 0x4e, 0x65, 0x77, 0x20, 0x59, 0x6f, 0x72, 0x6b}

	oldPerson := &pb_old.Person{} // Using the *old* schema definition

	err := proto.Unmarshal(newData, oldPerson)
	if err != nil {
		log.Fatalf("Failed to unmarshal: %v", err)
	}

	fmt.Printf("Deserialized old schema with new data: %+v\n", oldPerson)
}

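The output will look something like this:

Deserialized old schema with new data: name:"Bob" age:25

Depending on the runtime version, the debug output may also show the retained unknown field, for example as 3:"New York".

The old code has no city field, so nothing in its generated API exposes the value. Protobuf’s wire format tags every value with its field number and a wire type. When the old parser encounters a field number it doesn’t recognize (like 3 for city), the wire type tells it how many bytes to skip: a varint (int32, int64, bool, enums), a fixed 4- or 8-byte value (fixed32, float, fixed64, double), or a length prefix followed by that many bytes (strings, bytes, embedded messages). It does not need the schema to do this. Current runtimes (proto3 since release 3.5, including Go’s) keep the skipped bytes as unknown fields rather than throwing them away, so they survive if the message is re-serialized. This is why it’s crucial to never reuse field numbers and to always add new fields with new numbers: a reused number would make old data decode as the wrong field.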
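To make the tag structure concrete, here is a minimal sketch that splits the three tag bytes from the serialized "Bob" message into field number and wire type. It uses only the Go standard library; decodeTag and wireTypeNames are names made up for this illustration and are not part of any protobuf API.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// wireTypeNames maps the wire type numbers used in this article to
// readable names (illustrative helper, not a protobuf API).
var wireTypeNames = map[uint64]string{
	0: "varint",
	1: "fixed64",
	2: "length-delimited",
	5: "fixed32",
}

// decodeTag splits a decoded tag into its field number (upper bits)
// and wire type (lowest three bits).
func decodeTag(tag uint64) (fieldNumber, wireType uint64) {
	return tag >> 3, tag & 0x7
}

func main() {
	// 0x0a, 0x10 and 0x1a are the tag bytes in the serialized "Bob" message.
	for _, b := range []byte{0x0a, 0x10, 0x1a} {
		tag, _ := binary.Uvarint([]byte{b}) // tags are themselves varints
		num, wt := decodeTag(tag)
		fmt.Printf("tag 0x%02x -> field %d, wire type %d (%s)\n", b, num, wt, wireTypeNames[wt])
	}
}
```

The last tag, 0x1a, decodes to field 3 with the length-delimited wire type, which is all an old parser needs to know to step over city.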

The mental model here is that the wire format is a sequence of key-value pairs, where the key is the field number and wire type, and the value is the encoded data. When you unmarshal, the parser iterates through these pairs. If it knows the field number, it decodes the value into the corresponding struct field. If it doesn’t, it uses the wire type to step over the value and sets the raw bytes aside as an unknown field.
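That loop can be sketched without the protobuf library at all. The following is a simplified illustration, not how the real runtime is implemented: walk, field, and the known map are hypothetical names, and the deprecated group wire types are not handled.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// field is one key-value pair from the wire: the decoded key plus the
// value bytes (for length-delimited fields, the payload without its prefix).
type field struct {
	Number   uint64
	WireType uint64
	Value    []byte
}

// walk splits a serialized message into its top-level fields without
// consulting any schema; the wire type alone says how long each value is.
func walk(data []byte) ([]field, error) {
	var fields []field
	for len(data) > 0 {
		tag, n := binary.Uvarint(data)
		if n <= 0 {
			return nil, errors.New("malformed tag")
		}
		data = data[n:]
		f := field{Number: tag >> 3, WireType: tag & 0x7}

		size := 0
		switch f.WireType {
		case 0: // varint: decode it to learn its length
			_, vn := binary.Uvarint(data)
			if vn <= 0 {
				return nil, errors.New("malformed varint")
			}
			size = vn
		case 1: // fixed64
			size = 8
		case 5: // fixed32
			size = 4
		case 2: // length-delimited: varint length, then that many bytes
			l, ln := binary.Uvarint(data)
			if ln <= 0 || l > uint64(len(data)-ln) {
				return nil, errors.New("malformed length")
			}
			data = data[ln:]
			size = int(l)
		default:
			return nil, fmt.Errorf("unsupported wire type %d", f.WireType)
		}
		if size > len(data) {
			return nil, errors.New("truncated value")
		}
		f.Value = data[:size]
		data = data[size:]
		fields = append(fields, f)
	}
	return fields, nil
}

func main() {
	// The serialized "Bob" message, including city as field 3.
	data := []byte{0x0a, 0x03, 0x42, 0x6f, 0x62, 0x10, 0x19, 0x1a, 0x08, 0x4e, 0x65, 0x77, 0x20, 0x59, 0x6f, 0x72, 0x6b}
	fields, err := walk(data)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	known := map[uint64]string{1: "name", 2: "age"} // what the old schema defines
	for _, f := range fields {
		if name, ok := known[f.Number]; ok {
			fmt.Printf("field %d (%s): % x\n", f.Number, name, f.Value)
		} else {
			fmt.Printf("field %d: unknown, skipped %d value bytes\n", f.Number, len(f.Value))
		}
	}
}
```

Run against the "Bob" bytes, this reports fields 1 and 2 as known and steps over the 8 value bytes of field 3 without ever needing to know that they spell a city name.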

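The specific levers you control are the field numbers. Adding a new field with a new, unused field number (e.g., 4, 5, etc.) ensures that older code will simply skip it. Removing fields is where compatibility can break: code that still reads the field will only ever see its default value, and if the number is later reused for a different field, old data and old clients will misinterpret it. Deleting fields should be done with extreme caution, typically by marking them as deprecated first and then reserving the number (and ideally the name) with a reserved statement, such as reserved 3; and reserved "city";, so the compiler rejects any later attempt to reuse it.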

What most people don’t realize is that Protobuf’s "unknown fields" aren’t just for newly added fields. They cover any field that is present in the data but not defined in the .proto file being used for deserialization, which includes fields that were deleted from the schema. Fields that are missing from the data are a different case: they are not unknown, they simply take their default values. The runtime library keeps the raw bytes of unrecognized fields, and you can even access them programmatically if you need to (in Go, via msg.ProtoReflect().GetUnknown()), though this is rarely necessary for basic forward compatibility.
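To show why retaining those bytes matters, here is a hand-rolled sketch of an "old" reader that keeps whatever it does not recognize and writes it back out on re-serialization. Everything in it (oldPerson, unmarshalOld, marshalOld, the unknown buffer) is a hypothetical stand-in for what a runtime does internally, and it only handles the two wire types that appear in this article’s messages.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// oldPerson mirrors the old schema (name = 1, age = 2) plus a buffer for
// anything it does not recognize (hypothetical, for illustration only).
type oldPerson struct {
	Name    string
	Age     int32
	unknown []byte // raw tag+value bytes of unrecognized fields
}

// unmarshalOld decodes fields 1 and 2 and keeps every other field verbatim.
func unmarshalOld(data []byte) (*oldPerson, error) {
	p := &oldPerson{}
	for len(data) > 0 {
		start := data
		tag, n := binary.Uvarint(data)
		if n <= 0 {
			return nil, errors.New("malformed tag")
		}
		data = data[n:]
		num, wt := tag>>3, tag&0x7

		var value []byte
		switch wt {
		case 0: // varint
			_, vn := binary.Uvarint(data)
			if vn <= 0 {
				return nil, errors.New("malformed varint")
			}
			value, data = data[:vn], data[vn:]
		case 2: // length-delimited
			l, ln := binary.Uvarint(data)
			if ln <= 0 || l > uint64(len(data)-ln) {
				return nil, errors.New("malformed length")
			}
			data = data[ln:]
			value, data = data[:int(l)], data[int(l):]
		default:
			return nil, fmt.Errorf("wire type %d not handled in this sketch", wt)
		}

		switch {
		case num == 1 && wt == 2:
			p.Name = string(value)
		case num == 2 && wt == 0:
			v, _ := binary.Uvarint(value)
			p.Age = int32(v)
		default:
			// Unknown field: keep its tag and value bytes untouched.
			consumed := len(start) - len(data)
			p.unknown = append(p.unknown, start[:consumed]...)
		}
	}
	return p, nil
}

// marshalOld writes the known fields, then appends the retained bytes.
func marshalOld(p *oldPerson) []byte {
	var out []byte
	if p.Name != "" {
		out = append(out, 0x0a) // field 1, length-delimited
		out = binary.AppendUvarint(out, uint64(len(p.Name)))
		out = append(out, p.Name...)
	}
	if p.Age != 0 {
		out = append(out, 0x10) // field 2, varint
		out = binary.AppendUvarint(out, uint64(p.Age))
	}
	return append(out, p.unknown...)
}

func main() {
	newData := []byte{0x0a, 0x03, 0x42, 0x6f, 0x62, 0x10, 0x19, 0x1a, 0x08, 0x4e, 0x65, 0x77, 0x20, 0x59, 0x6f, 0x72, 0x6b}
	p, err := unmarshalOld(newData)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	p.Age++ // the old code edits a field it knows about
	out := marshalOld(p)
	fmt.Printf("re-serialized: %x\n", out)
	fmt.Println("city survived:", bytes.Contains(out, []byte("New York")))
}
```

The practical consequence is that an old service sitting between two new ones can read, modify, and forward a message without silently stripping fields it has never heard of.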

The next concept to explore in more depth is backward compatibility, where newer code reads data serialized by older code, and how to handle deleted fields safely.
