Adding and removing fields from a Protobuf schema can break existing clients and servers if not handled with extreme care.

Let’s watch this in action. Imagine we have a User message:

syntax = "proto3";

message User {
  string name = 1;
  int32 age = 2;
}

A client sends this User message to a server. It is shown here in JSON form for readability; on the wire it is a compact binary encoding:

{
  "name": "Alice",
  "age": 30
}

The server, receiving this, deserializes it into a User object. If the server is written in Python, it might look like this:

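import user_pb2  # module generated by protoc from the User schema above

user = user_pb2.User()
# serialized_data holds the bytes received from the client
user.ParseFromString(serialized_data)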

print(f"Name: {user.name}, Age: {user.age}")
# Output: Name: Alice, Age: 30
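To make the wire format concrete, here is a minimal sketch of how the Alice message could be encoded by hand. The helper names (encode_varint, encode_tag, encode_user) are illustrative and not part of the protobuf library, and the sketch assumes non-negative integers only.

```python
VARINT, LEN = 0, 2  # wire types used by int32 and string fields


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """The key written before every value: field number plus wire type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_user(name: str, age: int) -> bytes:
    payload = name.encode("utf-8")
    return (
        encode_tag(1, LEN) + encode_varint(len(payload)) + payload  # name = 1
        + encode_tag(2, VARINT) + encode_varint(age)                # age = 2
    )


wire = encode_user("Alice", 30)
print(wire.hex(" "))
# 0a 05 41 6c 69 63 65 10 1e
```

Note that the field names never appear in the bytes. Only the numbers 1 and 2 do, which is why the rest of this article is about field numbers.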

Now, let’s evolve the schema.

Adding a Field Safely

We want to add an email field. The key to safe evolution is never reusing field numbers.

syntax = "proto3";

message User {
  string name = 1;
  int32 age = 2;
  string email = 3; // New field
}

A new client, compiled against this updated schema, sends:

{
  "name": "Bob",
  "age": 25,
  "email": "bob@example.com"
}

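An old server, compiled against the previous schema (without email), receives this. What happens? Protobuf’s wire format is a sequence of key-value pairs, where each key encodes the field number and a wire type. When the old server sees field number 3, it has no definition for it, so it skips over the value instead of failing.

The user.ParseFromString(serialized_data) call on the old server will succeed. The name and age fields will be populated as expected. The email value is not accessible through the old server’s generated User class. Current protobuf runtimes (3.5 and later for proto3) retain it as an unknown field and write it back out if the message is re-serialized; earlier proto3 runtimes discarded it. This is the fundamental safety mechanism: unknown fields never cause a parse failure.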
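The sketch below imitates what the old server’s parser does with Bob’s message. It is plain Python, not the generated code; parse, read_varint, and OLD_SCHEMA are hypothetical names, and the sketch handles only the two wire types this example needs.

```python
VARINT, LEN = 0, 2
OLD_SCHEMA = {1: "name", 2: "age"}  # the old server has never heard of field 3


def read_varint(buf: bytes, pos: int):
    """Return (value, next_pos) for the varint that starts at pos."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def parse(buf: bytes, schema: dict):
    """Split a message into fields the schema knows and fields it does not."""
    known, unknown = {}, []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == VARINT:
            value, pos = read_varint(buf, pos)
        elif wire_type == LEN:
            size, pos = read_varint(buf, pos)
            # Assumption: every length-delimited field here is a UTF-8 string.
            value = buf[pos:pos + size].decode("utf-8")
            pos += size
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        if number in schema:
            known[schema[number]] = value
        else:
            unknown.append((number, value))  # set aside, never an error
    return known, unknown


# Bob's message as a new client would encode it: fields 1, 2 and 3.
wire = b"\x0a\x03" + b"Bob" + b"\x10\x19" + b"\x1a\x0f" + b"bob@example.com"
known, unknown = parse(wire, OLD_SCHEMA)
print(known)    # {'name': 'Bob', 'age': 25}
print(unknown)  # [(3, 'bob@example.com')]
```

The parser can step over field 3 without a definition because the wire type in the key tells it how long the value is.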

Removing a Field Safely

Now, let’s say we want to remove the age field.

Consider the original schema again:

syntax = "proto3";

message User {
  string name = 1;
  int32 age = 2;
}

A new client is compiled against a schema where age is removed and its field number is not reused:

syntax = "proto3";

message User {
  reserved 2;        // 'age' (field 2) is gone; reserving the number blocks accidental reuse
  reserved "age";
  string name = 1;
  string email = 3;
}

This new client sends:

{
  "name": "Charlie",
  "email": "charlie@example.com"
}

An old server, compiled against the schema with age (field 2), receives this. It knows field 1 (name) and field 2 (age). It sees field 1 and successfully deserializes name. Field 2 never arrives, so age on the server-side User object reads as its default value, which is 0 for int32. There is no error, but the server cannot tell a missing age from an age of 0, so any logic that depends on age has to tolerate that.
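A toy model of the receiving side shows why a missing age and a zero age look the same. OLD_SERVER_DEFAULTS and materialize are hypothetical names for illustration, not protobuf APIs, and the unknown email field is left out for brevity.

```python
# Values a proto3 reader reports for fields that never arrived on the wire.
OLD_SERVER_DEFAULTS = {"name": "", "age": 0}


def materialize(parsed_fields: dict) -> dict:
    """Overlay the fields found on the wire onto the schema defaults."""
    return {**OLD_SERVER_DEFAULTS, **parsed_fields}


from_new_client = materialize({"name": "Charlie"})            # age not sent
from_old_client = materialize({"name": "Charlie", "age": 0})  # age set to 0

print(from_new_client)                     # {'name': 'Charlie', 'age': 0}
print(from_new_client == from_old_client)  # True
```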

The crucial point is that field numbers are stable identifiers. You are mapping a number to a piece of data. Adding a new number is fine because old systems ignore it. Removing a number is fine because new systems just won’t send it, and old systems will simply not find it.

The Danger Zone: Reusing Field Numbers

The problem arises when you reuse a field number.

Let’s say you have the original User message:

syntax = "proto3";

message User {
  string name = 1;
  int32 age = 2;
}

And a client sends: {"name": "Alice", "age": 30}.

Now, you decide to remove age and add gender using the same field number:

syntax = "proto3";

message User {
  string name = 1;
  string gender = 2; // REUSED FIELD NUMBER!
}

A new client, compiled against this, sends: {"name": "Bob", "gender": "male"}.

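An old server, compiled against the schema with age = 2, receives this. It sees field 1 and deserializes name. Then it sees field 2, which it expects to be a varint-encoded int32, but the value arrives length-delimited, as a string. The common protobuf runtimes do not raise an error in this case: because the wire type does not match the declared field, they typically treat the value as an unknown field, and age silently reads as 0. The data the client sent never reaches the server’s logic, and nothing signals that it was lost.

Conversely, if an old client (sending age=30) talks to a new server (expecting gender), the server finds a varint where it expects a length-delimited string, so gender comes back empty. The worst case is reuse with a matching wire type: if field 2 had instead become another integer, say a height in centimeters, the new server would accept 30 without complaint and silently treat an age as a height.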
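The receiver has only the key in front of each value to go on. A short sketch of the keys these declarations produce; height_cm is a hypothetical field added for comparison, and tag is an illustrative helper.

```python
VARINT, LEN = 0, 2  # wire types for int32 and string


def tag(field_number: int, wire_type: int) -> int:
    """The key written before each value on the wire."""
    return (field_number << 3) | wire_type


age_tag = tag(2, VARINT)     # int32 age = 2
gender_tag = tag(2, LEN)     # string gender = 2
height_tag = tag(2, VARINT)  # hypothetical: int32 height_cm = 2

print(hex(age_tag), hex(gender_tag), hex(height_tag))
# 0x10 0x12 0x10
```

A reader expecting age can at least notice that 0x12 is not the key it wanted. It has no way to notice that 0x10 now means a height, because the bytes are identical.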

The Safe Way to Remove a Field

To safely remove a field, you should:

  1. Deprecate the field in your documentation and code comments (the [deprecated = true] field option exists for this).
  2. Remove it from the schema and reserve its number and name (reserved 2; and reserved "age";) so the compiler rejects any later attempt to reuse them.
  3. Deploy the schema change to your services.
  4. Deploy clients that no longer send the field.
  5. Do not reuse the field number afterwards, even after a long grace period. Stored messages, logs, and forgotten clients can still carry the old meaning, so give every new field a number that has never been used.
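The core rule behind these steps can be checked mechanically before a schema change ships. The function below is a hypothetical lint, not an existing tool: it compares two versions of a message, each written by hand as a mapping from field number to (name, type). It is deliberately conservative and also flags a pure rename, which is wire-compatible but still worth a second look.

```python
def find_reused_numbers(old_fields: dict, new_fields: dict, reserved=()) -> list:
    """Report field numbers whose meaning changed between two schema versions."""
    problems = []
    for number, (name, type_) in sorted(new_fields.items()):
        if number in reserved:
            problems.append(f"field {number} ({name}) uses a reserved number")
        elif number in old_fields and old_fields[number] != (name, type_):
            old_name, old_type = old_fields[number]
            problems.append(
                f"field {number} changed from {old_type} {old_name} "
                f"to {type_} {name}"
            )
    return problems


old = {1: ("name", "string"), 2: ("age", "int32")}
reused = {1: ("name", "string"), 2: ("gender", "string")}
evolved = {1: ("name", "string"), 3: ("email", "string")}

print(find_reused_numbers(old, reused))
# ['field 2 changed from int32 age to string gender']
print(find_reused_numbers(old, evolved))
# []
print(find_reused_numbers(evolved, reused, reserved={2}))
# ['field 2 (gender) uses a reserved number']
```

The third call shows why reserving matters: once age is gone from the previous version, only the reserved set remembers that number 2 was ever taken.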

The Mental Model: Field Numbers as Stable Pointers

Think of field numbers as stable pointers into your message structure. When you add a new pointer (a new field number), old systems don’t know what it points to and ignore it. When you remove a pointer, old systems will look for it but won’t find it, and new systems simply won’t have that pointer defined. The danger comes when you try to make a new pointer point to where an old pointer used to be, if the underlying data type or meaning has changed.

Protobuf’s wire format is a sequence of (field_number, wire_type, value) tuples. When parsing, if a field_number is encountered that the current schema doesn’t know about, the parser simply skips that tuple. This is why adding fields is safe. When removing a field, the sender simply stops emitting that (field_number, ...) tuple. The receiver, if it’s an older version, will look for it and not find it, leading to the field being unset, which is also safe. The problem is always when a new field is given the same number as an old field.

The next common pitfall is dealing with oneof fields and how their evolution interacts with existing services.
