buf’s linting and breaking change detection are forms of static analysis. The breaking change check in particular predicts runtime failures by inspecting how your Protocol Buffer schema evolves.
Here’s buf in action. Imagine we have a proto file:
// file: v1/user.proto
syntax = "proto3";
package myapp.v1;
message User {
string name = 1;
int32 age = 2;
}
This defines a simple User message. Now, let’s say we want to add a new field. buf matches messages by their fully qualified names, so the change has to land in the same package to be compared at all; a new myapp.v2 package would simply be treated as a separate API. From here on, “v1” means the revision above and “v2” means the edited revision of the same file:
// file: v1/user.proto (v2 revision)
syntax = "proto3";
package myapp.v1;
message User {
string name = 1;
int32 age = 2;
string email = 3; // Added email field
}
If we run buf breaking --against '.git#ref=abc12345' (where abc12345 is the commit that holds the v1 revision), buf prints nothing and exits successfully. Adding a field under a previously unused number is not a breaking change.
That follows from how Protocol Buffers handles field numbers and serialization. When a field is added with a new number, older clients that don’t know about it skip it during deserialization. However, if you remove a field, or change its type so that previously written values can no longer be interpreted correctly, that’s where the trouble starts.
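The skipping behaviour described above can be sketched with a hand-rolled decoder. This is a minimal illustration, not the protobuf library: the helper names (encode_varint, decode_varint, encode_field, decode) and the sample values are made up for the example, and only the two wire types used by User are handled.

```python
VARINT, LEN = 0, 2  # the two wire types the User message needs


def encode_varint(n):
    """Encode a non-negative int as a base-128 varint."""
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        out.append(byte | 0x80 if n else byte)
        if not n:
            return bytes(out)


def decode_varint(data, pos):
    """Return (value, next position) for the varint starting at pos."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def encode_field(number, value):
    """Encode one field as tag + payload: int -> VARINT, str -> LEN."""
    if isinstance(value, int):
        return encode_varint((number << 3) | VARINT) + encode_varint(value)
    raw = value.encode("utf-8")
    return encode_varint((number << 3) | LEN) + encode_varint(len(raw)) + raw


def decode(data, known):
    """Decode using `known` (field number -> name); skip everything else."""
    fields, skipped, pos = {}, [], 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        number, wire_type = tag >> 3, tag & 7
        if wire_type == VARINT:
            value, pos = decode_varint(data, pos)
        elif wire_type == LEN:
            length, pos = decode_varint(data, pos)
            value, pos = data[pos:pos + length].decode("utf-8"), pos + length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if number in known:
            fields[known[number]] = value
        else:
            skipped.append(number)  # unknown number: payload is stepped over
    return fields, skipped


# A writer on the newer schema sends name, age and email.
message = encode_field(1, "Ada") + encode_field(2, 36) + encode_field(3, "ada@example.com")

# A reader on the older schema only knows fields 1 and 2.
fields, skipped = decode(message, {1: "name", 2: "age"})
print(fields)   # {'name': 'Ada', 'age': 36}
print(skipped)  # [3]
```

The old reader never needs to know what field 3 means; the wire type in the tag tells it how many bytes to step over.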
Let’s look at a scenario where buf saves you:
Scenario 1: Removing a Field
Suppose in v2, we decide to remove the age field:
// file: v1/user.proto (v2 revision)
syntax = "proto3";
package myapp.v1;
message User {
string name = 1;
// int32 age = 2; // Removed age field
string email = 3;
}
Running buf breaking --against '.git#ref=abc12345' reports the deletion under the FIELD_NO_DELETE rule. Each line of real output is prefixed with file:line:column; the message reads roughly as follows, though the exact wording varies between buf versions:
Previously present field "2" with name "age" on message "User" was deleted.
On the wire, the deletion itself is tolerated: a v2 server that receives age from a v1 client treats field 2 as unknown and moves on. The breakage is elsewhere. Generated code loses the age accessor, so any code that still reads or writes it stops compiling; a v1 client reading v2 messages sees age as 0 and cannot tell that apart from a real zero; and field number 2 is now free to be reused by accident. buf flags the deletion because all three are common sources of runtime errors.
Scenario 2: Changing a Field Type
Consider changing age from int32 to string in v2:
// file: v1/user.proto (v2 revision)
syntax = "proto3";
package myapp.v1;
message User {
string name = 1;
string age = 2; // Changed type to string
string email = 3;
}
buf breaking --against '.git#ref=abc12345' reports this under the FIELD_SAME_TYPE rule (again prefixed with file:line:column, with wording that may differ slightly by version):
Field "2" with name "age" on message "User" changed type from "int32" to "string".
This is problematic because int32 and string use different wire types. A v1 client encodes age: 30 as a VARINT on field 2, while a v2 reader expects a LENGTH_DELIMITED value there. The mainstream protobuf implementations typically treat such a mismatch as an unknown field, so the v2 reader sees an empty age rather than an error. The reverse is also true: a string sent by a v2 client reaches a v1 reader as age = 0. Either way the value is silently lost, and generated code that used age as an integer no longer compiles. buf flags this because incompatible type changes break backward compatibility.
Scenario 3: Reusing a Field Number
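What if we delete age and then add address using the same field number?
// file: v1/user.proto (v2 revision)
syntax = "proto3";
package myapp.v1;
message User {
string name = 1;
// int32 age = 2; // Removed age field
string address = 2; // Reused field number 2
string email = 3;
}
buf identifies fields by number, so it sees this as field 2 changing its name and type rather than as one deletion and one addition. buf breaking --against '.git#ref=abc12345' reports roughly the following (file:line:column prefixes omitted; wording varies by version, and a matching json_name change may also be listed):
Field "2" on message "User" changed name from "age" to "address".
Field "2" with name "address" on message "User" changed type from "int32" to "string".
This is a critical breaking change. A v1 client sending age = 30 writes it under field number 2, which a v2 reader now takes to be address. In this particular case the wire types differ (VARINT versus LENGTH_DELIMITED), so most parsers set the value aside as an unknown field and address comes back empty. Had the new field shared the old wire type, for example another integer, the 30 would be read into it without any error, which is silent data corruption. The safe way to retire a field is to delete it and add reserved 2; (and reserved "age";) to the message so that neither the number nor the name can be reused.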
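What a reader does with old bytes after a field number is reused depends on whether the wire types still line up. The sketch below models that with a hand-written parser; it is an illustration under assumptions, not the protobuf library. The parse helper and the zip_code field are hypothetical, and routing a wire-type mismatch to "unknown" mirrors what mainstream implementations typically do.

```python
VARINT, LEN = 0, 2  # wire types for int32 and string


def encode_varint(n):
    """Encode a non-negative int as a base-128 varint."""
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        out.append(byte | 0x80 if n else byte)
        if not n:
            return bytes(out)


def decode_varint(data, pos):
    """Return (value, next position) for the varint starting at pos."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def parse(data, schema):
    """schema maps field number -> (name, expected wire type).

    Returns (fields, unknown). A value whose wire type does not match
    the schema is kept under `unknown`, keyed by field number.
    """
    fields, unknown, pos = {}, {}, 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        number, wire_type = tag >> 3, tag & 7
        if wire_type == VARINT:
            value, pos = decode_varint(data, pos)
        elif wire_type == LEN:
            length, pos = decode_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        name, expected = schema.get(number, (None, None))
        if expected == wire_type:
            fields[name] = value
        else:
            unknown[number] = value
    return fields, unknown


# Bytes written by an old client: age = 30 on field 2, as a varint.
payload = encode_varint((2 << 3) | VARINT) + encode_varint(30)

# Field 2 reused as a string: wire types differ, so the value is set aside.
as_address = parse(payload, {2: ("address", LEN)})
print(as_address)  # ({}, {2: 30})

# Field 2 reused as another integer (hypothetical zip_code): silently misread.
as_zip_code = parse(payload, {2: ("zip_code", VARINT)})
print(as_zip_code)  # ({'zip_code': 30}, {})
```

The second case is the one that hurts: nothing fails, and an age is now stored as a zip code.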
The Core Mechanism: Field Numbers and Wire Format
Protocol Buffers don’t serialize field names; they serialize field numbers and their associated values. When a message is serialized, each field is represented by its field number and its wire type (e.g., VARINT, LENGTH_DELIMITED).
- Adding a field: A new field number is introduced. Older clients, not knowing this number, simply skip over it during deserialization. This is safe.
- Removing a field: Data that older clients still send under the removed number no longer maps to anything. If that number is later reused, or if the receiving service still depends on the field, you have a problem. buf flags removal (FIELD_NO_DELETE) because it can lead to data loss or confusion.
- Changing field type: If you change a type from, say, int32 to string, the wire format for the serialized data changes. A v1 client sending an integer produces a VARINT for field 2. A v2 client expecting a string looks for a LENGTH_DELIMITED value, does not find one, and typically ends up with an empty field. buf detects this (FIELD_SAME_TYPE) because the old data can no longer be interpreted.
- Reusing a field number: This is the most dangerous. If v1 sends age = 30 (field 2) and v2 has address as field 2, the reader attributes whatever arrives on field 2 to address, not age. buf has no separate rule for reuse; because it matches fields by number, FIELD_SAME_NAME and FIELD_SAME_TYPE fire when the field at an existing number changes, and FIELD_NO_DELETE fires when a number disappears. Fields added under new numbers are never flagged.
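To make the "numbers, not names" point concrete, here are the raw bytes for field 2 under both types. This is a small sketch with hand-written helpers (encode_varint and tag are illustrative names, not library functions); it relies only on the documented tag layout, where the tag is the field number shifted left by three bits, OR-ed with the wire type.

```python
def encode_varint(n):
    """Encode a non-negative int as a base-128 varint."""
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        out.append(byte | 0x80 if n else byte)
        if not n:
            return bytes(out)


def tag(number, wire_type):
    """Field tag: (field number << 3) | wire type, itself a varint."""
    return encode_varint((number << 3) | wire_type)


VARINT, LEN = 0, 2

# int32 age = 2, value 30: tag byte, then the value as a varint.
age_as_int32 = tag(2, VARINT) + encode_varint(30)

# string age = 2, value "30": tag byte, length, then the UTF-8 bytes.
age_as_string = tag(2, LEN) + encode_varint(len(b"30")) + b"30"

print(age_as_int32.hex(" "))   # 10 1e
print(age_as_string.hex(" "))  # 12 02 33 30
```

Neither encoding contains the word "age". The only thing that ties the bytes to a field is the number 2 inside the tag, which is why buf keys its comparisons on field numbers.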
buf’s power comes from understanding these fundamental serialization rules and enforcing them through static analysis before code even runs. It compares your current proto files against a reference (like a Git commit or another branch) and applies a set of rules to identify potential incompatibilities.
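The comparison step can be pictured as a diff over fields keyed by number. The following is a toy model only, not buf's implementation: the Field type, the breaking_changes function and the message strings are invented for illustration, and the rule labels are borrowed from buf's FIELD_NO_DELETE, FIELD_SAME_NAME and FIELD_SAME_TYPE rules.

```python
from typing import Dict, List, NamedTuple


class Field(NamedTuple):
    name: str
    type: str


# One message, modelled as field number -> field.
Schema = Dict[int, Field]


def breaking_changes(old: Schema, new: Schema) -> List[str]:
    """Compare two revisions of a message, field number by field number."""
    problems = []
    for number, before in sorted(old.items()):
        after = new.get(number)
        if after is None:
            problems.append(f"FIELD_NO_DELETE: field {number} ({before.name}) was deleted")
            continue
        if after.name != before.name:
            problems.append(
                f"FIELD_SAME_NAME: field {number} renamed {before.name} -> {after.name}"
            )
        if after.type != before.type:
            problems.append(
                f"FIELD_SAME_TYPE: field {number} changed {before.type} -> {after.type}"
            )
    # Numbers that exist only in `new` are additions and are not flagged.
    return problems


v1 = {1: Field("name", "string"), 2: Field("age", "int32")}

added = {**v1, 3: Field("email", "string")}
removed = {1: Field("name", "string"), 3: Field("email", "string")}
reused = {1: Field("name", "string"), 2: Field("address", "string"), 3: Field("email", "string")}

print(breaking_changes(v1, added))    # []
print(breaking_changes(v1, removed))  # one FIELD_NO_DELETE entry
for problem in breaking_changes(v1, reused):
    print(problem)                    # a FIELD_SAME_NAME and a FIELD_SAME_TYPE entry
```

Because the loop walks the old revision and looks each number up in the new one, additions fall through untouched while deletions, renames and type changes are all caught, which matches the three scenarios above.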
The next thing you’ll likely grapple with is how to manage multiple proto files and their dependencies effectively, which is where buf’s module system and dependency management come into play.