Proto3 is a streamlined version of proto2, but its simplicity comes at the cost of features that can be crucial for maintaining backward compatibility and handling optional fields.

Let’s see how proto3 handles a simple message:

syntax = "proto3";

message Person {
  string name = 1;
  int32 id = 2;
  repeated string emails = 3;
}

When you compile this with protoc, you get code that serializes Person objects. The key difference emerges when you consider how default values and field presence are managed.

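In proto2, every singular field carries an explicit label, either optional or required (repeated fields use repeated). If you don’t set an optional field, it’s not present in the serialized data, and the generated code exposes a has-style accessor so the receiver can check whether the field was present.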

syntax = "proto2";

message Person {
  optional string name = 1;
  required int32 id = 2;
  repeated string emails = 3;
}

The required keyword in proto2 is a strong guarantee: a message is considered invalid if a required field is missing. This helps catch errors early. The optional keyword, on the other hand, signifies that the field may or may not be present.
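To make the required guarantee concrete, here is a minimal sketch in Python of a hand-rolled parser for the proto2 Person message above. It is not the protobuf library’s API: decode_varint and parse_person are hypothetical names, only varint and length-delimited fields are handled, and the required check is enforced at parse time as an assumption (real runtimes differ on exactly where they enforce it).

```python
from typing import Any, Dict, Tuple


def decode_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def parse_person(data: bytes) -> Dict[str, Any]:
    """Parse name (1), id (2) and emails (3); reject input without id."""
    person: Dict[str, Any] = {"emails": []}
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint
            value, pos = decode_varint(data, pos)
            if field_number == 2:
                person["id"] = value
        elif wire_type == 2:  # length-delimited
            length, pos = decode_varint(data, pos)
            if pos + length > len(data):
                raise ValueError("truncated field")
            chunk = data[pos:pos + length]
            pos += length
            if field_number == 1:
                person["name"] = chunk.decode("utf-8")
            elif field_number == 3:
                person["emails"].append(chunk.decode("utf-8"))
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
    # The "required" rule: the message is invalid if id never appeared.
    if "id" not in person:
        raise ValueError("missing required field: id")
    return person


# name = "Ada", id = 7
complete = parse_person(b"\x0a\x03Ada\x10\x07")

# name = "Ada" only: the required id is absent, so parsing is rejected.
try:
    parse_person(b"\x0a\x03Ada")
    rejected = False
except ValueError:
    rejected = True
```

The second input is well-formed on the wire; it is rejected purely because the required field never appeared, which is the early error detection the keyword is meant to provide.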

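Proto3, in its quest for simplicity, removes the required keyword and originally dropped optional as well, so singular fields carry no label by default. (The optional keyword was reintroduced for proto3 in protobuf 3.15 to give a field explicit presence.) Without that label, every proto3 field behaves as optional but with no presence tracking. This is where things get tricky.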

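Consider a proto3 message. Unlike proto2, proto3 does not allow a custom [default = ...] option, so each field’s default is fixed by its type:

syntax = "proto3";

message Product {
  string name = 1;
  double price = 2;
  bool is_available = 3;
}

If you serialize a Product where price is not explicitly set, the serializer will not include the price field in the output. When deserialized, the price field takes its type default of 0.0 (strings default to the empty string, numbers to zero, booleans to false). An unset proto2 optional field is also left off the wire, but there the getter returns the declared or type default while a has-accessor reports that the field was absent. Proto3 gives the receiver only the value.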

This behavior in proto3 can lead to subtle bugs when dealing with zero values. For instance, if is_available is false (the default for booleans in proto3), the field is not serialized, whether or not you assigned it, and the receiver reads false either way. With a proto2 equivalent, optional bool is_available = 3;, a field explicitly set to false is written to the wire, so the receiver can use the has-accessor to tell "explicitly false" apart from "never set". This distinction matters for backward compatibility.

The absence of explicit field presence in proto3 means you cannot reliably distinguish between a field that was explicitly set to its default value and a field that was never set. This is a major hurdle when migrating from proto2 to proto3 or when designing APIs that need to evolve gracefully.
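The effect is easy to see in a minimal sketch of the wire encoding, written in Python. This is a hand-rolled model of how implicit-presence proto3 fields are serialized, not the protobuf library; the Person dataclass and the encode_varint and serialize helpers are hypothetical names, and only the name and id fields with non-negative ids are handled.

```python
from dataclasses import dataclass


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


@dataclass
class Person:
    name: str = ""  # type default for string
    id: int = 0     # type default for int32


def serialize(person: Person) -> bytes:
    """Serialize with implicit presence: default-valued fields are skipped."""
    out = b""
    if person.name != "":
        raw = person.name.encode("utf-8")
        # Field 1, wire type 2 (length-delimited).
        out += encode_varint((1 << 3) | 2) + encode_varint(len(raw)) + raw
    if person.id != 0:
        # Field 2, wire type 0 (varint).
        out += encode_varint((2 << 3) | 0) + encode_varint(person.id)
    return out


untouched = serialize(Person(name="Ada"))            # id never assigned
explicit_zero = serialize(Person(name="Ada", id=0))  # id deliberately set to 0
with_id = serialize(Person(name="Ada", id=7))        # non-default id
```

Here untouched and explicit_zero are byte-for-byte identical, so nothing the receiver does can recover the sender’s intent; only with_id carries an extra tag and value for field 2.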

A critical point of confusion is the role of defaults. Proto3 has no [default = ...] option, and protoc rejects it in a proto3 file. The fixed per-type default is simply the value the reader sees when the field is missing from the wire, and a field that holds that default is not serialized.

If you need to know if a field was actually set by the sender, even if it’s set to its default value, proto3 originally provided no direct mechanism. Since protobuf 3.15 you can label the field optional to get explicit presence. Before that, you had to resort to workarounds, like using a wrapper type (e.g., google.protobuf.Int32Value) or adding a separate boolean flag to indicate presence, which works against proto3’s simplicity.
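The wrapper-type workaround can be sketched at the wire level in Python. This is a hand-rolled illustration under stated assumptions, not the protobuf library: encode_plain_int and encode_wrapped_int are hypothetical helpers, None stands in for "wrapper not set", and only non-negative integers are handled. It relies on the fact that a wrapper such as Int32Value is a message with a single value field numbered 1, and that a message field which has been set is written even when its payload is empty.

```python
from typing import Optional


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_plain_int(field_number: int, value: int) -> bytes:
    """Plain scalar with implicit presence: zero is never written."""
    if value == 0:
        return b""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def encode_wrapped_int(field_number: int, value: Optional[int]) -> bytes:
    """Scalar carried inside a wrapper message; None means "not set"."""
    if value is None:
        return b""
    # The wrapper's inner `value` field (number 1) is itself a plain scalar,
    # so a zero produces an empty payload...
    payload = encode_plain_int(1, value)
    # ...but the outer tag and length are still written, which is what
    # carries the presence information.
    tag = (field_number << 3) | 2  # wire type 2 = length-delimited
    return encode_varint(tag) + encode_varint(len(payload)) + payload


unset = encode_wrapped_int(2, None)      # nothing on the wire
explicit_zero = encode_wrapped_int(2, 0)  # tag + zero-length payload
seven = encode_wrapped_int(2, 7)          # tag + length + inner field
```

The explicit zero now costs two bytes where the plain scalar cost none. That is the trade-off: presence information has to occupy space on the wire somewhere.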

The primary reason proto3 was designed this way was to simplify the wire format and the generated code. By not serializing default values, proto3 can achieve smaller message sizes. However, this comes at the expense of the precise control over field presence that proto2 offered, making it harder to manage evolving schemas and ensure consistent behavior across different versions of your services.

Proto3’s oneof feature ensures that at most one of a set of fields is set at a time, which is a useful construct for mutually exclusive options. Members of a oneof do track presence (the generated code reports which member, if any, is set), so oneof helps with the unset-versus-default problem only for fields you are willing to place inside one. It does nothing for ordinary scalar fields.

The most surprising true thing about proto3 is that a field set to its default value (like is_available = false in the Product example) is not serialized. This means the wire format will be identical whether is_available was explicitly set to false or never touched at all.

This lack of explicit field presence in proto3, compared to proto2’s optional and required keywords, is the core difference that impacts backward compatibility and the ability to distinguish between unset values and default values.

The next concept you’ll likely encounter is how to handle schema evolution safely with Protobuf, especially when migrating between proto2 and proto3 or when dealing with services that cannot tolerate breaking changes.
