Proto3 is a compact way to define data structures in Protobuf, but its most counterintuitive aspect is how it handles default values.
Let’s see it in action. Imagine we have a simple Person message with a name, an ID, and an email.
syntax = "proto3";

message Person {
  string name = 1;
  int32 id = 2;
  string email = 3;
}
When you serialize an instance of this message, Protobuf doesn’t store default values. If you create a Person object and leave name as an empty string, id as 0, or email as an empty string, those fields simply won’t be present in the serialized output. This makes your data more compact, but it also means you can’t distinguish between a field that was explicitly set to its default value and a field that was never set at all.
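To make the omission concrete, here is a minimal sketch of what an encoder does for Person under these rules. It is a hand-rolled illustration of the wire format, not the Protobuf library itself; encode_varint and encode_person are hypothetical helper names invented for this example.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_person(name: str = "", id: int = 0, email: str = "") -> bytes:
    """Hypothetical encoder for Person: fields at their default are skipped."""
    out = bytearray()
    if name:  # field 1, wire type 2 (length-delimited)
        data = name.encode("utf-8")
        out += encode_varint((1 << 3) | 2) + encode_varint(len(data)) + data
    if id:  # field 2, wire type 0 (varint)
        # Negative int32 values are sign-extended to 64 bits on the wire.
        out += encode_varint((2 << 3) | 0) + encode_varint(id & 0xFFFFFFFFFFFFFFFF)
    if email:  # field 3, wire type 2 (length-delimited)
        data = email.encode("utf-8")
        out += encode_varint((3 << 3) | 2) + encode_varint(len(data)) + data
    return bytes(out)


print(encode_person().hex())                     # prints an empty line
print(encode_person(name="Ann", id=150).hex())   # 0a03416e6e109601
print(encode_person(name="Ann", id=0).hex())     # 0a03416e6e
```

Note that the last two calls differ only in id, and the one with id set to 0 produces exactly the bytes you would get if id had never been mentioned.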
This behavior is a core design choice in proto3, aimed at reducing message size. Unlike proto2, proto3 has no required fields, and a plain scalar field does not track "unset" versus "default." For numeric types and booleans, the default is the zero value (0, 0.0, false). For strings and bytes, it’s an empty string or empty byte sequence. For enums, it’s the first defined enum value, which must be 0. Message-typed fields are the exception: they always track presence, and an unset message field reads back as a language-dependent "not set" value (for example null, or an empty default instance).
When you receive a proto3 message, the deserializer will populate your object with these default values for any fields that were not present in the serialized data. So, if you receive a Person message where the id field was omitted, your program will see id as 0. This is usually convenient, but it requires careful handling if you need to know if a field was actually set.
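The receiving side can be sketched the same way. The decoder below is a simplified, hand-written illustration (read_varint and decode_person are hypothetical names, and only the two wire types Person needs are handled). It starts from the defaults and overwrites only the fields that actually appear in the bytes.

```python
from typing import Dict, Tuple, Union


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read a base-128 varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_person(buf: bytes) -> Dict[str, Union[str, int]]:
    """Hypothetical decoder for Person: absent fields keep their defaults."""
    person: Dict[str, Union[str, int]] = {"name": "", "id": 0, "email": ""}
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
            if field == 2:
                # Reinterpret the low 32 bits as a signed int32.
                value &= 0xFFFFFFFF
                if value >= 0x80000000:
                    value -= 0x100000000
                person["id"] = value
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            data = buf[pos:pos + length]
            pos += length
            if field == 1:
                person["name"] = data.decode("utf-8")
            elif field == 3:
                person["email"] = data.decode("utf-8")
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
    return person


print(decode_person(bytes.fromhex("0a03416e6e")))
# {'name': 'Ann', 'id': 0, 'email': ''}
```

The caller sees id as 0 and has no way to tell from the result whether the sender wrote 0 or wrote nothing.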
To handle this, proto3 supports explicit field presence through the optional keyword. It was introduced as an experimental feature in Protobuf 3.12 and has been enabled by default since 3.15. Marking a scalar field optional makes that field track whether it was set.
syntax = "proto3";

message Person {
  optional string name = 1;
  optional int32 id = 2;
  optional string email = 3;
}
With optional, the field’s presence can be explicitly checked. If id is optional and it’s not set in the serialized message, reading it still returns the default value 0, but the field is reported as not set. Your programming language’s Protobuf library provides a way to check this, such as has_id() in C++, hasId() in Java, or HasField("id") in Python. Conversely, an optional field that is explicitly set to 0 is written to the wire. This allows you to differentiate between a field that was never set and one that was explicitly set to its default value.
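The effect of optional on the id field can be modeled in a few lines. This is a sketch of the semantics only, not generated Protobuf code: the Person class, its has_id and get_id methods, and encode_id are stand-ins written for this example, with None playing the role of "not set."

```python
from dataclasses import dataclass
from typing import Optional


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


@dataclass
class Person:
    """Stand-in for a message with `optional int32 id = 2;`."""
    id: Optional[int] = None  # None models "not set"

    def has_id(self) -> bool:
        return self.id is not None

    def get_id(self) -> int:
        # Reading an unset field still yields the type's default.
        return 0 if self.id is None else self.id


def encode_id(person: Person) -> bytes:
    """With explicit presence, a set field is written even when it is 0."""
    if not person.has_id():
        return b""
    tag = encode_varint((2 << 3) | 0)  # field 2, wire type 0
    return tag + encode_varint(person.id & 0xFFFFFFFFFFFFFFFF)


print(encode_id(Person()).hex())      # prints an empty line
print(encode_id(Person(id=0)).hex())  # 1000
```

The two-byte cost of writing an explicit 0 is what buys the receiver the ability to tell the two cases apart.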
The key difference between proto2’s optional and proto3’s optional is how they handle default values. In proto2, an optional field can declare a custom default, such as [default = 10]. Proto3 does not support custom defaults; optional only tracks presence or absence. If you need to associate a specific default value with a field in proto3, you have to manage that in your application code.
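One way to manage such a default in application code is a small fallback helper applied after the presence check. The sketch below is only an illustration: DEFAULT_ID and effective_id are hypothetical names, and None again stands in for an optional field that was not set.

```python
from typing import Optional

DEFAULT_ID = -1  # hypothetical application-level default, not part of the schema


def effective_id(id_value: Optional[int]) -> int:
    """Return the application default only when the field was not set.

    With a real generated message in Python, the equivalent check would be
    `msg.id if msg.HasField("id") else DEFAULT_ID`.
    """
    return DEFAULT_ID if id_value is None else id_value


print(effective_id(None))  # -1
print(effective_id(0))     # 0
print(effective_id(42))    # 42
```

An explicit 0 passes through unchanged, which is the behavior a proto2 custom default would have given you.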
Consider the zero value concept again. If you have a repeated field, like a list of tags, and it’s empty, it’s simply not serialized. When deserialized, you get an empty list. Repeated fields have no notion of presence: an empty list and a list that was never touched are indistinguishable, and optional cannot be combined with repeated. The optional keyword is meant for singular fields, mainly scalars and enums, to track their individual presence.
The most surprising thing about proto3’s default value handling is that the zero value for an enum field is always the first defined enum value, and this value must be 0. Protobuf never assigns enum numbers for you: if the first member is not 0, protoc rejects the file. For example, enum Status { RUNNING = 1; STOPPED = 2; } fails to compile in proto3. The subtler risk is making a meaningful state the zero value: with enum Status { RUNNING = 0; STOPPED = 1; }, every message that omits the field reads as RUNNING. To avoid this, declare a neutral member such as STATUS_UNSPECIFIED = 0; as your first enum member.
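The value of reserving 0 for a neutral member shows up on the reading side. The sketch below uses Python’s standard IntEnum as a stand-in for a generated enum; Status mirrors the example with an added zero member, and status_from_wire is a hypothetical helper where None means the field was absent from the message.

```python
from enum import IntEnum
from typing import Optional


class Status(IntEnum):
    """Stand-in for a proto3 enum with a neutral zero member."""
    STATUS_UNSPECIFIED = 0
    RUNNING = 1
    STOPPED = 2


def status_from_wire(raw: Optional[int]) -> Status:
    """Map a decoded enum number to Status; an absent field reads as 0.

    Simplification: real proto3 enums are open and keep unknown numbers,
    whereas IntEnum raises ValueError for a number it does not define.
    """
    return Status(0 if raw is None else raw)


print(status_from_wire(None).name)  # STATUS_UNSPECIFIED
print(status_from_wire(1).name)     # RUNNING
```

Because the zero member carries no business meaning, a missing field surfaces as "unspecified" instead of silently claiming a real state.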
The next concept you’ll encounter is how to manage complex types and oneofs for more advanced data modeling.