Protobuf’s "repeated" fields are variable-length, ordered collections rather than fixed-size arrays, and in generated code they behave much more like lists than C-style arrays.
Let’s see this in action.
First, define a simple message with a repeated field:
// person.proto
syntax = "proto3";
message Person {
  string name = 1;
  repeated int32 favorite_numbers = 2;
}
Now compile the definition with protoc --python_out=. person.proto, which generates the person_pb2 module, and imagine we have a Python script that uses it:
from person_pb2 import Person
# Create a new Person message
person = Person()
person.name = "Alice"
# Add elements to the repeated field
person.favorite_numbers.append(10)
person.favorite_numbers.append(20)
person.favorite_numbers.append(30)
# You can also extend it with multiple values
person.favorite_numbers.extend([40, 50])
# Access elements by index (like a list)
print(f"First favorite number: {person.favorite_numbers[0]}")
# Check the size
print(f"Number of favorite numbers: {len(person.favorite_numbers)}")
# Iterate over the elements
print("All favorite numbers:")
for number in person.favorite_numbers:
    print(number)
# Serialize and deserialize to see it work with bytes
serialized_data = person.SerializeToString()
print(f"Serialized data: {serialized_data}")
new_person = Person()
new_person.ParseFromString(serialized_data)
print(f"Deserialized name: {new_person.name}")
print(f"Deserialized favorite numbers: {list(new_person.favorite_numbers)}")
When you run this, you’ll see output like:
First favorite number: 10
Number of favorite numbers: 5
All favorite numbers:
10
20
30
40
50
Serialized data: b'\n\x05Alice\x12\x05\n\x14\x1e(2'
Deserialized name: Alice
Deserialized favorite numbers: [10, 20, 30, 40, 50]
This demonstrates that favorite_numbers acts like a Python list: you can append, extend, access by index, and iterate.
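The core problem Protobuf’s repeated fields solve is handling collections of data within a structured message. Without them, you’d end up hand-rolling schemas with fields like tags_count, tag1, tag2, and tag3, which is cumbersome and doesn’t scale. repeated fields provide a clean, schema-defined way to represent ordered lists with an arbitrary number of values for a given field. On the wire there are two encodings. In the unpacked form, each element is written as its own record, so the field’s tag appears once per element; repeated strings, bytes, and nested messages are always encoded this way. In the packed form, which proto3 uses by default for numeric scalar types such as int32, the tag appears once, followed by a byte length and then the element values back to back. A repeated int32 field such as favorite_numbers is therefore serialized as a single length-delimited record. When deserializing, the parser accepts either form and collects all values for the field, in order, into a list-like structure for your programming language.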
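To make the two wire encodings concrete, here is a minimal pure-Python sketch that hand-encodes a repeated int32 field both ways. It is an illustration of the wire format only, not how the Protobuf library is implemented; the helper names (encode_varint, encode_packed, encode_unpacked) are made up for this example, and the sketch assumes non-negative values.

```python
# Hypothetical helpers for illustration; the real library does this for you.
VARINT = 0  # wire type for int32, int64, bool, enum, ...
LEN = 2     # wire type for length-delimited records (strings, packed fields, ...)


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A record's tag is the field number and wire type in one varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_packed(field_number: int, values: list) -> bytes:
    """One tag, one length, then all values back to back."""
    if not values:
        return b""  # an empty repeated field contributes no bytes
    payload = b"".join(encode_varint(v) for v in values)
    return encode_tag(field_number, LEN) + encode_varint(len(payload)) + payload


def encode_unpacked(field_number: int, values: list) -> bytes:
    """One tag per element."""
    return b"".join(
        encode_tag(field_number, VARINT) + encode_varint(v) for v in values
    )


numbers = [10, 20, 30, 40, 50]
packed = encode_packed(2, numbers)
unpacked = encode_unpacked(2, numbers)

print(packed.hex())    # 12050a141e2832
print(unpacked.hex())  # 100a1014101e10281032
```

The packed form spends 7 bytes where the unpacked form spends 10, because the tag byte is paid once instead of once per element.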
The levers you control are the .proto definition and how you interact with the generated code in your application. In the schema, you decide which fields are collections (repeated) and what their element type is. In your application code, you manage the data with the operations the generated container provides (in Python: append, extend, indexing, slicing, and iteration). Serialization and deserialization hide the details of how repeated fields are encoded in the binary format.
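What’s often missed is that Protobuf makes no distinction between a repeated field that was never touched and one that is empty. The wire format has no way to encode an empty list: a repeated field with zero elements contributes no bytes to the serialized output, and a parser that sees no records for the field produces an empty list. Repeated fields therefore have no presence tracking, so HasField does not apply to them (in Python it raises ValueError) and you check len() instead. Proto3 scalar fields declared without the optional keyword behave similarly, since a value equal to the default (0 for int32, or an empty string) is also omitted from the output. The contrast is with proto2 fields and proto3 fields marked optional, which track presence and are serialized once explicitly set, even to the default value. This is a subtle point when diffing messages or when a repeated field appears "missing" from encoded output: for repeated fields, missing and empty mean the same thing.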
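The receiving side of this can be sketched with a small hand-written decoder. This is a simplified illustration under stated assumptions, not the library's parser: decode_favorite_numbers is a hypothetical helper that only understands varint and length-delimited records, only collects field 2 of the Person message, and does not sign-extend negative values.

```python
def decode_varint(data: bytes, pos: int):
    """Read one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_favorite_numbers(data: bytes) -> list:
    """Collect every value of field 2, accepting packed and unpacked records."""
    numbers = []  # starts empty and stays empty if field 2 never appears
    pos = 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:  # varint record: one unpacked element
            value, pos = decode_varint(data, pos)
            if field_number == 2:
                numbers.append(value)
        elif wire_type == 2:  # length-delimited: a string or a packed run
            length, pos = decode_varint(data, pos)
            end = pos + length
            if field_number == 2:
                while pos < end:
                    value, pos = decode_varint(data, pos)
                    numbers.append(value)
            pos = end  # skip payloads of fields we do not care about
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
    return numbers


# A message where nothing was set serializes to zero bytes.
never_touched = decode_favorite_numbers(b"")
# Only the name was set: there is no record for field 2 at all.
name_only = decode_favorite_numbers(b"\n\x05Alice")
# Name plus five packed numbers.
populated = decode_favorite_numbers(bytes.fromhex("0a05416c69636512050a141e2832"))

print(never_touched)  # []
print(name_only)      # []
print(populated)      # [10, 20, 30, 40, 50]
```

Both of the first two inputs decode to an empty list, and nothing in the bytes tells the receiver whether the sender never populated the field or populated and then cleared it.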
The next concept you’ll likely encounter is how to handle oneof fields for choosing between different types of data within a message.