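A schema version registered in Schema Registry is immutable: once stored, it never changes. In a distributed system, though, your Protobuf message definitions will need to evolve, and they do so through new versions.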

Here’s a User schema:

syntax = "proto3";

message User {
  string name = 1;
  int32 age = 2;
}

When a producer first serializes a message with this schema, the serializer (by default) registers the schema with Confluent Schema Registry, which assigns it a globally unique ID, say 10. Every message written with this schema carries that ID, so a consumer can look up what it describes: two fields, name (string) and age (int32).

Now, let’s say you need to add an email field:

syntax = "proto3";

message User {
  string name = 1;
  int32 age = 2;
  string email = 3; // New field
}

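If you register this new schema, Schema Registry assigns it a new ID, say 11, and producers using it write messages tagged with ID 11. A consumer compiled against the old definition now receives data it was not built for. Whether it can still read that data depends on whether the change was compatible, and Schema Registry can reject incompatible changes at registration time, before any such message is produced.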

This is where versioning comes in. Schema Registry manages schema evolution by letting you register new versions of a schema under the same subject, the name under which it tracks a schema’s history (by default derived from the topic name). When you register the second User schema under the same subject as the first, Schema Registry knows the two are related and can compare them.

The magic happens in the compatibility settings. By default, Schema Registry enforces BACKWARD compatibility: a consumer using the new schema version must be able to read data written with the previous version. The opposite direction, older consumers reading newer data, is FORWARD compatibility. For our User example, adding email with a new, unused tag (3) satisfies both. A consumer on the new schema that reads an old message sees email as the empty string, the proto3 default. A consumer still on the old schema that reads a new message does not recognize tag 3, so its parser skips the field and deserializes name and age correctly.
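To make the "unknown tags are skipped" behavior concrete, here is a minimal Python sketch that hand-encodes the three-field User message and decodes it with a reader that knows only tags 1 and 2. The function names are hypothetical, this is not the real protobuf library, and it handles only non-negative varints and length-delimited fields; it illustrates the wire format and nothing more.

```python
def encode_varint(value):
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos):
    """Return (value, next_position) for the varint starting at pos."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_user_v2(name, age, email):
    """Hand-encode the three-field User message (tags 1, 2, 3)."""
    name_b = name.encode("utf-8")
    email_b = email.encode("utf-8")
    return (
        bytes([(1 << 3) | 2]) + encode_varint(len(name_b)) + name_b      # name
        + bytes([(2 << 3) | 0]) + encode_varint(age)                     # age
        + bytes([(3 << 3) | 2]) + encode_varint(len(email_b)) + email_b  # email
    )


def decode_user_v1(buf):
    """Decode knowing only tags 1 and 2; every other tag is skipped."""
    user = {"name": "", "age": 0}  # proto3 defaults for absent fields
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:  # length-delimited
            length, pos = decode_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if field_number == 1:
            user["name"] = value.decode("utf-8")
        elif field_number == 2:
            user["age"] = value
        # Any other field number (such as email = 3) is read past and dropped.
    return user


payload = encode_user_v2("Ada", 36, "ada@example.com")
print(decode_user_v1(payload))  # {'name': 'Ada', 'age': 36}
```

The old reader never needs to know what tag 3 means; the wire type alone tells it how many bytes to skip.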

However, removing the age field is riskier. Nothing fails on the wire: an older consumer that receives a message without age sees the proto3 default, 0, and cannot tell that apart from a real age of 0. The break is semantic rather than a parse error, which makes it easy to miss.

To manage this, you can configure compatibility modes on a per-subject basis. The most common modes are:

  • BACKWARD: Consumers using the new schema can read data written with the previous schema. This is the default; upgrade consumers before producers.
  • FORWARD: Consumers using the previous schema can read data written with the new schema. Upgrade producers before consumers.
  • FULL: Both BACKWARD and FORWARD are satisfied.
  • NONE: No compatibility checks are performed. Risky.
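A compatibility mode is set per subject through the registry's REST API, with a PUT to /config/{subject}. The sketch below builds that request in Python without sending it. The helper name build_compatibility_request is hypothetical, and the subject user-events-value is an assumption (topic user-events under the default topic-name strategy); substitute your own.

```python
import json
import urllib.request

CONTENT_TYPE = "application/vnd.schemaregistry.v1+json"

ALLOWED_LEVELS = {
    "BACKWARD", "BACKWARD_TRANSITIVE",
    "FORWARD", "FORWARD_TRANSITIVE",
    "FULL", "FULL_TRANSITIVE",
    "NONE",
}


def build_compatibility_request(base_url, subject, level):
    """Build (but do not send) a PUT /config/{subject} request."""
    if level not in ALLOWED_LEVELS:
        raise ValueError(f"unknown compatibility level: {level}")
    body = json.dumps({"compatibility": level}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/config/{subject}",
        data=body,
        headers={"Content-Type": CONTENT_TYPE},
        method="PUT",
    )


# Assumed registry address and subject name; adjust for your environment.
request = build_compatibility_request(
    "http://localhost:8081", "user-events-value", "FULL"
)
print(request.get_method(), request.full_url)

# To apply the setting against a running registry:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Validating the level locally is a convenience; the registry itself rejects values it does not recognize.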

Let’s say you want to add an is_active boolean field and register the new schema version. Old consumers (still on the previous version) reading messages produced with the new schema is the FORWARD direction; new consumers reading messages already in the topic is the BACKWARD direction. Adding a field under a new tag satisfies both, so the registration should pass under BACKWARD, FORWARD, or FULL.

Here’s how you’d register a new version of the User schema using curl and assuming you have a Schema Registry running on http://localhost:8081:

First, work out the subject name. With the default TopicNameStrategy, the subject is the topic name plus -key or -value, depending on which part of the record the schema describes. Let’s assume your topic is user-events and User is the message value. The subject is then user-events-value. (Other strategies, such as TopicRecordNameStrategy, include the message name in the subject.) Because the schema is Protobuf rather than Avro, the request must set schemaType to PROTOBUF and carry the .proto source as an escaped string:

# Register the second version of the User schema
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"schemaType": "PROTOBUF", "schema": "syntax = \"proto3\";\n\nmessage User {\n  string name = 1;\n  int32 age = 2;\n  string email = 3;\n}\n"}' \
  http://localhost:8081/subjects/user-events-value/versions

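The response contains the ID assigned to the schema, for example {"id": 11}. To see the version number within the subject, send a GET to the subject’s /versions/latest endpoint, which returns the version along with the schema.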
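Escaping a .proto file by hand inside a JSON string is error-prone, so it can help to script the request. The following Python sketch builds two requests without sending them: a dry-run compatibility check against the latest registered version, and the registration itself. The helper build_schema_request, the registry address, and the subject user-events-value are assumptions for illustration; the two endpoint paths and the schemaType field are part of the Schema Registry REST API.

```python
import json
import urllib.request

USER_V2_PROTO = """syntax = "proto3";

message User {
  string name = 1;
  int32 age = 2;
  string email = 3;
}
"""


def build_schema_request(base_url, path, proto_source):
    """Build (but do not send) a POST carrying a Protobuf schema."""
    # json.dumps handles the quote and newline escaping for us.
    body = json.dumps({"schemaType": "PROTOBUF", "schema": proto_source})
    return urllib.request.Request(
        url=f"{base_url}{path}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )


base_url = "http://localhost:8081"  # assumed registry address
subject = "user-events-value"       # assumed subject name

# 1. Ask whether the schema is compatible with the latest version.
check = build_schema_request(
    base_url, f"/compatibility/subjects/{subject}/versions/latest", USER_V2_PROTO
)
# 2. Register it as a new version under the subject.
register = build_schema_request(
    base_url, f"/subjects/{subject}/versions", USER_V2_PROTO
)

for request in (check, register):
    print(request.get_method(), request.full_url)

# Against a running registry, send either one with:
# with urllib.request.urlopen(check) as response:
#     print(json.load(response))  # the check reports an is_compatible flag
```

Running the check before registering lets a CI job fail early instead of discovering the rejection at deploy time.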

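The crucial part for Protobuf is that new fields must be added with new, unique field tags. Two fields in one message cannot share a tag, and reusing a tag that an earlier version used for a different field (e.g., removing age and later giving email tag 2) makes old consumers decode the new field’s bytes as if they were the old field. Depending on the types involved, that produces a parse error or silently wrong data. Don’t rely on Schema Registry’s compatibility checks alone to catch every such case; mark retired tags as reserved in the .proto file so they cannot be reused.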

When a producer serializes a message, it prefixes the payload with the schema ID. The Kafka deserializer, configured with Schema Registry, reads that ID and fetches the matching schema to decode the message. Compatibility modes are enforced earlier, when a schema version is registered, not when a message is consumed. So if the producer uses schema version 2 and the consumer’s code was generated from version 1, the consumer still deserializes the message, skipping the email field it does not know about.
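The schema ID travels in a small header at the front of each record value: one magic byte (0) followed by the ID as a 4-byte big-endian integer. The Python sketch below splits that header off; the function name is hypothetical. For Protobuf, Confluent's serializer also writes message-index bytes between the header and the payload; this sketch leaves them in the remainder unparsed, and the single 0x00 index byte in the sample is an assumption about the common single-message case.

```python
import struct

MAGIC_BYTE = 0
HEADER_SIZE = 5  # 1 magic byte + 4-byte schema ID


def split_confluent_header(record_value):
    """Return (schema_id, remaining_bytes) for a framed record value."""
    if len(record_value) < HEADER_SIZE:
        raise ValueError("record too short to contain a Schema Registry header")
    # ">bI": big-endian, one signed byte, one unsigned 32-bit integer.
    magic, schema_id = struct.unpack(">bI", record_value[:HEADER_SIZE])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, record_value[HEADER_SIZE:]


# Sample record: header for schema ID 11, an assumed message-index byte,
# then a Protobuf payload encoding name = "Ada" (tag 1, length 3).
framed = struct.pack(">bI", MAGIC_BYTE, 11) + b"\x00" + b"\x0a\x03Ada"

schema_id, remainder = split_confluent_header(framed)
print(schema_id)  # 11
```

A deserializer would use schema_id to fetch the writer's schema from the registry, usually caching it so that only the first message per ID costs a network call.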

The most surprising thing about Protobuf evolution is how closely it ties to the wire format. JSON puts field names in the payload, and Avro resolves fields by name against the writer’s schema; Protobuf instead identifies each field on the wire by its numeric tag. This means adding a field with a new tag is cheap and doesn’t affect older readers, and renaming a field doesn’t change the encoding at all. Removing a field, however, needs care: readers that still expect it silently get the default value, and the retired tag must never be reused.

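If you change a field’s type (e.g., int32 age to string age), the wire type for tag 2 changes from varint to length-delimited, so consumers on the old definition can no longer decode the field as an integer. A few type changes are wire-compatible (for example among int32, int64, uint32, uint64, and bool), but int32 to string is not. Schema Registry will flag this if BACKWARD compatibility is enabled.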

The next concept to grasp is how to handle breaking changes, like changing a field type or removing a field, while minimizing disruption.
