Protobuf schema evolution is a fundamental challenge in microservice architectures, and automating its management is key to preventing runtime errors.
Let’s see this in action. Imagine a typical CI job for Protobuf, written here as a GitHub Actions workflow:
    jobs:
      build_and_lint_protos:
        runs-on: ubuntu-latest
        steps:
          - name: Checkout code
            uses: actions/checkout@v3
          - name: Set up Go
            uses: actions/setup-go@v4
            with:
              go-version: '1.20'
          - name: Install protoc and plugins
            run: |
              sudo apt-get update
              sudo apt-get install -y protobuf-compiler
              # go install places the plugins in $(go env GOPATH)/bin
              go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
              go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
          - name: Generate Go code from proto files
            run: |
              protoc --proto_path=. \
                --go_out=. \
                --go_opt=paths=source_relative \
                --go-grpc_out=. \
                --go-grpc_opt=paths=source_relative \
                ./proto/*.proto
          - name: Lint proto files
            run: |
              # Example using buf. Install buf if not already present
              # See https://docs.buf.build/installation
              # (pre-1.0 buf releases spelled this command `buf check lint`)
              buf lint --path proto/
          - name: Run Go tests
            run: go test ./...
This pipeline automates two critical tasks: generating Go code from your .proto files and linting those files to enforce style and detect potential issues.
The core problem Protobuf CI addresses is schema drift. When services communicate, they agree on a data structure (the schema). If one service’s implementation of that schema diverges from another’s, or if the schema itself is updated inconsistently, you get runtime errors. These aren’t compile-time errors; they’re messages that get misinterpreted or dropped entirely, leading to silent data corruption or service unavailability. Automating the build and linting process ensures that everyone is working with a consistent, validated view of the schema.
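To make the "silent" part of schema drift concrete, here is a minimal sketch in plain Go that hand-encodes two varint fields the way the Protobuf wire format does (a key of `field_number << 3 | wire_type`, then the value). The message layout, the field names `id` and `age`, and the renumbering scenario are hypothetical, and the helper functions are illustrative stand-ins for generated code, not part of any Protobuf library.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const wireVarint = 0 // Protobuf wire type for varint-encoded scalars

// appendVarintField appends one varint field: the key
// (fieldNumber<<3 | wire type) followed by the value, both as base-128 varints.
func appendVarintField(buf []byte, fieldNumber int, value uint64) []byte {
	buf = binary.AppendUvarint(buf, uint64(fieldNumber)<<3|wireVarint)
	return binary.AppendUvarint(buf, value)
}

// readVarintField scans a message made only of varint fields and returns the
// value stored under fieldNumber. Fields with other numbers are skipped
// without error, which mirrors how decoders treat unknown fields.
func readVarintField(msg []byte, fieldNumber int) (value uint64, found bool) {
	for len(msg) > 0 {
		key, n := binary.Uvarint(msg)
		if n <= 0 {
			return 0, false // malformed input
		}
		msg = msg[n:]
		v, n := binary.Uvarint(msg)
		if n <= 0 {
			return 0, false // malformed input
		}
		msg = msg[n:]
		if int(key>>3) == fieldNumber && key&7 == wireVarint {
			value, found = v, true
		}
	}
	return value, found
}

func main() {
	// Hypothetical producer, rebuilt after "age" was renumbered from 2 to 3.
	msg := appendVarintField(nil, 1, 42) // id = 42
	msg = appendVarintField(msg, 3, 30)  // age = 30 under the new number

	// Hypothetical consumer, still built from the old schema (age = field 2).
	age, found := readVarintField(msg, 2)
	fmt.Println(age, found) // prints "0 false"
}
```

Nothing fails here: the consumer simply sees the zero value for `age`, which is why this class of bug tends to surface as bad data rather than as an error.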
Here’s how it works internally:
- `protoc` (Protocol Buffer Compiler): This is the heart of the build process. You point `protoc` to your `.proto` files (often via a `--proto_path`) and specify the desired output language and plugins.
  - `--go_out=.`: Tells `protoc` to generate Go code. The `.` means output to the current directory.
  - `--go_opt=paths=source_relative`: A crucial option. It instructs `protoc` to place the generated Go files in the same directory structure as their corresponding `.proto` files, making it easy to manage them within your Go project.
  - `--go-grpc_out=.` and `--go-grpc_opt=paths=source_relative`: Similar to the above, but specifically for generating gRPC service code (server and client interfaces) alongside the message types.
- Plugins (`protoc-gen-go`, `protoc-gen-go-grpc`): These are external tools that `protoc` invokes to perform the actual code generation for Go. They translate the schema definitions into Go structs, enums, and methods.
- Linting (`buf lint`, spelled `buf check lint` in pre-1.0 releases of buf): While `protoc` compiles and generates code, it doesn’t enforce coding style or best practices within the `.proto` files themselves. Tools like `buf` (a popular Protobuf development tool) fill this gap.
  - `buf lint --path proto/`: This command analyzes the `.proto` files in the `proto/` directory against a configurable set of linting rules (defined in the `lint` section of a `buf.yaml` file). These rules can catch things like inconsistent naming conventions, missing package declarations, or unused imports.
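As a rough illustration of what `paths=source_relative` means in practice, the sketch below computes where the two plugins would be expected to write their output for a single input file. The function `generatedPaths` and the file `proto/user.proto` are hypothetical. The `.pb.go` and `_grpc.pb.go` suffixes are the ones `protoc-gen-go` and `protoc-gen-go-grpc` use, and the second file only appears when the `.proto` file declares a service.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// generatedPaths returns the expected output locations for one .proto file,
// assuming paths=source_relative. protoFile must be given relative to
// --proto_path, and outDir is the value passed to --go_out / --go-grpc_out.
func generatedPaths(outDir, protoFile string) (messages, services string) {
	base := strings.TrimSuffix(protoFile, ".proto")
	messages = filepath.Join(outDir, base+".pb.go")      // from protoc-gen-go
	services = filepath.Join(outDir, base+"_grpc.pb.go") // from protoc-gen-go-grpc
	return messages, services
}

func main() {
	// Mirrors the flags in the pipeline: --proto_path=. and --go_out=.
	messages, services := generatedPaths(".", "proto/user.proto")
	fmt.Println(messages) // proto/user.pb.go
	fmt.Println(services) // proto/user_grpc.pb.go
}
```

Without `paths=source_relative`, the plugins derive the output directory from each file's `go_package` option instead, which is why the generated files can end up somewhere unexpected.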
The exact levers you control are primarily:
- `--proto_path`: Where `protoc` looks for your `.proto` files and imported schemas.
- Output directories and options (`--go_out`, `--go_opt`): How and where the generated code is placed.
- Plugin versions: Ensuring you’re using compatible versions of `protoc-gen-go` and `protoc-gen-go-grpc`.
- Linting rules (the `lint` section of `buf.yaml`): Defining the specific quality standards for your `.proto` files.
- Proto file structure: How you organize your `.proto` files influences import paths and the generated code structure.
A common pitfall is letting the generated Go code fall out of sync with the .proto files. protoc itself doesn’t change your repository; it only writes to the filesystem of the machine it runs on. If the generated code is committed but not regenerated after a schema change, it no longer matches the .proto source. If it is neither committed nor regenerated, any build that imports it fails. Your CI should therefore either: 1. generate the code and commit it, so the repository always carries output that matches the schema, or 2. always generate it before running tests that depend on it, ensuring consistency. The example above assumes the latter.
The next logical step after automating schema builds and linting is to manage breaking schema changes proactively.