OpenTelemetry’s semantic conventions for HTTP and DB spans are the bedrock of making your distributed traces actually mean something beyond just a visual graph. Without them, you’re looking at a bunch of generic "HTTP Request" and "Database Query" boxes, which is about as useful as a map with no labels. The real magic is when these conventions let you ask, "Which specific database queries are slowing down my most frequent API calls?" or "What percentage of my HTTP requests are returning a 5xx error?"
Let’s see this in action. Imagine a simple web service that fetches user data from a database. Here’s what a trace might look like without conventions:
[Trace 123]
[Span A: HTTP Request /users/{id}]
[Span B: DB Query SELECT * FROM users WHERE id = ?]
Now, with semantic conventions, the same trace becomes incredibly rich:
[Trace 123]
[Span A: HTTP Request (GET /users/{id})]
Attributes:
http.method: GET
http.target: /users/123
http.route: /users/{id}
http.status_code: 200
net.peer.ip: 192.168.1.100
user_agent.original: MyClient/1.0
server.address: api.example.com
server.port: 8080
[Span B: DB Query (SELECT)]
Attributes:
db.system: postgresql
db.name: users_db
db.user: app_user
db.statement: SELECT * FROM users WHERE id = $1
db.operation: SELECT
net.peer.ip: 172.17.0.5
net.peer.port: 5432
See the difference? The second trace isn’t just a hierarchy; it’s a treasure trove of actionable data.
The problem these conventions solve is the "Tower of Babel" of observability. Without a shared vocabulary, every service, written in a different language and using a different framework, instruments its traces in its own way. You get custom attribute names like http_verb, db_call, status_code, and user_id. When you try to aggregate data across services, you end up with a mess: you can't easily filter for all GET requests across your entire system, or find all database calls to users_db.
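Without shared conventions, the usual workaround is to translate each team's ad-hoc names after the fact, before a single filter works across services. The sketch below shows that chore. The mapping table and the normalize_attributes helper are hypothetical illustrations, not part of any OpenTelemetry API.

```python
# Hypothetical mapping from ad-hoc attribute names (the kind individual teams
# invent) to the convention names used in this article.
LEGACY_TO_SEMCONV = {
    "http_verb": "http.method",
    "status_code": "http.status_code",
}


def normalize_attributes(attrs: dict) -> dict:
    """Return a copy of attrs with known ad-hoc keys renamed to convention keys.

    Keys with no known mapping (e.g. "db_call") are passed through unchanged.
    """
    return {LEGACY_TO_SEMCONV.get(key, key): value for key, value in attrs.items()}


# Two services describing the same kind of request in different vocabularies.
service_a = {"http_verb": "GET", "status_code": 200}
service_b = {"http.method": "GET", "http.status_code": 200}

spans = [normalize_attributes(service_a), normalize_attributes(service_b)]

# Only after normalization does one filter find GET requests from both services.
get_requests = [s for s in spans if s.get("http.method") == "GET"]
print(len(get_requests))
```

Every new ad-hoc name means another entry in that table. When all services emit the convention names from the start, the translation step disappears entirely.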
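OpenTelemetry's semantic conventions provide a standardized vocabulary. For HTTP, they define attributes like http.method, http.target, http.status_code, url.scheme, and user_agent.original. For databases, you have db.system, db.name, db.user, db.statement, and db.operation. These aren't just suggestions; they are agreed-upon standards that instrumentation libraries and backends understand.

The conventions are also versioned, and newer releases have renamed several of the attributes used in this article:
- http.method became http.request.method.
- http.status_code became http.response.status_code.
- http.target was split into url.path and url.query.
- db.statement became db.query.text, and db.operation became db.operation.name.

Check which version your instrumentation emits before you write queries against it.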
When you instrument your code, you’re essentially tagging your spans with these standard attributes. A typical HTTP server instrumentation might automatically set http.method, http.target, and http.status_code. For database calls, you might need to add db.statement and db.operation manually or via a more advanced database instrumentation library. The key is to ensure that the attributes you emit match the convention precisely.
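Conceptually, an HTTP server instrumentation wraps each request handler. It records the request attributes, runs the handler, and then records the response status. The sketch below uses a hypothetical RecordedSpan class and instrument_http_handler helper in place of a real tracer; in practice the OpenTelemetry SDK supplies the actual spans. The point is only to show where each attribute comes from. The span is named "{method} {route}", which matches the low-cardinality naming the HTTP conventions recommend for server spans.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RecordedSpan:
    """Hypothetical stand-in for a tracing span: a name plus attributes."""
    name: str
    attributes: dict = field(default_factory=dict)


def instrument_http_handler(
    method: str,
    target: str,
    route: str,
    handler: Callable[[], int],
) -> RecordedSpan:
    """Run handler and record the HTTP attributes a server instrumentation sets.

    `route` is the template the (hypothetical) router matched, and `handler`
    returns the response status code.
    """
    span = RecordedSpan(name=f"{method} {route}")
    # Request-side attributes are known before the handler runs.
    span.attributes["http.method"] = method
    span.attributes["http.target"] = target
    span.attributes["http.route"] = route
    # The status code is only known once the handler has produced a response.
    status = handler()
    span.attributes["http.status_code"] = status
    return span


span = instrument_http_handler("GET", "/users/123?sort=asc", "/users/{id}", lambda: 200)
print(span.name, span.attributes)
```

A real instrumentation would also end and export the span and record exceptions. Those steps are left out here.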
For example, if your database instrumentation captures the SQL query, you’d map it to db.statement. If it can distinguish between SELECT, INSERT, UPDATE, and DELETE, you’d use db.operation. The db.system attribute is crucial for knowing if you’re dealing with postgresql, mysql, redis, mongodb, etc.
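Here is that mapping as a minimal sketch, assuming your code already has the database system, database name, and SQL text in hand. The db_span_attributes helper is hypothetical, and it derives db.operation from the statement's first keyword. That is a deliberate simplification: statements starting with WITH or a comment would need a real SQL parser.

```python
def db_span_attributes(system: str, name: str, statement: str) -> dict:
    """Build DB span attributes using the convention names from this article.

    db.operation is set only when the first keyword is one of the four basic
    SQL verbs; anything else (e.g. a CTE starting with WITH) leaves it unset.
    """
    words = statement.split()
    first_keyword = words[0].upper() if words else ""
    attributes = {
        "db.system": system,        # e.g. "postgresql", "mysql", "redis"
        "db.name": name,
        "db.statement": statement,  # prefer parameterized SQL so literal values stay out of traces
    }
    if first_keyword in {"SELECT", "INSERT", "UPDATE", "DELETE"}:
        attributes["db.operation"] = first_keyword
    return attributes


attrs = db_span_attributes("postgresql", "users_db", "SELECT * FROM users WHERE id = $1")
print(attrs["db.operation"])
```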
This standardization unlocks powerful querying and analysis. In a tracing backend like Jaeger or Datadog, you can now ask questions like:
- "Show me all traces where http.method is POST and http.status_code is 500."
- "Find the average duration of spans where db.system is postgresql and db.operation is SELECT."
- "Group traces by http.route (the route template, such as /users/{id}) and show me the top 10 slowest endpoints."
- "Filter for requests that involved a database call to payments_db."
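A backend answers questions like these over its stored spans. The sketch below is a hypothetical in-memory stand-in: plain Python dicts with made-up durations, not any backend's query language. It works only because every span uses the same attribute names. Note that it filters individual spans, whereas a real backend would typically return the traces that contain them.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical exported spans: convention attributes plus a duration in ms.
spans = [
    {"duration_ms": 120, "attributes": {"http.method": "POST", "http.status_code": 500, "http.route": "/orders"}},
    {"duration_ms": 35, "attributes": {"http.method": "GET", "http.status_code": 200, "http.route": "/users/{id}"}},
    {"duration_ms": 45, "attributes": {"http.method": "GET", "http.status_code": 200, "http.route": "/users/{id}"}},
    {"duration_ms": 12, "attributes": {"db.system": "postgresql", "db.operation": "SELECT", "db.name": "users_db"}},
    {"duration_ms": 18, "attributes": {"db.system": "postgresql", "db.operation": "SELECT", "db.name": "payments_db"}},
]


def where(conditions: dict) -> list:
    """Return spans whose attributes contain every key/value pair in conditions."""
    return [
        s for s in spans
        if all(s["attributes"].get(key) == value for key, value in conditions.items())
    ]


# http.method is POST and http.status_code is 500
failed_posts = where({"http.method": "POST", "http.status_code": 500})

# Average duration of postgresql SELECT spans
avg_select_ms = mean(
    s["duration_ms"] for s in where({"db.system": "postgresql", "db.operation": "SELECT"})
)

# Top 10 slowest endpoints, grouped by route template (mean duration per route)
by_route = defaultdict(list)
for s in spans:
    route = s["attributes"].get("http.route")
    if route is not None:
        by_route[route].append(s["duration_ms"])
slowest = sorted(by_route, key=lambda r: mean(by_route[r]), reverse=True)[:10]

# Database calls to payments_db
payments_calls = where({"db.name": "payments_db"})

print(len(failed_posts), avg_select_ms, slowest, len(payments_calls))
```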
The power comes from the aggregation. A single service might have its own internal metrics, but when you standardize across all services, you gain system-wide visibility. This is the core of distributed tracing: understanding the flow of requests and dependencies across multiple components.
One crucial detail many people overlook is the difference between http.target, url.path, and http.route. http.target is the full request target as received by the server, meaning the path plus the query string (e.g., /users/123?sort=asc). url.path is just the path component (e.g., /users/123); newer versions of the conventions replace http.target with url.path and url.query.

Both values contain the concrete user ID. That makes them good for finding one specific request but poor for aggregation, because every ID becomes its own group. For per-endpoint metrics and grouping, use http.route, the low-cardinality route template the server matched (e.g., /users/{id}). Only the router knows that template, so http.route is usually set by the web framework's instrumentation rather than derived from the URL. Knowing what each attribute holds helps you pick the right one for your analysis.
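To see why the route template matters for aggregation, the sketch below splits two targets into path and query with the standard library's urllib.parse.urlsplit. It then matches each path against a route list. The ROUTES list and match_route function are hypothetical, a toy stand-in for a framework router, which would normally supply http.route itself. The two targets yield different url.path values but the same route.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlsplit


def split_target(target: str) -> Tuple[str, str]:
    """Split an http.target-style value into its path and query components."""
    parts = urlsplit(target)
    return parts.path, parts.query


def match_route(path: str, templates: List[str]) -> Optional[str]:
    """Return the first template matching path, treating {name} as one segment.

    A toy stand-in for a framework router; real routers support far more.
    """
    segments = path.strip("/").split("/")
    for template in templates:
        pattern = template.strip("/").split("/")
        if len(pattern) == len(segments) and all(
            (p.startswith("{") and p.endswith("}")) or p == s
            for p, s in zip(pattern, segments)
        ):
            return template
    return None


ROUTES = ["/users/{id}", "/orders/{order_id}/items"]  # hypothetical route table

for target in ["/users/123?sort=asc", "/users/456"]:
    path, query = split_target(target)
    # Different url.path and url.query per request, one shared http.route.
    print(target, "->", path, repr(query), match_route(path, ROUTES))
```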
The next step after mastering HTTP and DB spans is to explore conventions for other common technologies, like messaging queues and RPC frameworks, to build a truly comprehensive observability picture.