Prometheus’s multi-tenancy story is more about how you break it than how you build it.
Thanos and Cortex are the two heavyweights you’ll see when people talk about scaling Prometheus beyond a single instance, especially when you need to isolate data and access between different teams or customers. They both aim to solve the same core problem: how to store and query the data of multiple, independent Prometheus servers in one place, and how to do it in a way that keeps Tenant A’s metrics from bleeding into Tenant B’s.
Let’s look at Thanos first. Imagine you have a bunch of Prometheus instances scattered across your infrastructure. Thanos acts as a global query layer on top of these. It uses sidecars that run alongside each Prometheus instance. These sidecars are the key: they continuously upload Prometheus’s TSDB (Time Series Database) blocks to object storage (like S3, GCS, or Azure Blob Storage).
Here’s a glimpse of how a Thanos setup might look in practice. You’d typically have Prometheus servers, each with a Thanos sidecar.
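# prometheus.yml
global:
  external_labels:
    tenant: team-a              # example tenant label, recorded in uploaded blocks
    prometheus_replica: prom-0  # example replica label, used for deduplication
scrape_configs:
  - job_name: 'my_app'
    static_configs:
      - targets: ['app-instance-1:9090', 'app-instance-2:9090']
# Sidecar invocation (simplified)
# The sidecar takes command-line flags; it is not configured from prometheus.yml.
# objstore.yml names the bucket (my-thanos-bucket in this example);
# /prometheus is an example TSDB path.
thanos sidecar \
  --tsdb.path=/prometheus \
  --prometheus.url=http://localhost:9090 \
  --objstore.config-file=/etc/thanos/objstore.yml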
The Thanos query component does not read object storage directly. It fans each query out over gRPC (the Store API) to the sidecars, which serve recent data from the live Prometheus instances, and to the store component (the store gateway), which is what actually exposes the uploaded blocks in object storage to the query component.
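# Example Thanos query command
# thanos-sidecar and thanos-store are placeholder hostnames for Store API endpoints;
# older releases spell the flag --store instead of --endpoint.
thanos query \
  --endpoint=thanos-sidecar:10901 \
  --endpoint=thanos-store:10901 \
  --query.replica-label=prometheus_replica \
  --http-address="0.0.0.0:10902"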
Thanos handles multi-tenancy with tenant labels. Each Prometheus server declares identifying labels (for example a tenant label) under global.external_labels, and the sidecar records those external labels in the metadata of every block it uploads; the sidecar does not invent them itself. When you query through the Thanos query component, you filter on these labels. The query component does not enforce that filter on its own, so isolation holds only if every Prometheus server carries the right tenant label, set as an external label or through relabeling rules, and if something in front of the query component pins each caller to its own label value.
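To make the label-based filtering concrete, here is a minimal Python sketch that builds an instant-query URL restricted to one tenant. It assumes a label named tenant, a placeholder host thanos-query:10902, and the Prometheus-compatible /api/v1/query endpoint that the query component serves; the helper name tenant_query_url is made up for illustration.

```python
from urllib.parse import urlencode


def tenant_query_url(base_url: str, metric: str, tenant: str) -> str:
    """Build an instant-query URL that restricts `metric` to one tenant label.

    Assumption: every Prometheus server sets a `tenant` external label.
    The tenant value is inserted as-is, so it must not contain quotes.
    """
    promql = f'{metric}{{tenant="{tenant}"}}'
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


# Placeholder host and tenant name, for illustration only.
url = tenant_query_url("http://thanos-query:10902", "up", "team-a")
print(url)
```

Nothing here stops a caller from changing the label value, which is why this kind of filter is usually applied by a trusted proxy rather than by the tenant.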
Now, let’s pivot to Cortex. Cortex takes a more centralized approach. Instead of sidecars uploading to object storage, Cortex is designed to be a horizontally scalable, multi-tenant Prometheus backend. Your Prometheus instances keep scraping and writing to their local TSDB, but they are also configured to remote write every sample to Cortex, which takes over long-term storage.
Here’s a snippet of a Prometheus config pointing to a Cortex instance:
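# prometheus.yml
scrape_configs:
  - job_name: 'my_service'
    static_configs:
      - targets: ['service-a:9090']
remote_write:
  - url: "http://cortex-distributor:9201/api/v1/push"  # writes enter Cortex at the distributor
    headers:
      X-Scope-OrgID: tenant-abc  # tenant ID; often injected by an auth proxy instead
    queue_config:
      max_samples_per_send: 1000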
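Cortex itself is composed of several microservices: distributor (receives and validates incoming samples, then routes them to ingesters), ingester (holds recent samples and flushes them to long-term storage), querier (handles queries), compactor (optimizes stored data), and ruler (evaluates alerting and recording rules).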
# Example Cortex querier command
./cortex-binary \
  -target=querier \
  -config.file=/etc/cortex/config.yaml \
  -server.grpc-listen-port=9095
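The routing job of the distributor can be pictured as a consistent-hash lookup: each ingester owns tokens on a ring, and a series goes to the ingester that owns the next token after the series hash. The Python sketch below is a simplified model, not Cortex’s actual code. The hash function, the token count, the key format, and the Ring class are all assumptions, and replication is left out.

```python
import hashlib
from bisect import bisect_right


def token(key: str) -> int:
    """Map a string to a 32-bit position on the ring (hash choice is illustrative)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


class Ring:
    """Toy consistent-hash ring; each ingester registers several tokens."""

    def __init__(self, ingesters, tokens_per_ingester=64):
        self._ring = sorted(
            (token(f"{name}-{i}"), name)
            for name in ingesters
            for i in range(tokens_per_ingester)
        )
        self._tokens = [t for t, _ in self._ring]

    def lookup(self, tenant_id: str, series: str) -> str:
        """Return the ingester owning the first token after the series hash."""
        # The tenant ID is part of the key, so tenants never share a hash input.
        idx = bisect_right(self._tokens, token(f"{tenant_id}/{series}"))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring


ring = Ring(["ingester-1", "ingester-2", "ingester-3"])
owner = ring.lookup("tenant-abc", "up")
print(owner)
```

Because the lookup depends only on the key and the ring contents, every distributor replica that sees the same ring picks the same ingester for a given tenant and series.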
Cortex natively supports multi-tenancy. When you send data via remote_write, the request carries an X-Prometheus-Remote-Write-Version: 0.1.0 header, and crucially, an X-Scope-OrgID header. This X-Scope-OrgID is the tenant ID.
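# Example curl to the Cortex distributor, showing the tenant ID
# (metrics.pb must hold a snappy-compressed protobuf WriteRequest)
curl -H "X-Scope-OrgID: tenant-abc" \
  -H "Content-Type: application/x-protobuf" \
  -H "Content-Encoding: snappy" \
  -H "X-Prometheus-Remote-Write-Version: 0.1.0" \
  --data-binary @metrics.pb \
  http://cortex-distributor:9201/api/v1/push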
The distributor component in Cortex reads this X-Scope-OrgID, and Cortex keeps each tenant’s (or "user’s") data separate all the way down to its storage backend (object storage in current releases; older chunk-based deployments could also use stores such as Cassandra). When you query Cortex, you send the same X-Scope-OrgID header to retrieve data only for that tenant.
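On the read side, the same header scopes a query to one tenant. The Python sketch below only builds the request, using the standard library. The host cortex-query-frontend:8080 is a placeholder, the /prometheus path prefix is an assumption about how the query API is exposed in your deployment, and cortex_query_request is a made-up helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request


def cortex_query_request(base_url: str, tenant_id: str, promql: str) -> Request:
    """Build (but do not send) an instant query scoped to one tenant."""
    # Assumption: the Prometheus-compatible API is served under /prometheus.
    url = f"{base_url}/prometheus/api/v1/query?{urlencode({'query': promql})}"
    # The tenant ID travels in the X-Scope-OrgID header, not in the query itself.
    return Request(url, headers={"X-Scope-OrgID": tenant_id})


req = cortex_query_request("http://cortex-query-frontend:8080", "tenant-abc", "up")
print(req.full_url)
print(req.header_items())
```

Passing req to urllib.request.urlopen would send it. urllib normalizes the capitalization of header names, which is harmless because HTTP header names are case-insensitive.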
A point that surprises many people: neither Thanos nor Cortex is a long-term metrics store in its own right. They orchestrate the storage and retrieval of data generated by Prometheus, offloading long-term retention to external systems like object storage or dedicated databases.
The real magic happens in how they manage the data lifecycle. For Thanos, the compact component periodically consolidates smaller TSDB blocks into larger ones in object storage, reducing query overhead and storage costs. For Cortex, the compactor service does a similar job, merging stored data into larger units to improve read performance and reduce the number of files to manage.
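The consolidation idea can be sketched in a few lines of Python: group small blocks by the larger aligned time window they fall into, then emit one merged block per window. This is a toy planner under stated assumptions, not the compaction algorithm of either project. The 8-hour window and the plan_compaction name are chosen for illustration, and blocks are reduced to (min_time, max_time) pairs in milliseconds.

```python
from collections import defaultdict

HOUR_MS = 3_600_000


def plan_compaction(blocks, window_ms):
    """Merge (min_time, max_time) blocks that start in the same aligned window."""
    groups = defaultdict(list)
    for min_t, max_t in blocks:
        groups[min_t // window_ms].append((min_t, max_t))
    merged = []
    for idx in sorted(groups):
        members = groups[idx]
        # The merged block spans from the earliest start to the latest end.
        merged.append((min(b[0] for b in members), max(b[1] for b in members)))
    return merged


# Six consecutive 2-hour blocks covering hours 0 to 12.
two_hour_blocks = [(i * 2 * HOUR_MS, (i + 1) * 2 * HOUR_MS) for i in range(6)]
compacted = plan_compaction(two_hour_blocks, 8 * HOUR_MS)
print(compacted)
```

Six blocks become two, so a query over the same range opens fewer block indexes. A production compactor would typically wait for a window to be complete before merging it, whereas this sketch merges the partial trailing window too.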
What fewer people know is how the query fan-out works. In Thanos, the query component talks to multiple Store API endpoints (sidecars in front of individual Prometheus instances, and store gateways in front of object storage). Each endpoint advertises the time range it covers and the external labels of its data, and the query component sends a request only to the endpoints whose time range and labels can match the query. This means a query like sum(up{tenant="team-a"}) on a Thanos cluster with 100 Prometheus instances doesn’t have to hit all 100 instances; endpoints that advertise a different tenant label are skipped. A bare sum(up), with no matcher on an external label, still fans out to every endpoint that covers the requested time range.
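The selection step can be modeled as a simple filter over what each store advertises. The Python sketch below is a simplified model rather than Thanos’s implementation: the Store class, the select_stores function, and the sample stores are hypothetical, and only equality matchers are handled.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """What a store advertises: covered time range and external labels."""
    name: str
    min_time: int
    max_time: int
    external_labels: dict = field(default_factory=dict)


def select_stores(stores, start, end, matchers):
    """Return names of stores that could hold data for the query."""
    selected = []
    for s in stores:
        overlaps = s.min_time <= end and start <= s.max_time
        # A store is ruled out only if it advertises the label with another value;
        # a label it does not advertise might still appear on individual series.
        labels_ok = all(s.external_labels.get(k, v) == v for k, v in matchers.items())
        if overlaps and labels_ok:
            selected.append(s.name)
    return selected


stores = [
    Store("sidecar-a", 900, 1000, {"tenant": "team-a"}),
    Store("sidecar-b", 900, 1000, {"tenant": "team-b"}),
    Store("store-gw-a", 0, 899, {"tenant": "team-a"}),
]
recent = select_stores(stores, 950, 1000, {"tenant": "team-a"})
full_range = select_stores(stores, 0, 1000, {"tenant": "team-a"})
unfiltered = select_stores(stores, 950, 1000, {})
print(recent, full_range, unfiltered)
```

With a tenant matcher, only that tenant’s stores are contacted. With no matcher, every store covering the time range is queried, which is the case a bare query falls into.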
The next step in scaling involves understanding how to manage alerting and recording rules across these multi-tenant setups, which often involves setting up dedicated Thanos rule components or leveraging Cortex’s ruler service.