Federation in Prometheus is less about cross-cluster aggregation than you might expect: a top-level Prometheus server simply scrapes other Prometheus servers to collect a selected subset of their metrics.
Let’s see it in action. Imagine you have two clusters, cluster-a and cluster-b, each with its own Prometheus server (prom-a and prom-b) scraping local targets. You want a central prom-central to see metrics from both.
Here’s a snippet from prom-central’s prometheus.yml:
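scrape_configs:
  - job_name: 'cluster-a-federate'
    scrape_interval: 30s
    # Keep the job/instance labels reported by prom-a instead of overwriting them.
    honor_labels: true
    metrics_path: '/federate'
    params:
      # Selectors are matched against the series as stored in prom-a.
      'match[]':
        - '{job="node_exporter", cluster="a"}'
        - '{job="kube-state-metrics", cluster="a"}'
    static_configs:
      - targets:
          - 'prom-a.internal:9090'
  - job_name: 'cluster-b-federate'
    scrape_interval: 30s
    # Keep the job/instance labels reported by prom-b instead of overwriting them.
    honor_labels: true
    metrics_path: '/federate'
    params:
      # Selectors are matched against the series as stored in prom-b.
      'match[]':
        - '{job="node_exporter", cluster="b"}'
        - '{job="kube-state-metrics", cluster="b"}'
    static_configs:
      - targets:
          - 'prom-b.internal:9090'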
When prom-central scrapes prom-a.internal:9090 with metrics_path: '/federate', it’s not scraping the raw targets prom-a is scraping. Instead, it’s asking prom-a for the current value of every series that matches the given selectors. The match[] parameter is required; each value is an instant vector selector (the series-matching part of PromQL, not an arbitrary expression) that prom-a evaluates locally.
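So, prom-central ends up with metrics like node_exporter_build_info{cluster="a", job="node_exporter", ...} and kube_pod_info{cluster="a", job="kube-state-metrics", ...}. The original job label survives only if the federation job sets honor_labels: true; otherwise prom-central overwrites job with the federation job name and keeps the original as exported_job. The cluster="a" label is either already present on the series stored in prom-a, or it’s attached by prom-a when it serves the /federate response (via external_labels in prom-a.yml). One caveat: match[] selectors are evaluated against the series as stored, so a cluster="a" matcher only selects series that carry that label in storage. If the label comes solely from external_labels, leave it out of the selector.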
This setup solves a few problems:
- Centralized Overview: You get a single pane of glass for critical metrics across multiple Prometheus instances, simplifying monitoring and alerting.
- Reduced Target Load: The scraping load is distributed. Each cluster’s Prometheus scrapes its local targets, and the central Prometheus only scrapes other Prometheus servers, which stay cheap to scrape as long as the match[] selectors are kept narrow.
- Data Consolidation: While not a long-term storage solution for all metrics, it provides a consolidated view for high-level dashboards and alerts without requiring massive data aggregation.
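One way to keep the federated set small is to pre-aggregate on each cluster’s Prometheus with recording rules and federate only the aggregated series. The following is a minimal sketch under stated assumptions: the group name federation_aggregates and the rule name job:up:sum are hypothetical and chosen only for illustration.

```yaml
# Rule file loaded by prom-a (file and rule names are hypothetical).
groups:
  - name: federation_aggregates
    rules:
      # Number of healthy targets per job, precomputed on prom-a.
      - record: job:up:sum
        expr: sum by (job) (up)
---
# Federation job on prom-central that pulls only the aggregated series.
scrape_configs:
  - job_name: 'cluster-a-federate'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        # Match every series whose name starts with "job:".
        - '{__name__=~"job:.*"}'
    static_configs:
      - targets:
          - 'prom-a.internal:9090'
```

With this shape, prom-central receives one series per job rather than one per target, which suits the high-level dashboards described above.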
Internally, the /federate endpoint on the scraped Prometheus server (prom-a or prom-b) acts like a small, selector-only query endpoint. It receives the match[] parameters, evaluates those selectors against its own time series database, and returns the most recent sample of each matching series in Prometheus’s exposition format. The scrape_interval on the federating job determines how often prom-central polls prom-a and prom-b for these federated metrics.
The external_labels setting on the federated Prometheus servers (e.g., prom-a.yml) is where the cluster identity is cemented. Without it, prom-central wouldn’t know which cluster a metric originated from unless the local Prometheus already added that label to all its scraped targets. For instance, prom-a.yml might have:
global:
  external_labels:
    cluster: 'a'
    region: 'us-east-1'
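This ensures that any series prom-a hands to an external system (federation, remote storage, Alertmanager) carries cluster="a" and region="us-east-1", so the federated data in prom-central is correctly attributed. These labels are attached on the way out; they are not written onto the series in prom-a’s own storage.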
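Because external labels are attached when prom-a talks to external systems rather than stored on its local series, a match[] selector that filters on cluster will, as far as I can tell, only match if the label also exists on the stored series. One option is to attach it at scrape time. Here is a minimal sketch of what that could look like in prom-a.yml, assuming a statically configured node_exporter target; the hostname node-1.cluster-a.internal is hypothetical.

```yaml
# prom-a.yml (sketch): store the cluster label on every scraped series.
global:
  external_labels:
    cluster: 'a'
    region: 'us-east-1'

scrape_configs:
  - job_name: 'node_exporter'
    static_configs:
      - targets:
          - 'node-1.cluster-a.internal:9100'  # hypothetical target
        labels:
          # Target label: written into prom-a's storage, so
          # match[] selectors such as {cluster="a"} can see it.
          cluster: 'a'
```

The alternative is to drop cluster from the match[] selectors entirely and rely on external_labels to tag the federated output.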
A common pitfall is thinking federation replaces remote_write for long-term storage or deep analysis. Federation is meant for consolidating a specific, selected set of metrics from multiple Prometheus servers. Each scrape of /federate returns only the most recent sample of each matching series, so prom-central never receives prom-a’s existing history; it stores just the samples it scrapes, at its own scrape_interval and under its own retention. If prom-a goes down, prom-central receives no federated metrics from cluster-a until prom-a is back up, and the gap is not backfilled.
The next logical step after federating is to consider how to handle alerting based on this aggregated data, which often involves setting up alerting rules on the central Prometheus instance.
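As a starting point, here is a minimal sketch of an alerting rule file for prom-central that fires when a federation scrape itself is failing. It relies on the up series that Prometheus records for every scrape target and on the job names from the configuration above; the group name, alert name, 5m duration, and severity label are assumptions to adapt.

```yaml
# Rule file loaded by prom-central via rule_files (names are hypothetical).
groups:
  - name: federation_health
    rules:
      - alert: FederationSourceDown
        # up is 0 when prom-central's scrape of a federated server fails.
        expr: up{job=~"cluster-.*-federate"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Federation scrape of {{ $labels.instance }} is failing"
```

Since up is generated by prom-central for its own scrape targets, this alert still works when the federated server is completely unreachable.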