A Prometheus cardinality explosion happens when the number of unique time series (distinct combinations of metric name and label values) grows so large that the TSDB can no longer store and query them efficiently.
Let’s see this in action. Imagine you’re collecting metrics from a fleet of Kubernetes pods and attaching each pod’s name and container name as labels:

    scrape_configs:
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_label_app]
            target_label: app
          - source_labels: [__meta_kubernetes_pod_name]
            target_label: pod
          - source_labels: [__meta_kubernetes_pod_container_name]
            target_label: container

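If you have 1,000 pods with 5 containers each, that is 5,000 scrape targets (assuming each container exposes one metrics port), each with its own pod and container label values. If every target exposes around 200 series, you are already at a million active series, and every restart or rollout mints new pod names and therefore a new set of series. Each unique combination of metric name, job, app, pod, and container creates a new time series.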
The problem isn’t just the sheer number of series, but the uniqueness of the label combinations. Prometheus stores each unique series in its Time Series Database (TSDB). When cardinality gets too high, the TSDB struggles:
- Memory Usage: The TSDB needs to keep an index of all series in memory. A massive index consumes vast amounts of RAM, leading to OOM kills or extreme slowness.
- Write Performance: Every new unique series requires an update to the TSDB’s index, slowing down ingestion.
- Query Performance: Queries need to scan and filter through this enormous index, making even simple queries take minutes or crash the Prometheus server.
- Disk I/O: As the TSDB grows, disk operations for reads and writes become a bottleneck.
Detecting Cardinality Issues
The first step is to identify which metrics are causing the problem. Prometheus itself provides tools for this.
- Prometheus UI:
  - Navigate to your Prometheus UI, usually http://<prometheus-host>:9090/.
  - Go to Status -> TSDB Status.
  - Check the number of series in the head statistics. If this is in the millions and growing rapidly, you have a problem.
  - Crucially, check the top-10 tables on the same page: series count by metric name, label names by value count, and series count by label-value pair. These point directly at the metrics and labels responsible.
- promtool for Cardinality Analysis:
  - The promtool utility (included with Prometheus) is invaluable.
  - To analyze series and label cardinality in the on-disk data (by default the most recent block, not the in-memory head), run:

        promtool tsdb analyze /path/to/prometheus/data

  - This outputs the highest-cardinality metric names and labels, along with the label pairs most involved in churn. Look for labels with an extremely high number of unique values.
- Querying Prometheus for High-Cardinality Metrics:
  - You can write PromQL queries to identify metrics with many unique label combinations, usually by counting distinct label values. Selecting every series with {__name__=~".+"} is itself expensive on a large server, so run these sparingly.
  - To find how many unique pod label values exist for each job:

        count by (job) (count by (job, pod) ({__name__=~".+"}))

    The inner aggregation must keep job; otherwise the outer count by (job) has nothing to group on. A very large number here indicates a problem.
  - To find the top 10 metrics by series count:

        topk(10, count by (__name__) ({__name__=~".+"}))

    This shows you which metric names are generating the most series.
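The figures on the TSDB status page are also available from the HTTP API at /api/v1/status/tsdb, which is convenient for scripting. The following Go sketch parses that response; the struct covers only a few fields, and the sample payload is made-up data in the shape of a real response. Pass a base URL as the first argument to query a live server instead.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// nameValue mirrors the {name, value} pairs used in the response.
type nameValue struct {
	Name  string `json:"name"`
	Value uint64 `json:"value"`
}

// tsdbStatus covers only the fields this sketch reads.
type tsdbStatus struct {
	Data struct {
		HeadStats struct {
			NumSeries uint64 `json:"numSeries"`
		} `json:"headStats"`
		SeriesCountByMetricName    []nameValue `json:"seriesCountByMetricName"`
		LabelValueCountByLabelName []nameValue `json:"labelValueCountByLabelName"`
	} `json:"data"`
}

// parseTSDBStatus decodes a raw API response body.
func parseTSDBStatus(body []byte) (tsdbStatus, error) {
	var s tsdbStatus
	err := json.Unmarshal(body, &s)
	return s, err
}

// fetchTSDBStatus queries a live server, e.g. http://localhost:9090.
func fetchTSDBStatus(baseURL string) (tsdbStatus, error) {
	resp, err := http.Get(baseURL + "/api/v1/status/tsdb")
	if err != nil {
		return tsdbStatus{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return tsdbStatus{}, fmt.Errorf("unexpected status %s", resp.Status)
	}
	var s tsdbStatus
	err = json.NewDecoder(resp.Body).Decode(&s)
	return s, err
}

// sampleResponse is invented data used when no URL is given.
const sampleResponse = `{"status":"success","data":{
  "headStats":{"numSeries":1200000},
  "seriesCountByMetricName":[
    {"name":"http_requests_total","value":800000},
    {"name":"up","value":5000}],
  "labelValueCountByLabelName":[
    {"name":"request_id","value":750000}]}}`

func main() {
	status, err := parseTSDBStatus([]byte(sampleResponse))
	if len(os.Args) > 1 {
		status, err = fetchTSDBStatus(os.Args[1])
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("head series:", status.Data.HeadStats.NumSeries)
	for _, m := range status.Data.SeriesCountByMetricName {
		fmt.Printf("%-40s %d\n", m.Name, m.Value)
	}
	for _, l := range status.Data.LabelValueCountByLabelName {
		fmt.Printf("label %-34s %d values\n", l.Name, l.Value)
	}
}
```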
Preventing and Mitigating Cardinality Explosions
The core principle is to reduce the number of unique label combinations generated.
- Relabeling __meta Labels:
  - Problem: Copying Kubernetes metadata such as the pod name, container name, namespace, or node name onto every series as labels creates high cardinality if you have many pods, containers, or nodes, especially when pods are replaced often.
  - Diagnosis: Use promtool tsdb analyze or the TSDB status page to see if pod, container, namespace, or node are among the top labels by value count.
  - Fix: Map only the metadata you actually need in relabel_configs. Labels starting with __ are discarded automatically once relabeling finishes, so a __meta label never reaches your series unless a rule copies it into a target label. Do not use action: drop for this: drop discards the whole target when the regex matches, and the default regex matches everything.

        scrape_configs:
          - job_name: 'kubernetes-pods'
            kubernetes_sd_configs:
              - role: pod
            relabel_configs:
              # Keep only the stable app label
              - source_labels: [__meta_kubernetes_pod_label_app]
                target_label: app
              # No rules for __meta_kubernetes_pod_name or
              # __meta_kubernetes_pod_container_name: unmapped __meta
              # labels are discarded after relabeling.
              # Or, if you need some identifier, derive a coarser one if possible
              # - source_labels: [__meta_kubernetes_pod_name]
              #   regex: 'my-specific-app-(.*)'
              #   target_label: instance_id
              #   action: replace

  - Why it works: By not attaching labels that change with every new pod or container, you reduce the number of distinct label values Prometheus has to index. Keep in mind that each target still carries its own instance label, so series remain per target; the gain comes from not multiplying per-pod identifiers across every series.
- Avoiding Dynamic Labels in Application Metrics:
  - Problem: Applications often expose metrics with labels that are dynamic or have high cardinality, like user IDs, request IDs, or specific resource identifiers.
  - Diagnosis: Identify metrics with high label value counts using promtool or PromQL queries. For example, if request_id is a label on many metrics, it’s a prime suspect.
  - Fix: Modify your application’s metrics exposition to either:
    - Remove the high-cardinality label entirely.
    - Replace it with a more static or aggregated identifier.
    - Use a summary instead of a histogram with many buckets if appropriate, since every bucket is its own series. Be mindful of the trade-off: summary quantiles are computed in the client and cannot be aggregated across instances.

          // Problematic: the path label embeds userID, so every user creates a new series
          httpRequestsTotal.WithLabelValues("GET", "/users/"+userID, "200").Inc()

          // Better approach: aggregate or remove
          // Option 1: Use the route template instead of the concrete path
          httpRequestsTotal.WithLabelValues("GET", "/users/:id", "200").Inc()
          // Option 2: Remove the dynamic part if not essential for aggregation
          httpRequestsTotal.WithLabelValues("GET", "/users", "200").Inc()

  - Why it works: Removing or aggregating highly granular, dynamic label values at the source prevents them from being sent to Prometheus in the first place, drastically reducing the number of unique time series.
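When the router does not hand you a route template, one option is to normalize the concrete path before using it as a label value. The sketch below is one possible approach, not part of any Prometheus client library; normalizePath and the ":id" placeholder are hypothetical names, and it only recognizes numeric and UUID-shaped segments.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Segments made only of digits, e.g. "12345".
	numericSegment = regexp.MustCompile(`^[0-9]+$`)
	// Segments shaped like a UUID, e.g. "123e4567-e89b-12d3-a456-426614174000".
	uuidSegment = regexp.MustCompile(`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)
)

// normalizePath replaces identifier-like path segments with ":id" so the
// resulting label has a small, bounded set of values.
func normalizePath(path string) string {
	segments := strings.Split(path, "/")
	for i, s := range segments {
		if numericSegment.MatchString(s) || uuidSegment.MatchString(s) {
			segments[i] = ":id"
		}
	}
	return strings.Join(segments, "/")
}

func main() {
	fmt.Println(normalizePath("/users/12345"))            // prints /users/:id
	fmt.Println(normalizePath("/users/12345/orders/987")) // prints /users/:id/orders/:id
	fmt.Println(normalizePath("/healthz"))                // prints /healthz
}
```

The normalized value would then be passed as the path label, for example httpRequestsTotal.WithLabelValues("GET", normalizePath(r.URL.Path), "200"). Identifiers that are neither numeric nor UUIDs (slugs, usernames) slip through, so a route template from the router is the more robust choice when available.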
- Using labeldrop and labelkeep in Scrape Configs:
  - Problem: Some high-cardinality labels do not come from __meta labels at all. They are attached by the application or exporter and only appear on the scraped samples.
  - Diagnosis: After fixing target relabeling, you still see high cardinality from labels that were generated by the scraped application or exporter.
  - Fix: Use the labeldrop or labelkeep action in metric_relabel_configs. Both match their regex against label names. A plain action: drop with source_labels does something different: it discards every matching series, and with the default regex that is all of them.

        scrape_configs:
          - job_name: 'my-app-metrics'
            static_configs:
              - targets: ['app-service:8080']
            metric_relabel_configs:
              # Drop labels that are not useful and add cardinality
              - action: labeldrop
                regex: request_id|user_session_id
              # Or, keep only a specific set of labels. Include __name__,
              # otherwise the metric name itself is removed.
              # - action: labelkeep
              #   regex: __name__|job|instance|app|version

  - Why it works: metric_relabel_configs run after the metrics have been scraped but before they are ingested into the TSDB, so pruned labels are never stored. Series must still be uniquely labeled once a label is removed: Prometheus does not merge series that collapse to the same label set, so this works best for labels that are redundant with others.
- Aggregating Metrics at the Exporter Level:
  - Problem: Some exporters (e.g., node_exporter) expose a vast number of metrics with very detailed labels by default.
  - Diagnosis: Use promtool to see if metrics from specific exporters (like node_exporter) have an overwhelming number of series due to their detailed label sets.
  - Fix: Configure the exporter to disable or limit the metrics that generate high cardinality. For node_exporter, --collector.<name> enables a collector and --no-collector.<name> disables one. For example, to disable the textfile collector, which can generate many unique metrics from files:

        ./node_exporter --no-collector.textfile

    Or, if you are using the textfile collector but want to limit its scope, control which .prom files you place in its directory and which labels they carry:

        # In /etc/node_exporter/textfile-collector/
        # Create files like /etc/node_exporter/textfile-collector/my_metric.prom
        # Content:
        my_custom_metric{label="value"} 123

    This is less about dropping labels and more about controlling the source of metrics.
  - Why it works: By choosing which collectors are active on an exporter, or by carefully crafting the metrics exposition from custom collectors, you can prevent high-cardinality metrics from ever being generated and sent to Prometheus.
- Leveraging Service Discovery for Static Labels:
  - Problem: You might need to identify which application instance or host a metric came from, but dynamic labels like the pod name or container name have too high a cardinality.
  - Diagnosis: You’re seeing high cardinality from labels that you feel should be static identifiers.
  - Fix: Use service discovery (like Kubernetes SD or Consul SD) to inject more stable identifiers, then use relabel_configs to map them to target labels. For example, map the pod’s owning controller to a label. The pod role does not expose a Deployment name directly; __meta_kubernetes_pod_controller_name gives the owning controller, which for a Deployment-managed pod is its ReplicaSet.

        scrape_configs:
          - job_name: 'kubernetes-deployments'
            kubernetes_sd_configs:
              - role: pod
            relabel_configs:
              - source_labels: [__meta_kubernetes_pod_controller_name]
                target_label: controller
              - source_labels: [__meta_kubernetes_namespace]
                target_label: namespace
              # __meta_kubernetes_pod_name is deliberately not mapped;
              # unmapped __meta labels are discarded after relabeling.

  - Why it works: By using more stable identifiers provided by service discovery, like controller or service names, you get meaningful labels without the extreme uniqueness that pod or container names provide.
- Consider Remote Write and Downsampling:
  - Problem: You have legitimate use cases for high-cardinality data (e.g., debugging a specific request), but you can’t afford to keep it all long-term.
  - Diagnosis: You’ve tried all other methods, and you still have a significant number of high-cardinality metrics that are essential for short-term analysis but not for long-term trending.
  - Fix: Configure remote_write to send data to a long-term storage system (such as Thanos Receive, Cortex, VictoriaMetrics, or Mimir), keep local retention short, and have the remote side aggregate or downsample older data (e.g., keep 1-minute resolution for a day, then 5-minute resolution for a month, then hourly for a year). Downsampling support differs between these systems, so check the one you choose.

        remote_write:
          - url: "http://your-long-term-storage:9201/api/v1/push"
            # Optional: filter metrics to send to remote write if needed
            # write_relabel_configs:
            #   - source_labels: [__name__]
            #     regex: "metric_i_want_to_keep_long_term"
            #     action: keep

  - Why it works: With short local retention, Prometheus only stores recent, high-resolution data, while the remote storage handles long-term, potentially lower-resolution data. This addresses storage and query cost for historical data. It does not reduce the number of active series in Prometheus’s own head, so it complements the other fixes rather than replacing them.
The next problem you’ll likely encounter after fixing a cardinality explosion is slow queries on metrics that were never part of the explosion, because the index that remains still has to be scanned.