High cardinality in Prometheus is a performance problem that arises when you have too many unique time series, often because of overly dynamic labels.
Let’s see this in action with a common culprit: Kubernetes pod metrics.
Imagine you’re scraping pods in a large Kubernetes cluster. By default, Prometheus might collect metrics like kube_pod_info with labels derived directly from Kubernetes objects. If you have a service that rapidly creates and destroys pods, each with a unique pod name, namespace, and potentially other dynamic labels like uid or container_id, you can blow up your cardinality.
Here’s a snippet of what a high-cardinality metric might look like in Prometheus’s internal storage (this is conceptual, not a direct query):
{
__name__="kube_pod_info",
container="my-app-container",
endpoint="http",
instance="10.244.0.5:8080",
job="kubernetes-pods",
namespace="default",
pod="my-app-1a2b3c4d5e-f6g7h",
release="my-release",
service="my-app-service",
uid="a1b2c3d4-e5f6-7890-1234-567890abcdef"
}
Now, if that my-app-1a2b3c4d5e-f6g7h pod name is generated dynamically and changes with every deployment or even scaling event, and you have thousands of such pods, you’re creating thousands of unique time series for just this one metric. Multiply that by other metrics and other dynamic labels, and your Prometheus server starts struggling.
The Problem: High cardinality means Prometheus has to store, index, and query an enormous number of unique time series. This consumes excessive RAM and CPU, leading to slow queries, ingestion delays, and even Prometheus crashes. The core issue is that Prometheus’s time series database is optimized for aggregation and range queries, not for unique identifiers that change constantly.
How it Works Internally: Prometheus stores data in blocks. Each unique combination of metric name and label set forms a distinct time series. When a new time series is ingested, Prometheus must:
- Check for existence: Does this exact series already exist in the index?
- Create if new: If not, it creates a new entry in its internal index and starts writing data points.
- Index: All label key-value pairs are indexed for efficient querying.
With high cardinality, the existence check runs against an enormous index, and series creation adds millions of new index entries. Querying also becomes a nightmare, as Prometheus has to sift through an overwhelming number of series to find matches.
The Levers You Control:
- Scrape Config (prometheus.yml): This is your first line of defense. You can use relabel_configs to manipulate labels before they hit Prometheus’s storage.
- Service Discovery: How you discover targets (e.g., Kubernetes service discovery) often dictates the initial labels.
- Application Instrumentation: How your applications expose metrics can also contribute. Overly granular labels in your application code can propagate.
Reducing High-Cardinality Series:
- relabel_configs to Drop or Relabel Dynamic Labels: This is the most common and effective method. You can remove labels that are too dynamic or replace them with static values.
  - Diagnosis: Check your Prometheus UI under "Status" -> "TSDB status" or "Status" -> "Runtime & Build Information" to see the series count. Then use the "Query" tab to run a query like

        count({__name__=~".+"}) by (__name__)

    to see which metrics have the highest series counts. Use count by (<label_name>) to break a metric down by a specific label.
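  - Fix: In your prometheus.yml, add or modify relabel_configs and metric_relabel_configs within your scrape job. Target labels built from service discovery metadata belong in relabel_configs; labels that arrive with the scraped samples (such as uid) can only be changed in metric_relabel_configs.

        scrape_configs:
          - job_name: 'kubernetes-pods'
            # ... your other config ...
            relabel_configs:
              # Example: Copy stable discovery metadata into static target labels
              - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
                target_label: app
              - source_labels: [__meta_kubernetes_namespace]
                target_label: namespace
            metric_relabel_configs:
              # Example: Drop the 'uid' label if it's too dynamic
              - action: labeldrop
                regex: uid
              # Example: Relabel 'pod' to the service name if you only care about the service, not individual pods
              # This is aggressive and depends on your use case.
              - source_labels: [pod, service]
                regex: '(.+);(.+)'
                target_label: pod
                replacement: '${2}'  # Use the service name instead of the dynamic pod name
              # Example: Keep only an explicit allow-list of labels and remove every other label
              - action: labelkeep
                regex: '__name__|job|instance|namespace|pod|app|le|quantile'

  - Why it works: action: labeldrop removes every label whose name matches the regex from each series before ingestion. The default replace action, with target_label and replacement, substitutes a dynamic label value with a more static or aggregated one. action: labelkeep removes every label whose name does not match, so the allow-list must include __name__ and whatever is needed to keep series distinct (here job, instance, and the histogram and summary labels le and quantile). By contrast, action: drop and action: keep discard whole targets or whole series, not individual labels. After removing labels, check that the remaining label sets are still unique, because series that collapse into the same label set collide at ingestion.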
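To watch series counts continuously rather than checking the UI by hand, one option is a small rules file loaded through rule_files. The following is a sketch, assuming Prometheus scrapes itself so that prometheus_tsdb_head_series is available. The group name, alert name, recorded series name, and the 2,000,000 threshold are illustrative and should be adapted to your server's capacity.

```yaml
groups:
  - name: cardinality-watch            # hypothetical group name
    rules:
      # Samples kept per job after metric relabeling was applied
      - record: job:scrape_samples_post_metric_relabeling:sum
        expr: sum by (job) (scrape_samples_post_metric_relabeling)
      # Fire when the in-memory (head) series count stays above the threshold
      - alert: HighHeadSeriesCount     # hypothetical alert name
        expr: prometheus_tsdb_head_series > 2000000   # illustrative threshold
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: 'Prometheus head holds more than 2M active series'
```

The recorded per-job series makes it easier to spot which scrape job grows after a deployment, and to confirm that a relabeling change actually reduced what gets ingested.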
- Using metric_relabel_configs to Drop Entire Metrics: If a metric is inherently high-cardinality and not useful, drop it.
  - Fix:

        scrape_configs:
          - job_name: 'kubernetes-pods'
            # ...
            metric_relabel_configs:
              # Example of a metric to drop
              - source_labels: [__name__]
                regex: 'kube_pod_container_status_last_terminated_reason'
                action: drop

  - Why it works: This prevents the metric from ever being stored if it’s not needed.
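The inverse approach is sometimes easier to maintain: keep an explicit allow-list of metric names and discard everything else the job scrapes. A sketch, assuming the job scrapes kube-state-metrics and that only the two metrics named here are needed; the job name and the list itself are illustrative.

```yaml
scrape_configs:
  - job_name: 'kube-state-metrics'     # hypothetical job name
    # ... service discovery or static_configs ...
    metric_relabel_configs:
      # Keep only series whose metric name matches; every other series is dropped
      - source_labels: [__name__]
        regex: 'kube_pod_info|kube_pod_status_phase'
        action: keep
```

Prometheus anchors relabeling regexes at both ends, so each alternative has to match the full metric name. An allow-list also means new metrics from an exporter upgrade stay out until you add them deliberately.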
- Leverage Prometheus’s honor_labels: If the scraped data already carries its own job, instance, or other labels that clash with the ones Prometheus attaches to the target (typical for federation, the Pushgateway, or exporters that report on behalf of other entities), honor_labels decides which side wins.
  - Fix:

        scrape_configs:
          - job_name: 'kubernetes-pods-per-node'
            # ...
            honor_labels: true
            relabel_configs:
              # ... your relabeling to ensure consistent labels across pods on the same node ...

  - Why it works: honor_labels is a boolean. When it is true, label conflicts are resolved by keeping the values from the scraped data and ignoring the conflicting server-side labels. When it is false (the default), Prometheus keeps its own labels and renames the conflicting scraped ones to exported_<label_name>, for example exported_instance, so every affected series carries both. Setting it to true lets a stable identifier supplied by the target take the place of the instance label Prometheus would otherwise impose. It has no effect on labels the scraped data does not contain.
- Adjusting Kubernetes Service Discovery Configuration: For Kubernetes, you can fine-tune how Prometheus discovers pods and services.
  - Fix: In your prometheus.yml’s kubernetes_sd_configs, you can use role: pod and then apply relabel_configs to filter targets based on the __meta_kubernetes_* labels derived from pod metadata, annotations, and labels.

        scrape_configs:
          - job_name: 'kubernetes-pods-filtered'
            kubernetes_sd_configs:
              - role: pod
            relabel_configs:
              # Example: Skip pods annotated with prometheus.io/scrape: "false"
              - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
                regex: 'false'
                action: drop
              # Example: Keep only pods in specific namespaces
              - source_labels: [__meta_kubernetes_namespace]
                regex: 'monitoring|production'
                action: keep
              # Example: Drop pods that are not running
              - source_labels: [__meta_kubernetes_pod_phase]
                regex: 'Pending|Succeeded|Failed|Unknown'
                action: drop

  - Why it works: This allows you to be more selective about what Prometheus discovers and scrapes in the first place, reducing the raw number of potential targets and their associated labels.
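Relabeling filters targets after they have been discovered. kubernetes_sd_configs can also narrow discovery itself through its namespaces and selectors fields, which reduces what Prometheus has to watch in the Kubernetes API. A sketch, reusing the monitoring and production namespaces from the example above; the job name and the label selector value are hypothetical.

```yaml
scrape_configs:
  - job_name: 'kubernetes-pods-scoped'   # hypothetical job name
    kubernetes_sd_configs:
      - role: pod
        # Only discover pods in these namespaces
        namespaces:
          names:
            - monitoring
            - production
        # Only discover pods matching this Kubernetes label selector
        selectors:
          - role: pod
            label: 'app.kubernetes.io/part-of=shop'   # hypothetical selector
```

Scoping discovery this way complements relabeling rather than replacing it: the selector limits which pods are seen at all, and relabel_configs still decides which of those become targets and with which labels.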
- Use Labels Wisely in Application Code: When instrumenting your own applications, be mindful of the cardinality introduced by your custom metrics.
  - Fix: Review your application’s metrics. Instead of using a pod_name, or a request_id that changes per request, consider using more static labels like service_name, endpoint_type, or environment. If you need to track individual requests, consider using tracing systems (like Jaeger or Tempo) or logging, rather than high-cardinality Prometheus metrics.
  - Why it works: By reducing the dynamism at the source, you prevent the explosion of series from ever happening.
- External Label Management: For very large environments, consider offloading some metrics or using a Prometheus federation model where an upstream Prometheus aggregates data from downstream ones, applying relabeling at the federation layer.
  - Fix: On the upstream Prometheus, add a scrape job that pulls selected series from each downstream instance’s /federate endpoint, and apply metric_relabel_configs to that job. If you ship data with remote_write instead, use write_relabel_configs to drop labels or series before they leave the downstream instance.
  - Why it works: This distributes the load and allows for centralized control over label management before data reaches a single, massive Prometheus instance.
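For the federation variant, a scrape job on the upstream Prometheus might look like the sketch below. It assumes the downstream servers expose pre-aggregated recording rules whose names start with job:, and that uid is a label still worth stripping at this layer. The target addresses and the match[] selector are placeholders.

```yaml
scrape_configs:
  - job_name: 'federate'
    honor_labels: true            # keep the job/instance labels of the original series
    metrics_path: '/federate'
    params:
      'match[]':
        # Pull only aggregated series, not raw per-pod series
        - '{__name__=~"job:.*"}'
    static_configs:
      - targets:
          - 'prometheus-downstream-1:9090'   # hypothetical address
          - 'prometheus-downstream-2:9090'   # hypothetical address
    metric_relabel_configs:
      # Strip a label that is still too dynamic at the federation layer
      - action: labeldrop
        regex: uid
```

Restricting match[] to aggregated series is what keeps the upstream server small; federating raw series would simply move the cardinality problem up one level.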
The next error you’ll likely encounter is "context deadline exceeded" on scrape targets, as Prometheus struggles to keep up with its own internal state even before ingestion errors occur.