Prometheus is a Go program, so it does have a garbage-collected heap, but the number operators usually watch is its Resident Set Size (RSS), which also counts memory-mapped TSDB data. Understanding what drives RSS is key to taming it.

Let’s see Prometheus in action, specifically how it stores and queries data. Imagine a cluster of 100 nodes, each running 20 targets that Prometheus scrapes every 15 seconds, with each target exposing about 50 metrics.
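To put numbers on that scenario, here is a rough back-of-the-envelope sketch. It assumes each exposed metric is a single time series, which is the most optimistic case; the function names are illustrative only.

```python
# Back-of-the-envelope load estimate for the scenario described above.
NODES = 100
TARGETS_PER_NODE = 20
SERIES_PER_TARGET = 50        # assumption: one series per exposed metric
SCRAPE_INTERVAL_SECONDS = 15


def active_series(nodes, targets_per_node, series_per_target):
    """Total number of series Prometheus has to track at once."""
    return nodes * targets_per_node * series_per_target


def samples_per_second(series, interval_seconds):
    """Each series yields one sample per scrape interval."""
    return series / interval_seconds


series = active_series(NODES, TARGETS_PER_NODE, SERIES_PER_TARGET)
rate = samples_per_second(series, SCRAPE_INTERVAL_SECONDS)

print(f"active series: {series}")                  # active series: 100000
print(f"ingested samples per second: {rate:.0f}")  # ingested samples per second: 6667
```

Any label that multiplies the series behind a metric multiplies both figures.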

# Example scrape configuration
scrape_configs:
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['node1:9100', 'node2:9100', ...] # 100 nodes
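    # Target relabeling: __address__ is only available here,
    # it is no longer present when metric_relabel_configs runs.
    relabel_configs:
      # Strip the port so instance becomes "node1" instead of "node1:9100"
      - source_labels: [__address__]
        regex: '(.*):9100'
        target_label: instance
        replacement: '$1'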

# Example range query (HTTPie syntax; "==" sends a URL query parameter)
http GET http://localhost:9090/api/v1/query_range \
  query=='up{job="node_exporter"}' \
  start=="$(date -d '5 minutes ago' +%s)" \
  end=="$(date +%s)" \
  step==15s

When Prometheus scrapes metrics, it doesn’t just store raw text. It parses each sample, maps its label set to a time series with a unique internal ID, and appends the value to an optimized time-series database (TSDB). The RSS you see is primarily the in-memory head block holding recent samples, the index over the labels of active series, memory-mapped pages of on-disk blocks, and the buffers used by the scrape process itself.
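The idea of mapping a label set to an internal ID can be sketched in a few lines. This is a toy illustration, not Prometheus's parser or its actual data structures: `SeriesRegistry` and `parse_sample` are hypothetical names, and the regexes ignore escaped quotes, timestamps, and other details of the exposition format.

```python
import re

# Toy patterns: one label pair, and one "name{labels} value" sample line.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')
SAMPLE_RE = re.compile(r'^(\w+)(?:\{(.*)\})?\s+(\S+)$')


class SeriesRegistry:
    """Assigns one stable ID per unique (metric name, label set)."""

    def __init__(self):
        self._ids = {}

    def series_id(self, name, labels):
        # Sorting makes the identity independent of label order.
        key = (name, tuple(sorted(labels.items())))
        if key not in self._ids:
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]

    def __len__(self):
        return len(self._ids)


def parse_sample(line):
    """Split a sample line into (name, labels dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unparseable sample line: {line!r}")
    name, raw_labels, value = match.groups()
    labels = dict(LABEL_RE.findall(raw_labels or ""))
    return name, labels, float(value)


registry = SeriesRegistry()
scrape = [
    'up{job="node_exporter",instance="node1"} 1',
    'up{job="node_exporter",instance="node2"} 1',
    'up{instance="node1",job="node_exporter"} 1',  # same series, labels reordered
]
ids = [registry.series_id(*parse_sample(line)[:2]) for line in scrape]
print(ids)            # [1, 2, 1]
print(len(registry))  # 2
```

Only two series exist after three samples, because identity is the full label set, not the line of text.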

The core problem Prometheus solves is ingesting, storing, and querying vast amounts of time-series data efficiently. It achieves this through compressed per-series chunks, an in-memory head block for recent data, and an inverted index over label pairs. The levers you control are primarily the amount of data it ingests and how many distinct series that data produces.

The most surprising thing about Prometheus memory usage is that it depends less on the number of metric names than on the cardinality of those metrics and, to a lesser extent, on the retention period (older blocks are memory-mapped, so they count toward RSS when queries touch them). High cardinality means a huge number of unique label combinations for a given metric name. For example, http_requests_total{method="GET", handler="/api", instance="192.168.1.100:8080", user_id="user123"} has much higher cardinality than node_cpu_seconds_total{mode="idle", instance="node1"}, because user_id can take an unbounded number of values. Each unique combination is a distinct time series.
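The multiplication behind cardinality is easy to sketch. The distinct-value counts below are hypothetical, chosen only for illustration, and the product is an upper bound: only combinations that actually occur become series.

```python
from math import prod


def max_series(label_value_counts):
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(label_value_counts.values())


# Hypothetical distinct-value counts per label.
node_cpu = {"mode": 8, "instance": 100}
http_requests = {"method": 4, "handler": 20, "instance": 10, "user_id": 10_000}

print(max_series(node_cpu))       # 800
print(max_series(http_requests))  # 8000000
```

Dropping the single user_id label would bring the second metric down to at most 800 series.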

Here’s the system in action with a high-cardinality metric:

# prometheus.yml
scrape_configs:
  - job_name: 'my_app'
    static_configs:
      - targets: ['app_server1:8080']
    metric_relabel_configs:
      # No-op rule, shown only to highlight the label: request_id is copied
      # onto itself, so every unique request ID (imagine millions) is kept.
      - source_labels: [request_id]
        target_label: request_id

When Prometheus scrapes http_requests_total with millions of unique request_id values, it has to create a separate time series, with its own index entries, for each one. That index can consume a significant amount of memory, especially if those time series stay active for a long time. The TSDB on disk also grows proportionally.
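A toy inverted index makes the growth visible. This is a simplified sketch, not the TSDB's actual index layout: `ToyIndex` and `index_after_requests` are hypothetical names, and the index simply maps each label pair to the set of series IDs carrying it.

```python
from collections import defaultdict


class ToyIndex:
    """Maps each (label, value) pair to the series IDs that carry it."""

    def __init__(self):
        self.series = {}                   # label set -> series ID
        self.postings = defaultdict(set)   # (label, value) -> series IDs

    def add(self, labels):
        key = tuple(sorted(labels.items()))
        if key in self.series:
            return self.series[key]
        series_id = len(self.series) + 1
        self.series[key] = series_id
        for pair in key:
            self.postings[pair].add(series_id)
        return series_id


def index_after_requests(request_count, with_request_id):
    """Build the index that results from observing request_count requests."""
    index = ToyIndex()
    for i in range(request_count):
        labels = {"__name__": "http_requests_total", "method": "GET"}
        if with_request_id:
            labels["request_id"] = f"req-{i}"
        index.add(labels)
    return index


bounded = index_after_requests(10_000, with_request_id=False)
unbounded = index_after_requests(10_000, with_request_id=True)

print(len(bounded.series), len(bounded.postings))      # 1 2
print(len(unbounded.series), len(unbounded.postings))  # 10000 10002
```

Without request_id, ten thousand requests update one series. With it, every request adds a series and a new index entry.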

What many people underestimate is how much memory the head block consumes. Prometheus keeps the most recent data (by default, roughly the last two to three hours) in memory for fast querying before it is compacted into a block on disk. This in-memory block, often several hundred megabytes to gigabytes, is crucial for query performance. With many active time series and high scrape rates, the head block grows substantially and RSS grows with it. The head’s time span is governed by storage.tsdb.min-block-duration (default 2h), not storage.tsdb.max-block-duration, which only caps how large compacted on-disk blocks may become. Lowering min-block-duration shrinks the head, but blocks are then cut more often, which means more small blocks, more compaction work, and more queries over recent data that have to read from disk. Treat both as advanced settings that are rarely worth changing. A common mistake is to think only about disk storage when optimizing memory.
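How the head scales with series count and time window can be sketched roughly as follows. The bytes-per-sample figure is an assumption for illustration, and the estimate covers sample data only: per-series overhead such as labels and index entries is not included and can be substantial.

```python
def head_samples(active_series, scrape_interval_s, head_window_s):
    """Samples held in memory: one per series per scrape within the window."""
    return active_series * (head_window_s // scrape_interval_s)


def sample_bytes(samples, bytes_per_sample):
    """Raw sample payload only; excludes labels, index, and allocator overhead."""
    return samples * bytes_per_sample


SERIES = 100_000       # the 100 x 20 x 50 scenario, one series per metric
INTERVAL_S = 15
BYTES_PER_SAMPLE = 2   # assumption: compressed samples average a couple of bytes

for hours in (1, 2, 3):
    n = head_samples(SERIES, INTERVAL_S, hours * 3600)
    mib = sample_bytes(n, BYTES_PER_SAMPLE) / 2**20
    print(f"{hours}h window: {n:,} samples, ~{mib:.0f} MiB of sample data")
```

The estimate is linear in both the series count and the window, so halving either halves the sample payload, while per-series overhead shrinks only when the number of series does.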

The next concept you’ll run into is how Prometheus’s query engine interacts with the in-memory head block and the on-disk TSDB, and how data locality affects query performance.
