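Prometheus Agent mode’s "write-only forwarding" does not mean pushing data into a full Prometheus server running on the node. It means running Prometheus as a stripped-down instance that scrapes targets and forwards the samples over remote write, without serving queries or keeping a queryable local database.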

Let’s see it in action. Imagine a cluster of application servers, each running Prometheus in Agent mode (started with the --agent flag in Prometheus 3.x, or --enable-feature=agent in 2.32 and later 2.x releases). These agents scrape metrics from local applications and, instead of storing them for querying, forward them to a central Prometheus server or a compatible remote write endpoint such as VictoriaMetrics.

```yaml
# prometheus.yml for an agent node
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'my_app'
    static_configs:
      - targets: ['localhost:9091'] # Scrape metrics from a local app

remote_write:
  - url: "http://central-prometheus.example.com:9090/api/v1/write" # Forward to central server
    queue_config:
      max_shards: 10
      capacity: 500
```

This setup is designed to offload the scraping and initial ingestion burden from your central Prometheus instance. The agents do the heavy lifting of connecting to targets, pulling metrics, and then batching them up for efficient transmission. This is particularly useful in large, distributed environments where agents can be deployed closer to the data sources, reducing network hops and load on the central Prometheus.
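When several agents forward to the same endpoint, it usually helps to tag each agent’s series so they can be told apart centrally. A minimal sketch using external_labels; the label names and values here (cluster, agent) are hypothetical and not part of the original setup:

```yaml
# prometheus.yml fragment for one agent node (label names and values are illustrative)
global:
  scrape_interval: 15s
  external_labels:
    cluster: 'eu-west-1'     # hypothetical: the environment this agent runs in
    agent: 'app-server-01'   # hypothetical: identifies this particular agent
```

Labels set under external_labels are attached to every series the agent sends over remote write, so the central side can filter or group by them.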

The core problem this solves is scalability and resilience. By distributing the scraping load, your central Prometheus server can focus on querying, alerting, and storage. Agents act as forwarders that append scraped samples to a local write-ahead log (WAL) and retry failed sends, so short network outages between an agent and the central server usually do not leave gaps. This is not an absolute guarantee: an outage that outlasts what the WAL retains can still lose data.

Internally, Agent mode relies on Prometheus’s remote_write protocol. When an agent scrapes metrics, it runs them through its normal scrape pipeline (including relabeling, if configured), appends them to its WAL, and then sends them to the url specified in the remote_write section. It doesn’t maintain a full local TSDB for long-term storage or querying. The queue_config is crucial here; it controls how samples read from the WAL are buffered in memory and batched before sending. Outgoing data is split across shards that send in parallel: max_shards is the upper bound on how many shards Prometheus may scale up to, and capacity is the maximum number of samples buffered in memory per shard.
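The pieces described above can be sketched as a fuller remote_write block. The values are illustrative rather than recommendations, and the drop rule’s regex is a made-up example; the Prometheus remote write tuning guidance suggests keeping capacity at several times max_samples_per_send, which is why the batch size is set explicitly here:

```yaml
remote_write:
  - url: "http://central-prometheus.example.com:9090/api/v1/write"
    # Optional: drop series before they leave the agent (regex is illustrative).
    write_relabel_configs:
      - source_labels: [__name__]
        regex: 'go_gc_.*'
        action: drop
    queue_config:
      min_shards: 1              # lower bound on parallel senders
      max_shards: 10             # upper bound on parallel senders
      capacity: 500              # samples buffered in memory per shard
      max_samples_per_send: 100  # samples per request; kept well below capacity
      batch_send_deadline: 5s    # send a partial batch after waiting this long
      min_backoff: 30ms          # initial retry delay after a failed send
      max_backoff: 5s            # cap on the retry delay
```

With these numbers the in-memory queue holds at most 10 × 500 = 5,000 samples once all shards are active; anything beyond that waits in the WAL on disk.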

The primary lever you control is the remote_write configuration. The url is self-explanatory. The queue_config tunes the buffering behavior. A higher capacity absorbs larger bursts before the queue blocks, at the cost of more memory on the agent, since the in-memory buffer grows roughly with shards × capacity. Raising max_shards lifts the ceiling on parallelism, which can improve throughput if the remote endpoint can keep up with concurrent writes. You can also configure authentication and encryption (e.g., basic_auth, tls_config) within the remote_write block to secure the connection to your central Prometheus or compatible endpoint.
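A minimal sketch of a secured remote_write block, assuming the central endpoint is served over HTTPS behind basic authentication. The username and all file paths are hypothetical placeholders:

```yaml
remote_write:
  - url: "https://central-prometheus.example.com/api/v1/write"
    basic_auth:
      username: 'agent-writer'   # hypothetical account name
      password_file: /etc/prometheus/secrets/remote_write_password  # hypothetical path
    tls_config:
      ca_file: /etc/prometheus/tls/ca.crt       # CA used to verify the server certificate
      cert_file: /etc/prometheus/tls/agent.crt  # optional client certificate for mutual TLS
      key_file: /etc/prometheus/tls/agent.key   # private key matching cert_file
```

Reading the password from password_file rather than an inline password keeps the secret out of the main configuration file.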

What most people miss is that remote_write in Agent mode is a one-way street. The agent cannot be queried for historical data, and it does not evaluate recording or alerting rules. Its sole purpose is to collect and forward. If you send a query to an agent node’s /api/v1/query endpoint, you’ll get an error because the query engine and the queryable TSDB are not enabled in Agent mode.

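The next step you’ll likely encounter is configuring the receiving endpoint (your central Prometheus or VictoriaMetrics) to ingest and store this forwarded data. For a central Prometheus, that starts with enabling its remote write receiver via the --web.enable-remote-write-receiver flag, which is off by default.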
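As a rough sketch of how the two sides fit together, here is a Docker Compose file that starts one receiving Prometheus and one agent. The service names, mounted file names, and the choice of Compose itself are assumptions for illustration only:

```yaml
# docker-compose.yml sketch (service names and file paths are assumptions)
services:
  central:
    image: prom/prometheus
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --web.enable-remote-write-receiver   # accept pushes on /api/v1/write
    volumes:
      - ./central.yml:/etc/prometheus/prometheus.yml:ro
    ports:
      - "9090:9090"

  agent:
    image: prom/prometheus
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --agent   # Prometheus 3.x; 2.x releases use --enable-feature=agent instead
    volumes:
      - ./agent.yml:/etc/prometheus/prometheus.yml:ro
```

In this layout the agent’s remote_write url would point at http://central:9090/api/v1/write, and its scrape targets would need to be addresses reachable from inside the agent container rather than localhost.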
