Prometheus’s Remote Write feature is fundamentally a streaming API, not a batch one, which is why backpressure is such a critical, often overlooked, tuning knob.
Let’s watch it in action. Imagine Prometheus scraping metrics from a few thousand targets, generating a constant stream of time series data. This data needs to be sent to a remote storage system (like Cortex, Thanos, Mimir, or a compatible endpoint) via Remote Write.
# prometheus.yml
remote_write:
  - url: "http://my-remote-write-endpoint:9201/api/v1/push"
    remote_timeout: 30s            # How long to wait for each HTTP request before treating it as failed
    # This is where the magic happens for backpressure
    queue_config:
      capacity: 10000              # Number of samples each shard can buffer
      max_shards: 10               # Upper bound on parallel senders
      min_shards: 1                # Lower bound on parallel senders
      max_samples_per_send: 500    # Max samples in one HTTP POST request
      batch_send_deadline: 1s      # Max time a partial batch waits before it is sent
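Scraped samples are first appended to Prometheus’s write-ahead log (WAL). Remote write tails the WAL and distributes samples across in-memory shard queues; Prometheus runs between min_shards and max_shards of them and adjusts the count to match the load. Each shard batches samples until it has max_samples_per_send of them, or until batch_send_deadline (the longest a partial batch may wait) expires, and then sends the batch as one HTTP POST. If the remote endpoint is slow to acknowledge these batches, the shard queues fill up. This is backpressure. Once a queue is full, Prometheus stops reading the WAL into remote write until space frees up. Scraping and local storage carry on, but the remote copy falls further and further behind, and if it lags long enough for the WAL to be truncated (roughly two hours by default), the unsent samples never reach the remote. This is the most impactful symptom of poorly tuned backpressure.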
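To make the fill-up concrete, here is a toy simulation in Python. It is a sketch under simplifying assumptions, not Prometheus’s implementation: it models a single shard with a bounded queue, a constant arrival rate, and a remote that acknowledges exactly one batch every ack_seconds. The function name and all parameters other than capacity and max_samples_per_send are made up for illustration.

```python
def simulate_shard(arrival_per_sec, capacity, max_samples_per_send,
                   ack_seconds, duration_sec):
    """Toy model of one remote-write shard (not Prometheus's actual code).

    Returns (history, blocked_total): the queue depth at the end of each
    second, and the number of samples that could not be enqueued because
    the queue was full and therefore had to wait upstream.
    """
    queued = 0
    blocked_total = 0
    history = []
    # One batch is acknowledged every ack_seconds, so the shard can drain
    # at most this many samples per second.
    drain_per_sec = int(max_samples_per_send / ack_seconds)
    for _ in range(duration_sec):
        # Enqueue what fits; the rest is held back (this is the backpressure).
        accepted = min(arrival_per_sec, capacity - queued)
        blocked_total += arrival_per_sec - accepted
        queued += accepted
        # Remove whatever the remote acknowledged during this second.
        queued -= min(queued, drain_per_sec)
        history.append(queued)
    return history, blocked_total
```

With arrival_per_sec=1000, capacity=10000, max_samples_per_send=500 and ack_seconds=1.0, the backlog grows by 500 samples every second until the queue is full, after which new samples start waiting upstream. Dropping ack_seconds to 0.25 lets the same shard keep up with an empty queue.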
The primary problem remote_write backpressure solves is preventing Prometheus from overwhelming its remote storage. If the remote system can’t ingest data as fast as Prometheus generates it, the backlog grows, memory use climbs, and samples that stay unsent for too long are eventually lost. Tuning remote_write effectively means finding the sweet spot between sending data promptly and not flooding the destination.
Here’s how the key parameters interact:
capacity: This is the number of samples each shard can buffer, so the worst-case total is capacity multiplied by the current number of shards. When a shard’s queue fills up, remote write stops reading from the WAL until space frees up. A larger capacity lets Prometheus ride out temporary spikes in ingestion rate or remote system latency before that happens. However, it also means more memory usage.
max_shards: This is the upper bound on the number of shards. Each shard sends its batches independently, so it also caps the number of concurrent HTTP requests to the remote write endpoint. More shards mean more parallelism, which can increase throughput if the remote endpoint can handle it. Too many shards can overwhelm the remote endpoint or even exhaust local file descriptors on the Prometheus server.
max_samples_per_send and batch_send_deadline: These define how Prometheus batches samples into individual HTTP POST requests. A shard sends a batch as soon as it holds max_samples_per_send samples, or when batch_send_deadline has passed since the last send, whichever comes first. Larger batches can improve efficiency for the remote system (fewer requests to process), but they also increase latency for individual samples and can fill the queue faster if the remote system is slow to acknowledge large batches.
remote_timeout: This is crucial. It is set on the remote_write entry itself rather than inside queue_config. If a request isn’t acknowledged by the remote endpoint within this time, the attempt is treated as failed and Prometheus retries the batch. A short timeout can lead to excessive retries if the network is flaky or the remote endpoint is temporarily overloaded. A long timeout means a stalled request ties up its shard for longer, so the backlog behind it grows.
min_shards and automatic resharding: queue_config has no watermark setting that triggers early sends. Instead, Prometheus periodically recalculates how many shards it needs from the incoming sample rate and the backlog, and scales between min_shards and max_shards. Raising min_shards gives remote write more parallelism from the start, which helps to "push back" against the queue filling up before it hits critical capacity.
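The sizing trade-offs reduce to simple arithmetic. The helper below is a back-of-the-envelope sketch in Python. It treats capacity as a per-shard limit, which is how the Prometheus configuration reference describes it, and it assumes that every shard is busy and that each request takes a fixed round-trip time. The function name and the request_seconds parameter are hypothetical, not something Prometheus exposes.

```python
def estimate_remote_write(shards, capacity, max_samples_per_send, request_seconds):
    """Rough upper bounds for a remote-write queue (illustrative only)."""
    return {
        # Worst-case samples held in memory if every shard's queue is full.
        "max_buffered_samples": shards * capacity,
        # Throughput ceiling if each shard sends full batches back to back
        # and every request takes request_seconds to be acknowledged.
        "max_samples_per_sec": shards * max_samples_per_send / request_seconds,
    }
```

Comparing max_samples_per_sec with your actual ingestion rate gives a quick sanity check: if the ceiling is below what Prometheus scrapes, no amount of extra capacity will stop the backlog from growing.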
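The most effective way to tune remote_write is to watch the metrics Prometheus exposes about its own queues on its /metrics endpoint. Useful ones include prometheus_remote_storage_samples_pending (samples waiting in the shard queues), prometheus_remote_storage_shards together with prometheus_remote_storage_shards_desired and prometheus_remote_storage_shards_max, and the gap between prometheus_remote_storage_highest_timestamp_in_seconds and prometheus_remote_storage_queue_highest_sent_timestamp_seconds, which tells you how many seconds remote write is behind. Metric names have changed between releases, so check what your version exports. If samples_pending is consistently high or approaching capacity multiplied by the shard count, or the lag keeps growing, you have backpressure.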
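One number worth deriving is how far remote write is lagging. Prometheus serves its own metrics in the text exposition format on /metrics, and the sketch below, in Python, computes the lag from two gauges in that output: prometheus_remote_storage_highest_timestamp_in_seconds and prometheus_remote_storage_queue_highest_sent_timestamp_seconds. The parser is deliberately minimal (it returns the first matching series and ignores optional timestamps), the helper names are mine, and the SAMPLE text uses made-up values purely for illustration.

```python
def read_gauge(metrics_text, name):
    """Return the first value of metric `name` from text-format output, or None."""
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        metric = line.split("{", 1)[0].split(" ", 1)[0]
        if metric != name:
            continue
        # The value is the first field after the label set (if there is one).
        tail = line.rsplit("}", 1)[-1] if "{" in line else line[len(metric):]
        return float(tail.split()[0])
    return None


def remote_write_lag_seconds(metrics_text):
    """Seconds between the newest sample ingested and the newest sample sent."""
    newest = read_gauge(
        metrics_text, "prometheus_remote_storage_highest_timestamp_in_seconds")
    sent = read_gauge(
        metrics_text, "prometheus_remote_storage_queue_highest_sent_timestamp_seconds")
    if newest is None or sent is None:
        return None
    return newest - sent


# Made-up values for illustration only; real output comes from /metrics.
SAMPLE = """\
# TYPE prometheus_remote_storage_highest_timestamp_in_seconds gauge
prometheus_remote_storage_highest_timestamp_in_seconds 1700000300
# TYPE prometheus_remote_storage_queue_highest_sent_timestamp_seconds gauge
prometheus_remote_storage_queue_highest_sent_timestamp_seconds{remote_name="example",url="http://my-remote-write-endpoint:9201/api/v1/push"} 1700000180
"""

print(remote_write_lag_seconds(SAMPLE))
```

A lag that hovers near zero means remote write is keeping up; a lag that grows steadily is the clearest sign that the remote cannot absorb the current rate.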
Consider a scenario where your remote write endpoint is under-provisioned. You’ll see prometheus_remote_storage_samples_pending and the send lag growing. The first instinct might be to increase capacity. This will hide the problem for longer but doesn’t fix the root cause of the remote endpoint being too slow. Instead, check whether prometheus_remote_storage_shards_desired is above prometheus_remote_storage_shards_max. If it is, Prometheus wants more parallelism than it is allowed, and raising max_shards can help, provided your remote endpoint can handle more concurrent requests. If the endpoint is already saturated, more shards only add load. In that case you might reduce max_samples_per_send to send smaller batches, which can sometimes be acknowledged faster by a struggling endpoint, effectively giving it smaller, more manageable chunks, or lower max_shards to ease the pressure while you add capacity on the remote side.
The one thing most people don’t realize is that remote_timeout is not just about how long Prometheus waits; it also shapes the load the remote system sees. When a request times out, Prometheus abandons it and retries the same batch with exponential backoff (bounded by min_backoff and max_backoff in queue_config). A remote that is already slow may therefore receive the same data several times, potentially leading to a cascade of delays. Conversely, if Prometheus is constantly retrying because remote_timeout is too short for a slow but functional remote, you’re burning CPU and network resources unnecessarily.
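Retries are spaced out with exponential backoff, controlled by the min_backoff and max_backoff settings in queue_config. The Python sketch below shows the general shape of such a schedule: start at the minimum, double after every failed attempt, and stop growing at the maximum. It is a simplified model rather than Prometheus’s code; the function name is hypothetical, and details such as honoring a Retry-After header are left out.

```python
def backoff_schedule(min_backoff, max_backoff, attempts):
    """Delay before each retry: doubles every time, capped at max_backoff."""
    delays = []
    delay = min_backoff
    for _ in range(attempts):
        delays.append(delay)
        delay = min(delay * 2, max_backoff)
    return delays


def total_wait(min_backoff, max_backoff, attempts):
    """Total time spent sleeping across `attempts` consecutive failures."""
    return sum(backoff_schedule(min_backoff, max_backoff, attempts))
```

Adding the request timeout to each delay gives a rough idea of how long one batch can occupy a shard while the remote is failing, which is worth keeping in mind when choosing remote_timeout.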
The next logical step after tuning remote write backpressure is understanding how your remote storage system itself handles ingestion rate limits and potential data loss scenarios.