Silences and inhibitions in Prometheus’s Alertmanager aren’t just "mute buttons" for alerts; they’re tools for managing alert fatigue by suppressing notifications selectively: silences for a fixed time window, inhibitions for as long as another alert is firing.

Let’s watch Prometheus in action. Imagine we have an alert rule that fires when a web server’s 5xx error rate exceeds 1%.

- alert: HighHttp5xxRate
  # Aggregate by namespace so the resulting alert keeps a namespace label
  # that silences and inhibit rules can match on.
  expr: |
    sum by (namespace) (rate(http_requests_total{code=~"5..", namespace="web"}[5m]))
    /
    sum by (namespace) (rate(http_requests_total{namespace="web"}[5m]))
    * 100 > 1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "High 5xx error rate for web servers"
    description: "The 5xx error rate for namespace 'web' is {{ $value | printf \"%.2f\" }}%."

This alert will fire if the condition persists for 5 minutes. Now, what if we know that during our weekly maintenance window, a higher-than-usual 5xx rate is expected due to planned deployments? We don’t want to be bombarded with alerts then. This is where silences and inhibitions come in.

Silences: The Scheduled Mute

A silence is a declarative statement that says, "For this specific time period, ignore alerts matching these criteria." It’s like setting an out-of-office reply for your alerts.

How it works: You define a time range and, importantly, a set of matchers. An alert is suppressed when its labels satisfy every matcher in the silence; any additional labels on the alert are ignored.

Example: Let’s create a silence for our weekly maintenance.

UI: In the Alertmanager web UI, open "Silences" and click "New Silence".

Configuration:

  • Starts: 2023-10-27 02:00:00 UTC
  • Ends: 2023-10-27 04:00:00 UTC
  • Created by: your_name
  • Comment: Weekly maintenance window for web services.
  • Matchers:
    • alertname = HighHttp5xxRate
    • namespace = web
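
The same silence can also be created without the UI. The sketch below only builds the request body; it assumes Alertmanager's v2 HTTP API, where a silence is created by POSTing JSON to /api/v2/silences. The helper name build_silence_payload is hypothetical and exists only for illustration.

```python
import json
from datetime import datetime, timezone


def build_silence_payload(matchers, starts_at, ends_at, created_by, comment):
    """Build the JSON body for a new silence (hypothetical helper)."""
    if ends_at <= starts_at:
        raise ValueError("silence must end after it starts")
    return {
        "matchers": [
            # isRegex=False and isEqual=True describe a plain `name = value` matcher.
            {"name": name, "value": value, "isRegex": False, "isEqual": True}
            for name, value in matchers.items()
        ],
        "startsAt": starts_at.isoformat(),
        "endsAt": ends_at.isoformat(),
        "createdBy": created_by,
        "comment": comment,
    }


payload = build_silence_payload(
    matchers={"alertname": "HighHttp5xxRate", "namespace": "web"},
    starts_at=datetime(2023, 10, 27, 2, 0, tzinfo=timezone.utc),
    ends_at=datetime(2023, 10, 27, 4, 0, tzinfo=timezone.utc),
    created_by="your_name",
    comment="Weekly maintenance window for web services.",
)
print(json.dumps(payload, indent=2))
```

Sending the payload is left out on purpose, since the Alertmanager address and any authentication depend on your setup.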

Why it works: While the silence is active, Alertmanager checks incoming alerts against it. If an alert’s labels satisfy every matcher (alertname is HighHttp5xxRate AND namespace is web) and the current time falls between the silence’s start and end times, no notification is sent to any configured receiver. Prometheus still evaluates the rule and sends the alert to Alertmanager, which filters it out before it reaches Slack, PagerDuty, and so on.
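
The check described above can be sketched in a few lines. This is a simplified model, not Alertmanager's implementation: it assumes equality-only matchers (real silences also support negative and regex matchers) and treats the window as including the start time and excluding the end time. The function name is_silenced is made up for this example.

```python
from datetime import datetime, timezone


def is_silenced(alert_labels, silence, now):
    """Return True if every matcher is satisfied and `now` is inside the window."""
    in_window = silence["starts"] <= now < silence["ends"]
    labels_match = all(
        alert_labels.get(name) == value
        for name, value in silence["matchers"].items()
    )
    return in_window and labels_match


silence = {
    "starts": datetime(2023, 10, 27, 2, 0, tzinfo=timezone.utc),
    "ends": datetime(2023, 10, 27, 4, 0, tzinfo=timezone.utc),
    "matchers": {"alertname": "HighHttp5xxRate", "namespace": "web"},
}
# The extra severity label does not prevent a match.
alert = {"alertname": "HighHttp5xxRate", "namespace": "web", "severity": "warning"}

during = datetime(2023, 10, 27, 3, 0, tzinfo=timezone.utc)
after = datetime(2023, 10, 27, 5, 0, tzinfo=timezone.utc)

print(is_silenced(alert, silence, during))                          # True
print(is_silenced(alert, silence, after))                           # False
print(is_silenced({**alert, "namespace": "api"}, silence, during))  # False
```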

Inhibitions: The Conditional Suppression

Inhibitions are more dynamic. They allow you to suppress alerts based on the state of other alerts. The core idea is: "If alert A is already firing, don’t bother me with alert B."

How it works: You define an inhibition rule that specifies which alerts should be inhibited (target_matchers) and which alerts trigger the inhibition (source_matchers). The rule is active while an alert matching source_matchers is firing, and it suppresses alerts matching target_matchers that have the same values as the source alert for every label listed in the rule’s equal field. If equal is omitted, every alert matching target_matchers is suppressed, whatever its labels.

Example: Suppose we have a critical alert that fires when the entire web service is down, and we also have the HighHttp5xxRate alert. We don’t want to be notified about high 5xx rates if the whole service is already down and we’re dealing with the more severe "service down" alert.

Alertmanager Configuration (alertmanager.yml):

route:
  receiver: 'default-receiver'
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
  - receiver: 'critical-receiver'
    matchers:
    - severity="critical"
    # ... other routes

receivers:
- name: 'default-receiver'
  slack_configs:
  - api_url: 'https://hooks.slack.com/services/...'
    channel: '#alerts-general'

- name: 'critical-receiver'
  slack_configs:
  - api_url: 'https://hooks.slack.com/services/...'
    channel: '#alerts-critical'

inhibit_rules:
- target_matchers:
  - alertname="HighHttp5xxRate"
  source_matchers:
  - alertname="WebServiceDown"
  # Inhibit only when both alerts have the same namespace label value.
  equal: ['namespace']

Why it works: While a WebServiceDown alert is firing (matching source_matchers), Alertmanager looks at every alert that matches target_matchers and compares the labels listed under equal. If HighHttp5xxRate is firing with namespace: web and WebServiceDown also carries namespace: web, HighHttp5xxRate is inhibited and no notification is sent for it. A HighHttp5xxRate alert for a different namespace is unaffected. Without the equal list, any firing WebServiceDown alert would suppress every HighHttp5xxRate alert, regardless of labels. This prevents a flood of "high error rate" alerts when the overarching problem is the entire service being down. The inhibition rule effectively says, "If WebServiceDown is active for namespace: web, then suppress HighHttp5xxRate for namespace: web."
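
The decision an inhibit rule makes can be sketched as follows. This is a simplified model rather than Alertmanager's code: it assumes equality-only matchers and a rule with an equal list naming the labels (here namespace) that must carry the same value on the source and target alerts. The function name is_inhibited and the dictionary layout are illustrative only.

```python
def is_inhibited(target, firing_alerts, rule):
    """Return True if some firing source alert inhibits `target` under `rule`."""
    def matches(labels, matchers):
        return all(labels.get(k) == v for k, v in matchers.items())

    if not matches(target, rule["target_matchers"]):
        return False
    for source in firing_alerts:
        if source is target or not matches(source, rule["source_matchers"]):
            continue
        # A missing label is treated like an empty value, so default to "".
        if all(source.get(l, "") == target.get(l, "") for l in rule["equal"]):
            return True
    return False


rule = {
    "source_matchers": {"alertname": "WebServiceDown"},
    "target_matchers": {"alertname": "HighHttp5xxRate"},
    "equal": ["namespace"],
}
down_web = {"alertname": "WebServiceDown", "namespace": "web", "severity": "critical"}
errors_web = {"alertname": "HighHttp5xxRate", "namespace": "web", "severity": "warning"}
errors_api = {"alertname": "HighHttp5xxRate", "namespace": "api", "severity": "warning"}
firing = [down_web, errors_web, errors_api]

print(is_inhibited(errors_web, firing, rule))                   # True
print(is_inhibited(errors_api, firing, rule))                   # False
print(is_inhibited(errors_api, firing, {**rule, "equal": []}))  # True
```

The last call shows why the equal list matters: with an empty list, the web outage also hides the error-rate alert for an unrelated namespace.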

The real power of inhibitions lies in their ability to encode dependencies between alerts. You can layer several rules: if a cluster is overloaded, suppress all application-level alerts within that cluster. If a database is unreachable, suppress alerts about applications that depend on it. This dramatically reduces alert noise during widespread incidents.

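A common pitfall with silences is setting an end time far in the future or using matchers that are too broad, which leaves real alerts muted long after the maintenance is over. For inhibitions, the pitfall is imprecise source_matchers and target_matchers, or a missing equal list: without labels under equal to bind the two alerts together, a single source alert suppresses every matching target alert.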

The next step after mastering silences and inhibitions is understanding how to group related alerts together using group_by in Alertmanager’s routing, so that multiple instances of the same alert type are bundled into a single notification.
