Prometheus’s DNS-based service discovery is a powerful, yet often misunderstood, mechanism for dynamically finding targets to scrape.

Let’s see it in action. Imagine you have a fleet of microservices, each exposing a /metrics endpoint. You want Prometheus to automatically find and scrape all of them.

Here’s a snippet from a prometheus.yml configuration:

scrape_configs:
  - job_name: 'my-microservices'
    metrics_path: /metrics
    scheme: http
    dns_sd_configs:
      - names:
          - 'my-service.internal.svc.cluster.local'
        type: SRV
        port: 8080 # Ignored for SRV queries; only used for record types without a port, such as A or AAAA
    relabel_configs:
      - source_labels: [__address__]
        regex: '(.+):8080'
        target_label: __address__
        replacement: '$1:80' # Assuming metrics are on port 80
      - source_labels: [__meta_dns_srv_record_target]
        target_label: instance

In this example, Prometheus is configured to query DNS SRV records for my-service.internal.svc.cluster.local. Each SRV record supplies a target hostname and a port, and Prometheus joins the two to form the target’s __address__.

The dns_sd_configs block tells Prometheus how to find the services.

  • names: This is a list of DNS names to query.
  • type: Specifies the DNS record type to query. SRV is the default and the usual choice for service discovery, but A or AAAA can also be used.
  • port: The port to scrape when type is not SRV. A and AAAA records carry no port, so it must be set for them. For SRV queries the port comes from the record itself and this setting is ignored.
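
For comparison, here is a minimal sketch of the same job using A records instead of SRV records. The job name is hypothetical, and it assumes the DNS name resolves to one A record per instance. Because A records carry only an IP address, port is what supplies the port half of __address__.

```yaml
scrape_configs:
  - job_name: 'my-microservices-a-records'  # hypothetical job name
    metrics_path: /metrics
    scheme: http
    dns_sd_configs:
      - names:
          - 'my-service.internal.svc.cluster.local'
        type: A      # A records contain an IP address but no port
        port: 8080   # required for A queries; each target becomes <ip>:8080
```

With this variant no port-rewriting relabel rule is needed if the metrics endpoint really listens on the configured port.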

The relabel_configs block is crucial for transforming the discovered information into what Prometheus needs for scraping.

  • The first relabel_configs rule takes the discovered __address__ (for example host:8080, built from the SRV record’s target and port) and rewrites it to host:80, assuming the actual /metrics endpoint is exposed on port 80. The regex must match the whole value, so targets on any other port are left unchanged.
  • The second rule takes the SRV record’s target hostname (__meta_dns_srv_record_target) and uses it to set the instance label, which is a human-readable identifier for the target.
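
If the SRV records do not all advertise port 8080, the first rule silently skips the targets that differ. One possible variant, sketched here under the assumption that every instance serves metrics on port 80 regardless of the advertised port, matches any numeric port instead:

```yaml
relabel_configs:
  - source_labels: [__address__]
    regex: '(.+):\d+'          # anchored match: host part, colon, any numeric port
    target_label: __address__
    replacement: '$1:80'       # keep the host, force port 80 (assumption)
    action: replace            # the default action, written out for clarity
```

The capture group is greedy, so only the final :port suffix is replaced.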

Prometheus re-queries the configured DNS names periodically (every 30 seconds by default, adjustable with refresh_interval). When the records change (for example, a new service instance is deployed, or an existing one is removed from DNS), Prometheus updates its list of targets without needing a restart. This dynamic behavior is key to managing ephemeral cloud-native environments.
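
How often Prometheus re-queries DNS is set per discovery block with refresh_interval. A minimal sketch, where the 60s value is only an illustration:

```yaml
scrape_configs:
  - job_name: 'my-microservices'
    dns_sd_configs:
      - names:
          - 'my-service.internal.svc.cluster.local'
        type: SRV
        refresh_interval: 60s  # illustrative value; the default is 30s
```

A shorter interval picks up new instances sooner at the cost of more DNS queries. DNS record TTLs and resolver caching also affect how quickly changes become visible.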

What often surprises people is how much control you have over the labels generated from DNS. Each discovered target carries __meta_dns_name (the record name that produced it) and, for SRV records, __meta_dns_srv_record_target and __meta_dns_srv_record_port. These __meta_ labels are dropped after relabeling, so you copy the ones you want onto regular Prometheus labels, effectively enriching your metrics with service discovery metadata.
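
A minimal sketch of copying discovery metadata onto regular labels. The target label names dns_name and srv_port are hypothetical, and the source labels are the ones documented for dns_sd_configs:

```yaml
relabel_configs:
  # Record name that was queried, e.g. the entry listed under `names`
  - source_labels: [__meta_dns_name]
    target_label: dns_name   # hypothetical label name
  # Port advertised by the SRV record (only set for SRV queries)
  - source_labels: [__meta_dns_srv_record_port]
    target_label: srv_port   # hypothetical label name
```

Both rules use the default replace action, so each copies the source value unchanged into the target label.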

The next step is often integrating this with more complex service meshes or understanding how to use file_sd_configs for static or manually managed targets.
