In the most common setup, Prometheus doesn’t discover Kubernetes pods directly; it discovers Kubernetes Services, through their Endpoints, and relies on those Services to tell it which pods back them and whether each one is ready to serve traffic.

Here’s a simple Nginx deployment and service, and how Prometheus will find it:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  labels:
    app: nginx
spec:
  selector:
    app: nginx
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80

When Prometheus is running inside your Kubernetes cluster and configured with kubernetes_sd_configs using role: endpoints, it queries the Kubernetes API server and watches Endpoints objects, along with the Services and Pods they refer to. When it sees the Endpoints for nginx-service above, which Kubernetes fills in with the pods that match the selector (app: nginx), it creates a target for each pod address and port, with labels like __meta_kubernetes_service_name: nginx-service, __meta_kubernetes_pod_name: nginx-deployment-xxxx, __meta_kubernetes_endpoint_ready: true, and, crucially, __address__: <pod-ip>:<target-port> (here <pod-ip>:80). Pods that aren’t ready yet are still listed, with __meta_kubernetes_endpoint_ready set to false.

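There is no built-in relabeling rule that turns these meta labels into a job label: the job label defaults to the scrape config’s job_name, and any labels derived from __meta_kubernetes_service_name or __meta_kubernetes_endpoint_port_name (empty here, because the Service port is unnamed) come from relabel_configs rules you write yourself. Prometheus scrapes whatever ends up in __address__, so the result is Prometheus scraping each of the nginx-deployment-xxxx pods on port 80 at the default /metrics path. (The stock nginx image doesn’t serve Prometheus-format metrics there, so in practice these targets would report as down until you add an exporter; the example illustrates discovery, not a working scrape.)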
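A minimal sketch of a scrape job that implements the Service-based discovery described above, assuming nginx-service lives in the default namespace. The job name and the service and pod target labels are illustrative choices, not Prometheus defaults:

```yaml
scrape_configs:
  - job_name: 'nginx-endpoints'        # becomes the job label (illustrative name)
    kubernetes_sd_configs:
      - role: endpoints                # targets come from the Endpoints behind each Service
    relabel_configs:
      # Keep only endpoints of nginx-service in the default namespace (assumed)
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
        action: keep
        regex: default;nginx-service
      # Optionally skip pods that Kubernetes has not marked ready
      - source_labels: [__meta_kubernetes_endpoint_ready]
        action: keep
        regex: "true"
      # Copy the Service and pod names into target labels (default action is replace)
      - source_labels: [__meta_kubernetes_service_name]
        target_label: service
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```

Relabeling regexes are fully anchored, and multiple source labels are joined with ";" before matching, which is why the first rule matches the literal string default;nginx-service.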

The most surprising thing about Prometheus Kubernetes service discovery, then, is that in this mode it never looks for pods on its own. Prometheus relies on the selector in the Service definition, through the Endpoints object Kubernetes maintains for it, to identify which pods are associated with that Service. Only pods selected by a Service are considered for scraping when you use kubernetes_sd_configs with role: endpoints.
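For contrast, a sketch of the role: service variant. Here Prometheus creates one target per Service port, and __address__ is the Service’s cluster DNS name and port (for nginx-service, nginx-service.default.svc:80), so each scrape goes through the Service to a single backing pod rather than to every pod. That makes this role better suited to black-box probing than to per-pod metrics. The job name is an assumption:

```yaml
scrape_configs:
  - job_name: 'nginx-service'          # becomes the job label (illustrative name)
    kubernetes_sd_configs:
      - role: service                  # one target per Service port, no per-pod targets
    relabel_configs:
      # __address__ is already <service>.<namespace>.svc:<port>,
      # e.g. nginx-service.default.svc:80
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
        action: keep
        regex: default;nginx-service
```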

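Discovery doesn’t have to go through Services, though. Let’s see pod-level discovery in action with a Prometheus configuration snippet that uses role: pod, which watches Pod objects directly. Assume Prometheus is running in-cluster with a Service Account that has list and watch permissions for services, endpoints, pods, and nodes.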

scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods that have Prometheus annotations for scraping
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Use pod IP as the scrape address
      - source_labels: [__address__]
        action: replace
        target_label: __address__
        regex: ([^:]+)(:[0-9]+)?
        replacement: ${1}:9090 # Assuming your pod metrics are on port 9090
      # Set the job name from the pod's namespace and app label
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_label_app]
        action: replace
        target_label: job
        regex: (.*?);(.*)
        replacement: ${1}/${2}
      # Set the instance label to the pod name
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: instance

In this configuration:

  • job_name: 'kubernetes-pods' is a descriptive name for this scrape job.
  • kubernetes_sd_configs with role: pod tells Prometheus to watch for Pod objects in Kubernetes.
  • The relabel_configs section is where the magic happens to filter and transform the discovered targets.
    • The first rule keeps only those pods that have the annotation prometheus.io/scrape: "true". This is how you opt-in specific pods for Prometheus to scrape.
    • The second rule rewrites the __address__ label. It keeps the pod’s IP address and replaces any discovered port with 9090, so it assumes your application exposes metrics on port 9090. The port is hardcoded: this rule ignores the prometheus.io/port annotation entirely. Without it, __address__ would be the pod IP plus the declared container port of each discovered target.
    • The third rule constructs the job label. It concatenates the Kubernetes namespace and the app label from the pod, separated by a slash. So a pod with label app: my-app in namespace production would get the job label production/my-app.
    • The fourth rule sets the instance label to the name of the pod (__meta_kubernetes_pod_name).
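If you’d rather have the prometheus.io/port and prometheus.io/path annotations actually drive the scrape, a common pattern is to read them during relabeling instead of hardcoding the port. These annotations are only a convention: Prometheus sees them as __meta_kubernetes_pod_annotation_* labels and does nothing with them unless a rule does. A sketch of rules that could replace the first two rules of the kubernetes-pods job (the job and instance rules stay as they are), assuming IPv4 pod addresses:

```yaml
relabel_configs:
  # Opt in: keep only pods annotated prometheus.io/scrape: "true"
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"
  # Replace the discovered port with the prometheus.io/port value.
  # The joined input looks like "<pod-ip>:8080;9090"; if the annotation is
  # missing, the regex doesn't match and __address__ is left unchanged.
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  # Use prometheus.io/path as the metrics path when it is set (default: /metrics)
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    regex: (.+)
    target_label: __metrics_path__
```

Because a replace rule whose regex doesn’t match leaves the target label untouched, pods without the port or path annotation fall back to their discovered port and to /metrics.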

Consider a pod with these labels and annotations:

apiVersion: v1
kind: Pod
metadata:
  name: my-app-pod-12345
  namespace: default
  labels:
    app: my-app
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
    prometheus.io/path: "/metrics" # Only honored if a relabel rule copies it to __metrics_path__; Prometheus scrapes /metrics by default
spec:
  containers:
  - name: my-app-container
    image: my-app-image
    ports:
    - containerPort: 8080 # The relabel rules rewrite the scrape port to 9090, so this port isn't scraped
    - containerPort: 9090 # This is the port Prometheus will use

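With the Prometheus configuration above, role: pod generates one target per declared container port, so this pod is discovered twice: once for 8080 and once for 9090. The address rule rewrites both to port 9090, giving them identical labels, so they collapse into a single target with:

  • __address__: <pod-ip>:9090
  • job: default/my-app
  • instance: my-app-pod-12345

Meta labels such as __meta_kubernetes_pod_annotation_prometheus_io_scrape: true and __meta_kubernetes_pod_annotation_prometheus_io_port: 9090 are available while relabeling runs, but, like every label starting with __, they are not attached to the scraped series.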

The kubernetes_sd_configs with role: pod is powerful because it allows Prometheus to discover and scrape metrics directly from individual pods without needing to define a Kubernetes Service for every single application that exposes metrics. This is particularly useful for ephemeral workloads or services that don’t require a stable external endpoint.

A common pitfall is forgetting to add the prometheus.io/scrape: "true" annotation to your pods. Without it, Prometheus will discover the pod but the relabeling rules will filter it out, meaning it will never be scraped. Another common issue is a mismatch between the port specified in prometheus.io/port and the actual port your application is listening on for metrics.
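A related way to “forget” the annotation is to put it on a Deployment’s own metadata. role: pod reads annotations from Pod objects, so for a Deployment they belong under spec.template.metadata.annotations, which Kubernetes copies onto every pod it creates. A sketch reusing the hypothetical my-app names from the pod example:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                 # hypothetical name
  namespace: default
  # Annotations placed here apply to the Deployment object only, not to its pods
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:             # copied onto each pod, where role: pod can see them
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
    spec:
      containers:
      - name: my-app-container
        image: my-app-image    # hypothetical image
        ports:
        - containerPort: 9090
```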

The next concept you’ll likely encounter is how to scrape metrics from nodes themselves, using role: node in your kubernetes_sd_configs, and how to integrate that with other discovery roles like role: service and role: endpoints.
