The perf tool on Linux is an incredibly powerful, low-overhead profiler, but using it directly on Kubernetes pods requires a few tricks to bridge the gap between the host kernel and the isolated container environment.

Let’s see perf in action, profiling a busy Nginx pod.

First, we need a sample Nginx deployment.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-profiler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80

Save this as nginx-deployment.yaml and apply it to your cluster: kubectl apply -f nginx-deployment.yaml

Now, the trick: perf runs on the host but must see the container’s process tree and kernel events. We achieve this by using nsenter to enter the target pod’s namespaces.

Find the pod name: kubectl get pods -l app=nginx (let’s assume it’s nginx-profiler-xxxxx-yyyyy).
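The lookup can also be scripted. The sketch below assumes the app=nginx label from the deployment above, the default namespace, and a pod with a single container. The helper names (strip_runtime_prefix, describe_target) are illustrative and not part of kubectl. It prints the pod name, the node it is scheduled on, and the container ID with the runtime prefix (for example containerd://) removed, which are the inputs needed for the next step.

```shell
# Strip the "<runtime>://" prefix that Kubernetes puts in front of container IDs.
strip_runtime_prefix() {
    printf '%s\n' "${1#*://}"
}

# Print "<pod> <node> <container_id>" for the first pod matching a label selector.
# Assumes kubectl is configured for the cluster and the pod has one container.
describe_target() {
    selector=$1
    pod=$(kubectl get pods -l "$selector" -o jsonpath='{.items[0].metadata.name}') || return 1
    node=$(kubectl get pod "$pod" -o jsonpath='{.spec.nodeName}') || return 1
    cid=$(kubectl get pod "$pod" -o jsonpath='{.status.containerStatuses[0].containerID}') || return 1
    printf '%s %s %s\n' "$pod" "$node" "$(strip_runtime_prefix "$cid")"
}

# Example: describe_target app=nginx
```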

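Find the PID of the container’s main process on the host node. This is the crucial link. Runtime state files such as /var/run/docker/runc/<container_id>/pid (or similar paths, depending on your container runtime) sometimes hold it, but their location varies. A more reliable way is to open a shell on the node with kubectl debug node/<node_name> -it --image=ubuntu -- bash, then use crictl ps (or docker ps on Docker-based nodes) to find the container ID and crictl inspect <container_id> (or docker inspect) to read its PID. The ubuntu image ships neither tool, but the debug pod mounts the node’s root filesystem at /host, so chroot /host gives you the node’s own binaries. Let’s assume the PID on the host is 12345.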
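As a minimal sketch of that lookup, the helper below wraps crictl inspect and checks that the result looks like a PID. It assumes crictl is installed on the node and can reach the CRI socket (which usually requires root). The info.pid field is what containerd reports; other runtimes may lay out the inspect output differently. The function name container_host_pid is hypothetical.

```shell
# Print the host PID of a container's main process.
# Assumes crictl is on the PATH and the inspect output exposes .info.pid.
container_host_pid() {
    container_id=$1
    pid=$(crictl inspect --output go-template --template '{{.info.pid}}' "$container_id") || return 1
    # Reject anything that is not a plain non-empty string of digits.
    case "$pid" in
        ''|*[!0-9]*) echo "unexpected PID value: $pid" >&2; return 1 ;;
    esac
    printf '%s\n' "$pid"
}

# Example (on the node, as root): container_host_pid <container_id>
```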

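Now, we use nsenter to run perf within the pod’s namespaces. The key flags are -t for the target PID, followed by one flag per namespace to enter: -n for network, -m for mount, -u for UTS, -i for IPC, -p for PID, and -U for user. Note that -m switches to the container’s filesystem, so nsenter looks up the perf binary there; perf must therefore be present in the container image, and the stock nginx image does not ship it.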
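Not every container has its own instance of every namespace type; a pod with hostNetwork: true, for example, shares the node’s network namespace. The sketch below, which assumes a Linux /proc filesystem, compares the /proc/<pid>/ns links of the target with those of the current shell and prints only the nsenter flags for namespaces that actually differ. The helper name nsenter_flags_for is hypothetical.

```shell
# Print the nsenter namespace flags for which the target PID's namespace
# differs from the current shell's. Returns non-zero if the PID is unreadable.
nsenter_flags_for() {
    target_pid=$1
    flags=""
    for pair in net:-n mnt:-m uts:-u ipc:-i pid:-p user:-U; do
        ns=${pair%%:*}     # namespace name under /proc/<pid>/ns
        flag=${pair#*:}    # matching nsenter flag
        target=$(readlink "/proc/$target_pid/ns/$ns") || return 1
        self=$(readlink "/proc/self/ns/$ns") || return 1
        if [ "$target" != "$self" ]; then
            flags="$flags $flag"
        fi
    done
    # printf rather than echo, so a lone "-n" is not swallowed as an option.
    printf '%s\n' "${flags# }"
}

# Example (on the node, as root): nsenter_flags_for 12345
```

Reading another user’s /proc/<pid>/ns entries generally requires root, so run it with the same privileges you would use for nsenter itself.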

# On the Kubernetes node where the pod is running
sudo nsenter -t 12345 -n -m -u -i -p -U perf top

This command will drop you into a perf top view, showing you what’s consuming CPU cycles within that specific pod, as seen from the host. You’ll see the Nginx worker processes, and any other processes running inside the container.

The core problem perf faces in Kubernetes is namespace isolation. When you run perf directly on a node, it sees all processes and kernel events on that node. However, a container’s processes are isolated within their own PID, network, mount, and other namespaces. perf needs to be aware of these isolated namespaces to correctly attribute events to processes inside the container. nsenter is the bridge; it allows a process on the host to enter the namespaces of another process (in this case, a container’s main process).

Here’s how you’d record a perf trace for later analysis:

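# On the Kubernetes node
# -a samples all CPUs (without -a or -p, perf record profiles only the sleep command itself)
# -g records call stacks, which flame graphs need
sudo nsenter -t 12345 -n -m -u -i -p -U perf record -a -g -o /tmp/nginx.perf.data -- sleep 60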

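This records perf data for 60 seconds and writes it to /tmp/nginx.perf.data. Because the command enters the container’s mount namespace (-m), that path is inside the container’s filesystem; from the host it is reachable as /proc/12345/root/tmp/nginx.perf.data. You can then copy this file off the node and analyze it with perf report or flamegraph.pl.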
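When the recording is made with -m, the output path is resolved inside the container’s mount namespace, so fetching it from the host goes through /proc/<pid>/root. The helper below (a hypothetical name) only builds that host-side path; the commented commands show one way to use it. They assume the stackcollapse-perf.pl and flamegraph.pl scripts from the FlameGraph repository are in the current directory, and that the data was recorded with call stacks.

```shell
# Map a path inside a container to the matching path on the host,
# using the /proc/<pid>/root view of the container's mount namespace.
container_path_on_host() {
    host_pid=$1
    path_in_container=$2
    printf '/proc/%s/root%s\n' "$host_pid" "$path_in_container"
}

# Example usage on the node (as root):
#   cp "$(container_path_on_host 12345 /tmp/nginx.perf.data)" /tmp/nginx.perf.data
#   perf report -i /tmp/nginx.perf.data
#   perf script -i /tmp/nginx.perf.data | ./stackcollapse-perf.pl | ./flamegraph.pl > nginx.svg
```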

The mental model for profiling containers on Kubernetes is:

  1. Identify the Target: Know which pod and container you want to profile.
  2. Find the Host PID: Locate the main process ID (PID) of the container’s init process on the Kubernetes node. This is the crucial lookup.
  3. Enter Namespaces: Use nsenter with appropriate namespace flags (-n, -m, -u, -i, -p, -U) to make the host process (like perf) aware of the container’s isolated environment.
  4. Profile: Run perf commands (perf top, perf record, etc.) within the nsenter context.
  5. Analyze: Collect the data (if using perf record) and analyze it on the host or your local machine.
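
Steps 3 and 4 can be sketched as a small helper that validates its inputs and prints the command rather than running it, so it can be reviewed first. The function name build_record_cmd is hypothetical, and the -a (all CPUs) and -g (call stacks) options are choices made for this sketch.

```shell
# Build (but do not run) the nsenter + perf record command for a container.
# Arguments: host PID, duration in seconds, output path.
build_record_cmd() {
    host_pid=$1
    seconds=$2
    output=$3
    # Both numeric arguments must be non-empty strings of digits.
    for value in "$host_pid" "$seconds"; do
        case "$value" in
            ''|*[!0-9]*) echo "not a number: $value" >&2; return 1 ;;
        esac
    done
    printf 'nsenter -t %s -n -m -u -i -p -U perf record -a -g -o %s -- sleep %s\n' \
        "$host_pid" "$output" "$seconds"
}

# Example: review the command, then run it as root on the node.
#   cmd=$(build_record_cmd 12345 60 /tmp/nginx.perf.data)
#   echo "$cmd"
#   sudo sh -c "$cmd"
```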

The most surprising thing is how much privilege perf needs. To capture kernel-level events, perf generally requires root, CAP_PERFMON (Linux 5.8 and later), or CAP_SYS_ADMIN, which is why you’ll typically run sudo nsenter ... perf .... What an unprivileged user may do is governed by the kernel.perf_event_paranoid sysctl; at the upstream default of 2, that is limited to userspace events for the user’s own processes.
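
To see where a node stands, you can read /proc/sys/kernel/perf_event_paranoid. The sketch below translates the value into a short description, following the levels in the upstream kernel sysctl documentation. The function name is hypothetical, and values of 3 and above are not defined upstream, so that branch is only a hedge about distribution-patched kernels.

```shell
# Describe what users without CAP_PERFMON may do at a given
# kernel.perf_event_paranoid level.
describe_perf_paranoid() {
    level=$1
    if [ "$level" -ge 3 ]; then
        echo "not defined upstream; some distribution kernels block unprivileged use"
    elif [ "$level" -eq 2 ]; then
        echo "user-space profiling only"
    elif [ "$level" -eq 1 ]; then
        echo "user and kernel profiling, no CPU-wide events"
    elif [ "$level" -eq 0 ]; then
        echo "CPU-wide events allowed, no raw tracepoints"
    else
        echo "no restrictions"
    fi
}

# Example: describe_perf_paranoid "$(cat /proc/sys/kernel/perf_event_paranoid)"
```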

The next problem you’ll likely encounter is the need to profile specific kernel events or tracepoints that are not enabled by default or require deeper kernel interaction.
