Linux perf can tell you about page faults, but it’s not just about counting them; it’s about understanding why they’re happening and how much they’re costing you.
Let’s see perf in action. Imagine we have a simple C program that writes to every element of a 1 GB array. malloc only reserves virtual address space for an allocation this large, so the first write to each page triggers a page fault.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define ARRAY_SIZE (1024UL * 1024 * 1024) // 1 GiB array, in bytes
#define PROGRESS_STEP (100UL * 1024 * 1024 / sizeof(long long)) // Elements per 100 MiB

int main(void) {
    long long *arr = malloc(ARRAY_SIZE);
    if (!arr) {
        perror("malloc failed");
        return 1;
    }
    printf("Accessing array...\n");
    for (size_t i = 0; i < ARRAY_SIZE / sizeof(long long); ++i) {
        arr[i] = (long long)i; // The first write to each page triggers a page fault
        if (i % PROGRESS_STEP == 0) { // Print progress every 100 MiB
            printf("Processed %zu MB\n", i * sizeof(long long) / (1024 * 1024));
            usleep(100000); // Small delay to make output visible
        }
    }
    printf("Array access complete.\n");
    free(arr);
    return 0;
}
We can compile this with gcc -o pagefault_test pagefault_test.c.
Now, to measure page faults using perf, we’ll use the software event page-faults.
perf stat -e page-faults ./pagefault_test
Running this will give output similar to the following (the numbers are illustrative). Because only one event was requested, only that event is listed:
 Performance counter stats for './pagefault_test':
         1,234,567      page-faults
       3.141592653 seconds time elapsed
Notice the page-faults count. This is the raw number of times the system had to handle a page fault for this process. But what does this mean? It means that when the program tried to access a virtual address that was not currently mapped to a physical page, the CPU triggered an exception, and the operating system’s page fault handler stepped in. For a minor page fault, the OS could satisfy the fault without any disk I/O: the page was already in memory (for example in the page cache from an earlier file read, or in a shared library that was already loaded), or, as in our program, it was freshly allocated anonymous memory that only needed a zeroed physical page. For a major page fault, the OS had to fetch the page from secondary storage (a hard drive or SSD, including swap), which is significantly more expensive.
The beauty of perf is its flexibility. While page-faults is a good start, we can drill down further. The event major-faults specifically counts the costly ones, and minor-faults counts the less costly ones.
perf stat -e major-faults,minor-faults ./pagefault_test
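This would give us a breakdown. The numbers below are illustrative; a program like ours, which only touches freshly allocated anonymous memory, should report major-faults at or near zero unless the system is swapping:
 Performance counter stats for './pagefault_test':
           100,000      major-faults
         1,134,567      minor-faults
       3.141592653 seconds time elapsed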
This distinction is crucial. A high number of minor-faults usually means the program keeps touching memory for the first time (fresh allocations, newly mapped files, copy-on-write pages) or is constantly loading new code or data. It is rarely a performance bottleneck unless the rate is extremely high. A high number of major-faults, however, almost always points to an I/O bottleneck or memory pressure: the program is spending a lot of time waiting for pages to be read from disk or swap.
The system’s handling of page faults is a core part of its virtual memory management. When a process attempts to access a virtual memory address, the Memory Management Unit (MMU) checks its page tables to see if a physical page frame is mapped to that address. If not, a page fault exception is raised. The kernel’s page fault handler then determines the cause: was the page swapped out? Does it need to be read from a file? Is it a copy-on-write page? Based on this, it either finds the page in memory (minor fault) or fetches it from disk (major fault), updates the page tables, and resumes the process.
What most people miss is that perf can put these software events side by side with hardware counters. page-faults is a software event, but the cost of handling a fault is paid in CPU cycles (largely in kernel mode, where the fault handler runs) and, for major faults, in I/O latency. Adding hardware events to the same perf stat run, for instance perf stat -e page-faults,cycles:u,cycles:k ./pagefault_test, lets you start to quantify the actual impact: a high page-faults count alongside a large share of kernel-mode cycles suggests the fault handler is a real cost. Events such as stalled-cycles-frontend can add detail, but they are only available on some CPUs and are a less direct signal, so treat any correlation with fault counts as a hint, not a proof.
The next step after understanding page faults is often investigating cache misses, which are closely related to memory access patterns and can also be tracked with perf.