I'm learning how to print the call stack in a program without a frame pointer. Currently, I know that we can use DWARF's eh_frame section in ELF files to perform stack unwinding, and I've successfully tried analyzing ELF files myself. However, I want to understand how to use perf_event_open to sample and unwind the call stack.
From what I've learned, our program can be written as:
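struct perf_event_attr attr={0};
attr.size=sizeof(attr);
attr.type=PERF_TYPE_SOFTWARE;
attr.config=PERF_COUNT_SW_CPU_CLOCK;
attr.sample_period=100000;
attr.sample_type=PERF_SAMPLE_CALLCHAIN;
/* perf_event_open here is a thin wrapper around syscall(SYS_perf_event_open, ...); glibc does not provide one */
/* pid=0 (calling process), cpu=-1 (any CPU), group_fd=-1, flags=0 */
int fd=perf_event_open(&attr,0,-1,-1,0);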
...
But there's a problem: the kernel writes samples asynchronously, and we read them asynchronously as well. When the program has no frame pointer, the kernel returns only one stack frame (with a frame pointer, there would be several). By the time we read the sample and want to unwind the stack with eh_frame, it's too late: the stack has changed since the sample was taken, and the program might already have exited.
So I used perf with the --call-graph dwarf option to see if it can print the call stack:
perf record --call-graph dwarf -p PID
perf report --call-graph --stdio -G
It correctly displays the call stack of my test program, which was compiled with -fomit-frame-pointer. My question is, how does it do this? I tried to investigate perf's system calls by running:
# strace perf record --call-graph dwarf -p PID
perf_event_open({type=PERF_TYPE_SOFTWARE, size=0x88 /* PERF_ATTR_SIZE_??? */, config=PERF_COUNT_SW_DUMMY, sample_period=0, sample_type=0, read_format=0, watermark=1, precise_ip=0 /* arbitrary skid */, sample_id_all=1, bpf_event=1,...}, 1, 63, -1, PERF_FLAG_FD_CLOEXEC) = 132
I found that it doesn't set sample_type to PERF_SAMPLE_CALLCHAIN, so how does it obtain the stack information?
I understand that perf uses libunwind to perform unwinding, but I'm not clear about the triggering mechanism. As I mentioned earlier, when the kernel only returns one PC (program counter) without a frame pointer, how does perf unwind the stack?
perf record talks about dwarf grabbing snapshots of the stack (default 8192 bytes). So I assume it has the kernel copy these snapshots somewhere that user-space can use them later; given a program-counter, DWARF metadata, and a chunk of stack memory with stack-pointer at a known position, that should be enough to unwind.
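To make the snapshot idea concrete, here is a minimal sketch of how a sampler could ask the kernel for that data itself. It relies on the PERF_SAMPLE_REGS_USER and PERF_SAMPLE_STACK_USER bits and the sample_regs_user / sample_stack_user fields documented in perf_event_open(2). The helper names (init_dwarf_sampling_attr, parse_sample, struct user_snapshot) are made up for illustration and are not taken from perf's source. The register mask is architecture-specific (the bit numbers come from the arch's perf_regs.h), so it is passed in by the caller. The parser assumes the record body (the part after the perf_event_header) has already been copied out of the ring buffer into one contiguous, 8-byte-aligned buffer, and that sample_type is exactly the one set below.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Configure sampling so every sample carries the user registers selected by
 * regs_mask plus a copy of up to stack_bytes of user stack. */
static void init_dwarf_sampling_attr(struct perf_event_attr *attr,
                                     uint64_t regs_mask, uint32_t stack_bytes)
{
    /* The kernel rejects sizes that are not a multiple of 8 or do not fit
     * in the 16-bit sample size limit. */
    assert(stack_bytes % 8 == 0 && stack_bytes < 65535);

    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_SOFTWARE;
    attr->config = PERF_COUNT_SW_CPU_CLOCK;
    attr->sample_period = 100000;
    attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_REGS_USER |
                        PERF_SAMPLE_STACK_USER;
    attr->sample_regs_user = regs_mask;     /* arch-specific bit mask */
    attr->sample_stack_user = stack_bytes;  /* e.g. 8192, perf's default */
    attr->exclude_kernel = 1;
}

/* Hypothetical view of one decoded sample. */
struct user_snapshot {
    uint64_t ip;           /* sampled instruction pointer */
    uint64_t abi;          /* PERF_SAMPLE_REGS_ABI_*; 0 means no registers */
    const uint64_t *regs;  /* one value per set bit of regs_mask, in bit order */
    size_t nregs;
    const uint8_t *stack;  /* stack bytes, starting at the sampled SP */
    uint64_t stack_len;    /* number of valid bytes (dyn_size) */
};

/* Decode a sample body laid out as:
 *   u64 ip; u64 abi; u64 regs[popcount(mask)]; u64 size; u8 data[size]; u64 dyn_size;
 * regs[] is absent when abi == 0; data and dyn_size are absent when size == 0.
 * Returns 0 on success, -1 if the record is too short or inconsistent. */
static int parse_sample(const uint64_t *w, size_t nwords, uint64_t regs_mask,
                        struct user_snapshot *out)
{
    size_t i = 0;

    if (nwords < 2)
        return -1;
    out->ip = w[i++];
    out->abi = w[i++];

    /* __builtin_popcountll is a GCC/Clang builtin. */
    out->nregs = out->abi ? (size_t)__builtin_popcountll(regs_mask) : 0;
    if (nwords - i < out->nregs + 1)
        return -1;
    out->regs = &w[i];
    i += out->nregs;

    uint64_t size = w[i++];
    out->stack = (const uint8_t *)&w[i];
    out->stack_len = 0;
    if (size == 0)
        return 0;  /* the kernel could not dump any stack for this sample */
    if (size % 8 != 0 || nwords - i < size / 8 + 1)
        return -1;
    i += size / 8;

    uint64_t dyn_size = w[i];
    if (dyn_size > size)
        return -1;
    out->stack_len = dyn_size;
    return 0;
}
```

With the registers (at least IP and SP) and the stack bytes in hand, a user-space unwinder can apply the .eh_frame rules to the copied memory long after the sample was taken, instead of to the live process. That is why unwinding still works asynchronously, and even after the target has exited, as long as its ELF files are still available.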