1,136 questions
0 votes, 0 answers, 30 views
Profiling an application using perf with the CLion Docker toolchain
CLion offers a perf integration for profiling applications. Getting this to work with a Docker toolchain is not described online. Is it possible to do this, and how does one manage the different ...
2 votes, 1 answer, 75 views
How does Linux Perf manage hardware counters when profiling a multithreaded process?
I am trying to understand how Linux Perf manages hardware counters when profiling a multithreaded process.
According to the documentation, the perf_event_open syscall can be used in two ways:
To ...
0 votes, 0 answers, 25 views
perf probe fails because of a missing symbol, although it's listed under functions (clang)
I'm trying to instrument clang with perf to measure how long a certain LLVM pass runs (since -ftime-report seems to give inconsistent metrics and can't be used on multi-file compilations). I have ...
2 votes, 0 answers, 120 views
L1d cache miss and L1d cache reference counts are way off from those shown by perf stat
I wrote an eBPF program to count cache references and misses of a target process. The program seems to work, but the counts don't even closely match the perf stat output. I am assuming there is some issue ...
1 vote, 0 answers, 32 views
Tracking Per Channel Memory Traffic in AMD Zen 2 (Rome)
I am using perf to profile workloads on my system, and I need to track the memory traffic generated by my workload on each NUMA node. Currently, I only have perf results for LLC cache misses, which ...
0 votes, 1 answer, 113 views
Why does PERF_COUNT_HW_REF_CPU_CYCLES have much higher variance on Zen 5 CPUs than PERF_COUNT_HW_CPU_CYCLES?
My understanding is that PERF_COUNT_HW_REF_CPU_CYCLES should map to some counter that counts at a constant rate, as opposed to PERF_COUNT_HW_CPU_CYCLES which is affected by frequency scaling. I'd ...
2 votes, 0 answers, 113 views
How can I identify kernel-level bottlenecks behind high %system and %softirq CPU usage?
I’m running a virtual machine on the host using QEMU/KVM, and I execute a specific workload inside the guest (e.g., TCP send).
When monitoring the CPU usage on the host with mpstat, I observe that ...
1 vote, 0 answers, 37 views
How does the perf tool use eh_frame to unwind the call stack without a frame pointer?
I'm learning how to print the call stack in a program without a frame pointer. Currently, I know that we can use DWARF's eh_frame section in ELF files to perform stack unwinding, and I've successfully ...
0 votes, 0 answers, 47 views
Attaching to and receiving a process's own utrace events (dtrace USDT probe points)
Is it possible for a process to probe dtrace/perf/etc USDTs for its own process (or even better, process group or uid) without needing elevated privileges or being able to trace processes with other ...
2 votes, 0 answers, 68 views
Why does perf complain that it cannot open this L1 cache event on Zen 2?
I am trying to read cache events on an AMD Zen 2:
L1d all read accesses
L1d all write accesses
L1d read misses (not shown below)
L1d write misses (not shown below)
According to the perf_event_open(2) ...
0 votes, 1 answer, 92 views
What is the difference between perf kvm record and perf record?
Recently, I have been profiling a workload with multiple KVM-based virtual machines (VMs). In short, I find that perf kvm record + perf kvm report shows different results from perf record + perf report.
...
0 votes, 1 answer, 118 views
Why does the perf flame graph for QEMU+TCG have so many empty regions?
I use perf to trace a QEMU+TCG performance issue (attaching perf to the TCG thread). I found that there are many empty regions in my flame graph. So why is there so much empty space (the red circle in the ...
1 vote, 1 answer, 149 views
Detect if my program is being profiled with perf in Linux
I have a program that needs to emit some JIT symbols for perf. I would like it to only emit them when being profiled. As such, I'd like a way to detect that my application is being profiled under perf ...
-2 votes, 1 answer, 106 views
How to use perf inside docker container with CLion (pass --privileged flag to docker)?
I would like to profile my C++ app inside a Docker container. perf needs the container to be launched with the --privileged flag, but CLion doesn't have that option for its Docker plugin. Are there any ways to use CLion ...
0 votes, 0 answers, 179 views
perf stat for only the process I invoke with it
On Ubuntu, using:
sudo perf stat -e offcore_requests.demand_code_rd,offcore_requests.all_requests,l2_rqsts.all_demand_data_rd,mem_inst_retired.all_loads myApp
I'd like perf to return statistics only ...
0 votes, 0 answers, 39 views
How to use perf to find the heavy read and write parts of a program
I want to use perf to find out which instructions cause a large number of cache or memory accesses. Is there a good sampling method or workflow for this? My current sampling command is
perf record -e ...
2 votes, 0 answers, 58 views
How to get DSO information in a Linux perf call graph?
I want to use perf's call graph to find which source-level function in the binary calls a given library function, and extract this data for parsing, but the call ...
4 votes, 1 answer, 160 views
What causes kernel memory operations in perf stats for a userspace-only process?
I'm running a simple program where:
A thread pinned to CPU 1 performs random reads from a pre-allocated and initialized 2GB memory region and no system calls are made during the memory access loop.
...
1 vote, 0 answers, 38 views
perf stat output difference with and without -e option
root@8ccbd6ec81f8:~/my-bcc-image/cpusnoop# perf stat -e instructions ./a.out
Performance counter stats for './a.out':
48100736 instructions ...
0 votes, 0 answers, 166 views
How do I calculate the L3 cache miss rate and find number of trips to main memory using perf?
I'm trying to measure the L3 cache miss rate using the following formula:
I found that LLC misses can be obtained using this perf command from How to catch the L3-cache hits and misses by perf tool ...
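The formula from the original post did not survive in this listing. For orientation only, a commonly used definition (an assumption here, not necessarily the asker's) divides last-level-cache misses by last-level-cache references. perf exposes generic events such as cache-misses/cache-references and LLC-loads/LLC-load-misses, though the hardware events they map to vary by CPU and some are unsupported on certain models:

```latex
\text{LLC miss rate} = \frac{\text{LLC misses}}{\text{LLC references}}
```

The LLC miss count is often used as a rough proxy for trips to main memory. Hardware prefetches and dirty-line write-backs also generate DRAM traffic that this ratio does not capture.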
2 votes, 1 answer, 195 views
Tracking DRAM traffic in AMD Zen 2 (Rome)
I want to track the number of read/write accesses at each of the Unified Memory Controllers (UMCs) in my AMD EPYC processor (family: 0x17 and model: 0x31). The AMDuProfPcm tool, when used with the -m ...
3 votes, 1 answer, 180 views
Analyzing Cache Behavior and Memory Traffic in Large File Reads Using perf stat
Consider the following code:
int main(int argc, char** argv) {
int buf_size = 1024*1024*1024;
char* buffer = malloc(buf_size);
char* buffer2 = malloc(buf_size);
for (int i = 0; i < 10; i++){...
1 vote, 0 answers, 13 views
How to distinguish files with the same path in a container using perf buildid-list
I run perf record -a -- sleep 3 on the host machine for sampling. There may be binaries with the same path but different buildids in different containers. For example, container A has /tmp/a.out, and ...
1 vote, 1 answer, 241 views
How do I prevent perf from writing to stdout when used in a pipe?
I am running two programs where one's stdout is piped into the other's stdin. I did this using named pipes in the following way (simplified for presentation):
$ mkfifo feedback
$ ./program1 < ...
0 votes, 1 answer, 129 views
How do I understand why a spawned thread takes that long to wake up?
I am comparing different implementations of a threadpool. One approach, containing locks and condition variables, is substantially (~100 microseconds) slower than another using a lock-free ...