Newest 'perf' Questions

0 votes

0 answers

30 views

Profiling an application using perf with a CLion Docker toolchain

CLion offers a perf integration for profiling applications. Getting this to work when using a Docker toolchain is not described online. Is it possible to do this, and how does one manage the different ...

Jack

61

asked Nov 9 at 14:27

2 votes

1 answer

75 views

How does Linux Perf manage hardware counters when profiling a multithreaded process?

I am trying to understand how Linux Perf manages hardware counters when profiling a multithreaded process. According to the documentation, the perf_event_open syscall can be used in two ways: To ...

LorienLV

88

asked Oct 24 at 10:34

0 votes

0 answers

25 views

Perf probe fails because of missing symbol, although it's listed under functions (clang)

I'm trying to instrument clang with perf to measure how long a certain LLVM pass runs for (since -ftime-report seems to give inconsistent metrics and can't be used on multi-file compilations). I have ...

gillo04

148

asked Sep 18 at 7:03

2 votes

0 answers

120 views

L1d cache miss and L1d cache ref counts are way off from those shown by perf stat

I wrote an eBPF program to count cache references and misses for a target process. The program seems to work, but the counts don't come close to matching the perf stat output. I am assuming there is some issue ...

ultimate cause

2,354

asked Aug 26 at 13:01

1 vote

0 answers

32 views

Tracking Per Channel Memory Traffic in AMD Zen 2 (Rome)

I am using perf to profile workloads on my system, and I need to track the memory traffic generated by my workload on each NUMA node. Currently, I only have perf results for LLC cache misses, which ...

smz

515

asked Aug 20 at 19:51

0 votes

1 answer

113 views

Why does PERF_COUNT_HW_REF_CPU_CYCLES have much higher variance on Zen5 cpus than PERF_COUNT_HW_CPU_CYCLES?

My understanding is that PERF_COUNT_HW_REF_CPU_CYCLES should map to some counter that counts at a constant rate, as opposed to PERF_COUNT_HW_CPU_CYCLES which is affected by frequency scaling. I'd ...

Joseph Garvin

22.3k

asked Jul 24 at 17:17

2 votes

0 answers

113 views

How can I identify kernel-level bottlenecks behind high %system and %softirq CPU usage?

I’m running a virtual machine on the host using QEMU/KVM, and I execute a specific workload inside the guest (e.g., TCP send). When monitoring the CPU usage on the host with mpstat, I observe that ...

choiyhking

43

asked Jun 13 at 2:14

1 vote

0 answers

37 views

How does the perf tool use eh_frame to unwind the call stack without a frame pointer?

I'm learning how to print the call stack in a program without a frame pointer. Currently, I know that we can use DWARF's eh_frame section in ELF files to perform stack unwinding, and I've successfully ...

Nail Jay

285

asked May 14 at 12:52

0 votes

0 answers

47 views

Attaching to and receiving a process's own utrace events (dtrace USDT probe points)

Is it possible for a process to probe dtrace/perf/etc USDTs for its own process (or even better, process group or uid) without needing elevated privileges or being able to trace processes with other ...

Craig Ringer

329k

asked Apr 8 at 23:55

2 votes

0 answers

68 views

Why does perf complain that it cannot open this L1 cache event on Zen 2?

I am trying to read cache events on an AMD Zen 2: L1d all read accesses, L1d all write accesses, L1d read misses (not shown below), and L1d write misses (not shown below). According to the perf_event_open(2) ...

onlycparra

815

asked Mar 20 at 5:02

0 votes

1 answer

92 views

What is the difference between perf kvm record and perf record?

I have recently been profiling a workload with multiple KVM-based virtual machines (VMs). In short, I find that perf kvm record + perf kvm report shows different results from perf record + perf report. ...

huangjl

85

asked Mar 11 at 4:01

0 votes

1 answer

118 views

Why does the perf flame graph on qemu+tcg have lots of empty space?

I use perf to trace a qemu+tcg performance issue (attaching perf to the TCG thread). I found that there is a lot of empty space in my flame graph. So why is there so much empty space (the red circle in the ...

Jing

13

asked Mar 6 at 2:49

1 vote

1 answer

149 views

Detect if my program is being profiled with perf in Linux

I have a program that needs to emit some JIT symbols for perf. I would like it to only emit them when being profiled. As such, I'd like a way to detect that my application is being profiled under perf ...

Offtkp

470

asked Feb 26 at 21:17

-2 votes

1 answer

106 views

How to use perf inside docker container with CLion (pass --privileged flag to docker)?

I would like to profile my C++ app inside a Docker container. Perf needs the container to be launched with the --privileged flag, but CLion doesn't have that option for its Docker plugin. Are there any ways to use CLion ...

dexnp1

1

asked Feb 2 at 18:10

0 votes

0 answers

179 views

perf stat for only the process I invoke with it

On Ubuntu, using: sudo perf stat -e offcore_requests.demand_code_rd,offcore_requests.all_requests,l2_rqsts.all_demand_data_rd,mem_inst_retired.all_loads myApp I'd like perf to return statistics only ...

jkang

559

asked Dec 17, 2024 at 23:41

0 votes

0 answers

39 views

How to use perf to find the heavy read and write parts of a program

I want to use perf to find out how many instructions will cause a large number of cache or memory accesses. Is there any good sampling method or usage method? My current sampling is perf record -e ...

HhhHa

1

asked Dec 3, 2024 at 3:20

2 votes

0 answers

58 views

How to get DSO information in the Linux perf call graph?

I want to find in perf's call graph which function of a certain library function is called by the corresponding source code function of the binary file, and get this data for parsing, but the call ...

xiaoyu he

21

asked Nov 26, 2024 at 15:27

4 votes

1 answer

160 views

What causes kernel memory operations in perf stats for a userspace-only process?

I'm running a simple program where: A thread pinned to CPU 1 performs random reads from a pre-allocated and initialized 2GB memory region and no system calls are made during the memory access loop. ...

idle_cycles

173

asked Nov 14, 2024 at 21:45

1 vote

0 answers

38 views

perf stat output difference with and without -e option

root@8ccbd6ec81f8:~/my-bcc-image/cpusnoop# perf stat -e instructions ./a.out Performance counter stats for './a.out': 48100736 instructions ...

BRATIN MONDAL

11

asked Oct 28, 2024 at 6:03

0 votes

0 answers

166 views

How do I calculate the L3 cache miss rate and find number of trips to main memory using perf?

I'm trying to measure the L3 cache miss rate using the following formula: I found that LLC misses can be obtained using this perf command from How to catch the L3-cache hits and misses by perf tool ...

Sherlock

63

asked Oct 17, 2024 at 14:08

2 votes

1 answer

195 views

Tracking DRAM traffic in AMD Zen 2 (Rome)

I want to track the number of read/write accesses at each of the Unified Memory Controllers (UMCs) in my AMD EPYC processor (family: 0x17 and model: 0x31). The AMDuProfPcm tool, when used with the -m ...

smz

515

asked Oct 4, 2024 at 18:37

3 votes

1 answer

180 views

Analyzing Cache Behavior and Memory Traffic in Large File Reads Using perf stat

Consider the following code: int main(int argc, char** argv) { int buf_size = 1024*1024*1024; char* buffer = malloc(buf_size); char* buffer2 = malloc(buf_size); for (int i = 0; i < 10; i++){...

smz

515

asked Sep 25, 2024 at 5:19

1 vote

0 answers

13 views

How to distinguish files with the same path in a container using perf buildid-list

I run perf record -a -- sleep 3 on the host machine for sampling. There may be binaries with the same path but different buildids in different containers. For example, container A has /tmp/a.out, and ...

zcfh

131

asked Sep 11, 2024 at 12:52

1 vote

1 answer

241 views

How do I prevent perf from writing to stdout when used in a pipe?

I am running two programs where one's stdout is piped into the other's stdin. I did this using named pipes in the following way (simplified for presentation): $ mkfifo feedback $ ./program1 < ...

Sebastián Mestre

132

asked Sep 9, 2024 at 11:54

0 votes

1 answer

129 views

How do I understand why a spawned thread takes that long to wake up?

I am comparing different implementations of a threadpool. One approach, containing locks and condition variables, is substantially (~100 microseconds) slower than another using a lock-free ...

fabian

1,881

asked Aug 17, 2024 at 15:30
