0 votes
0 answers
30 views

CLion offers a perf integration for profiling applications. Getting this to work with a Docker toolchain is not described online. Is it possible, and how does one manage the different ...
Jack's user avatar
  • 61
2 votes
1 answer
75 views

I am trying to understand how Linux Perf manages hardware counters when profiling a multithreaded process. According to the documentation, the perf_event_open syscall can be used in two ways: To ...
LorienLV's user avatar
0 votes
0 answers
25 views

I'm trying to instrument clang with perf to measure how long a certain LLVM pass runs (since -ftime-report seems to give inconsistent metrics and can't be used on multi-file compilations). I have ...
gillo04's user avatar
  • 148
2 votes
0 answers
120 views

I wrote an eBPF program to count cache references and misses for a target process. The program seems to work, but the counts don't come close to matching the perf stat output. I am assuming there is some issue ...
ultimate cause's user avatar
1 vote
0 answers
32 views

I am using perf to profile workloads on my system, and I need to track the memory traffic generated by my workload on each NUMA node. Currently, I only have perf results for LLC cache misses, which ...
smz's user avatar
  • 515
0 votes
1 answer
113 views

My understanding is that PERF_COUNT_HW_REF_CPU_CYCLES should map to some counter that counts at a constant rate, as opposed to PERF_COUNT_HW_CPU_CYCLES which is affected by frequency scaling. I'd ...
Joseph Garvin's user avatar
2 votes
0 answers
113 views

I’m running a virtual machine on the host using QEMU/KVM, and I execute a specific workload inside the guest (e.g., TCP send). When monitoring the CPU usage on the host with mpstat, I observe that ...
choiyhking's user avatar
1 vote
0 answers
37 views

I'm learning how to print the call stack in a program without a frame pointer. Currently, I know that we can use DWARF's eh_frame section in ELF files to perform stack unwinding, and I've successfully ...
Nail Jay's user avatar
  • 285
0 votes
0 answers
47 views

Is it possible for a process to probe dtrace/perf/etc USDTs for its own process (or even better, process group or uid) without needing elevated privileges or being able to trace processes with other ...
Craig Ringer's user avatar
2 votes
0 answers
68 views

I am trying to read cache events on an AMD Zen2: L1d all read accesses, L1d all write accesses, L1d read misses (not shown below), and L1d write misses (not shown below). According to the perf_event_open(2) ...
onlycparra's user avatar
0 votes
1 answer
92 views

I have recently been profiling a workload with multiple KVM-based virtual machines (VMs). In short, I find that perf kvm record + perf kvm report shows different results from perf record + perf report. ...
huangjl's user avatar
  • 85
0 votes
1 answer
118 views

I am using perf to trace a performance issue in QEMU with TCG (attaching perf to the TCG thread). I found a lot of empty space in my flame graph. Why is there so much empty space (the red circle in the ...
Jing's user avatar
  • 13
1 vote
1 answer
149 views

I have a program that needs to emit some JIT symbols for perf. I would like it to only emit them when being profiled. As such, I'd like a way to detect that my application is being profiled under perf ...
Offtkp's user avatar
  • 470
-2 votes
1 answer
106 views

I would like to profile my C++ app inside a Docker container. perf needs the container to be launched with the --privileged flag, but CLion doesn't offer that option in its Docker plugin. Are there any ways to use CLion ...
dexnp1's user avatar
  • 1
0 votes
0 answers
179 views

On Ubuntu, using: sudo perf stat -e offcore_requests.demand_code_rd,offcore_requests.all_requests,l2_rqsts.all_demand_data_rd,mem_inst_retired.all_loads myApp I'd like perf to return statistics only ...
jkang's user avatar
  • 559
0 votes
0 answers
39 views

I want to use perf to find out how many instructions cause a large number of cache or memory accesses. Is there a good sampling method or usage pattern for this? My current sampling command is perf record -e ...
HhhHa's user avatar
  • 1
2 votes
0 answers
58 views

I want to find, in perf's call graph, which function of a certain library is called by the corresponding source-level function of the binary, and to extract this data for parsing, but the call ...
xiaoyu he's user avatar
4 votes
1 answer
160 views

I'm running a simple program where: A thread pinned to CPU 1 performs random reads from a pre-allocated and initialized 2GB memory region and no system calls are made during the memory access loop. ...
idle_cycles's user avatar
1 vote
0 answers
38 views

root@8ccbd6ec81f8:~/my-bcc-image/cpusnoop# perf stat -e instructions ./a.out Performance counter stats for './a.out': 48100736 instructions ...
BRATIN MONDAL's user avatar
0 votes
0 answers
166 views

I'm trying to measure the L3 cache miss rate using the following formula: I found that LLC misses can be obtained using this perf command from How to catch the L3-cache hits and misses by perf tool ...
Sherlock's user avatar
2 votes
1 answer
195 views

I want to track the number of read/write accesses at each of the Unified Memory Controllers (UMCs) in my AMD EPYC processor (family: 0x17 and model: 0x31). The AMDuProfPcm tool, when used with the -m ...
smz's user avatar
  • 515
3 votes
1 answer
180 views

Consider the following code: int main(int argc, char** argv) { int buf_size = 1024*1024*1024; char* buffer = malloc(buf_size); char* buffer2 = malloc(buf_size); for (int i = 0; i < 10; i++){...
smz's user avatar
  • 515
1 vote
0 answers
13 views

I run perf record -a -- sleep 3 on the host machine for sampling. There may be binaries with the same path but different buildids in different containers. For example, container A has /tmp/a.out, and ...
zcfh's user avatar
  • 131
1 vote
1 answer
241 views

I am running two programs where one's stdout is piped into the other's stdin. I did this using named pipes in the following way (simplified for presentation): $ mkfifo feedback $ ./program1 < ...
Sebastián Mestre's user avatar
0 votes
1 answer
129 views

I am comparing different implementations of a threadpool. One approach, containing locks and condition variables, is substantially (~100 microseconds) slower than another using a lock-free ...
fabian's user avatar
  • 1,881
