3 votes
0 answers
150 views

I was benchmarking a naive transposition and noticed a very large performance discrepancy between: a naive operation where we read data contiguously and write with a large stride; the ...
Etienne M's user avatar
  • 715
1 vote
0 answers
32 views

I am using perf to profile workloads on my system, and I need to track the memory traffic generated by my workload on each NUMA node. Currently, I only have perf results for LLC cache misses, which ...
smz's user avatar
  • 515
1 vote
1 answer
114 views

I wanted to see if I am correctly interpreting the attached diagram. It shows the AMD Zen 3's cache lines. OC Fetch is Opcode Cache, IC Fetch is Instruction Cache. I am just unable to make sense of ...
Kush Jenamani's user avatar
2 votes
0 answers
68 views

I am trying to read cache events on an AMD Zen2: L1d all read accesses, L1d all write accesses, L1d read misses (not shown below), L1d write misses (not shown below). According to the perf_event_open(2) ...
onlycparra's user avatar
0 votes
0 answers
39 views

The AMD SDM implies that CLGI can block vINTR. Table 15-10, "Effect of the GIF on Interrupt Handling". 15.21.4, "Injecting Virtual (INTR) Interrupts": The processor takes a virtual INTR interrupt if: V_IRQ and ...
wang fuqiang's user avatar
2 votes
0 answers
69 views

According to AMD's material, access to contiguous physical addresses will be interleaved across all memory channels (if set to NPS1). When a machine has 8 memory channels and the size of memory ...
Frontier_Setter's user avatar
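As a rough illustration of the interleaving described in the excerpt above, here is a minimal sketch of a round-robin address-to-channel mapping. It is a simplified model, not AMD's actual scheme: real hardware may hash address bits, and the function name `channel_for_address`, the 256-byte granule, and the default of 8 channels are all assumptions chosen for the example.

```python
def channel_for_address(phys_addr, num_channels=8, granule_bytes=256):
    """Return the channel index for a physical address in a toy model.

    Assumption: consecutive granules rotate through the channels in
    plain round-robin order. The granule size is hypothetical.
    """
    if phys_addr < 0:
        raise ValueError("physical address must be non-negative")
    # Which granule the address falls in, then wrap around the channels.
    return (phys_addr // granule_bytes) % num_channels


if __name__ == "__main__":
    # Walk a contiguous range and show which channel each granule lands on.
    for addr in range(0, 10 * 256, 256):
        print(hex(addr), "-> channel", channel_for_address(addr))
```

Under this model, a contiguous buffer larger than `num_channels * granule_bytes` touches every channel, which is the property the excerpt attributes to NPS1.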
2 votes
1 answer
195 views

I want to track the number of read/write accesses at each of the Unified Memory Controllers (UMCs) in my AMD EPYC processor (family: 0x17 and model: 0x31). The AMDuProfPcm tool, when used with the -m ...
smz's user avatar
  • 515
0 votes
1 answer
48 views

I have bought the Kria KD240 Starter Kit to get used to working with drives applications and FOC control. I am following the steps mentioned here but I can't open the Vivado project correctly. When I ...
alagal's user avatar
  • 1
2 votes
1 answer
108 views

So I've been exploring chapter 12 of the picoCTF primer and suddenly saw a difference between my assembly of the program and picoCTF's at the end of the main function, where the stack canary is being ...
digitale's user avatar
2 votes
1 answer
169 views

struct StackFrame { DWORD64 address; std::string name; std::string module; std::string filename; int line_number; }; std::vector<StackFrame> GetStackTrace(CONTEXT context) { ...
Hari E's user avatar
  • 490
0 votes
0 answers
145 views

I'm writing a path tracer using HIPRT on Windows but I couldn't find anything to debug my application yet. I'd like to be able to execute my kernels line by line, watch kernel variables, print to ...
Tom Clabault's user avatar
1 vote
0 answers
284 views

I have two PCs, one with an Intel i7 13700KF and 64GB RAM and another with an AMD 3970X and the same RAM; both PCs use an SSD as storage and both have Python 3.11 and polars 0.20.5. I run the code below: df = pl....
Hakase's user avatar
  • 331
0 votes
0 answers
142 views

I want to learn how "cache as RAM" works, so I found some asm files in "/src/cpu/intel/car/" from coreboot. But there are four folders containing "cache_as_ram.S". What'...
50han Bill's user avatar
-1 votes
1 answer
327 views

I am using AMD's EPYC 7713 CPU. According to the specification, its maximum frequency is 3.675GHz. But when I run stress-ng (running only single-threaded CPU loads), its frequency does not exceed 3....
Frontier_Setter's user avatar
0 votes
1 answer
351 views

I am getting this error: Illegal instruction (core dumped) when calling: cv::findHomography(query_points, reference_points, cv::RANSAC, homography_ransac_threshold_, h_mask); This happens only on an AWS ...
Humam Helfawi's user avatar
2 votes
0 answers
451 views

In AMD's optimization manual, the L1 Data cache is described as follows: The L1 DC provides multiple access ports using a banked structure. The read ports are shared by three load pipes and victim ...
Frontier_Setter's user avatar
4 votes
1 answer
591 views

I have encountered the same problem as this. What does L2 poison mean? I'm using an AMD CPU.
Frontier_Setter's user avatar
0 votes
0 answers
104 views

Currently using tensorflow-directml as I am training a model on AMD (RX 580). The problem is, upon model.fit() it seems to be stuck at epoch 1 with no progress. Here's my code and error: with ...
user21525821's user avatar
1 vote
0 answers
113 views

For monitoring memory bandwidth, there is pcm-memory on the Intel platform and AMDuProf on the AMD platform. How do they calculate memory bandwidth usage? Which PMUs were used? Is it using 1024 or ...
Frontier_Setter's user avatar
4 votes
0 answers
91 views

I have a loop that's running slower than I expected. I measure how long it takes per collection it processes and notice it takes twice as long when I use 8 cores (overall 4x faster). There's no data ...
David's user avatar
  • 41
0 votes
1 answer
61 views

I ran into the following problem: when I initialize the kernel hypervisor (SVM in my case), I exit from vmrun and get into my SvmExitHandler (the dispatcher that manages exit codes), then ...
Barbosso's user avatar
2 votes
0 answers
418 views

I am trying to get familiar with AMD's SMM interface. I want to implement a simple task: check SMI_COUNT, trigger an SMI, check SMI_COUNT after the trigger. The SMI interrupt is a rare thing (I believe), so ...
Rockrid3r's user avatar
  • 321
-1 votes
1 answer
285 views

I am working at a software company, mainly developing on Linux. For Windows development we have a couple of shared machines. However, a new project came up, and we need more resources on ...
wizard's user avatar
  • 155
2 votes
0 answers
76 views

On Intel, the fixed-function performance counters can be read by setting bit 30 of ecx and placing the index of the counter to read (0-4) in the bottom bits of that same register. Is something similar ...
BeeOnRope's user avatar
  • 66.3k
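The ECX encoding described in the excerpt above can be sketched as a small helper that only builds the selector value; it does not execute `rdpmc` itself. The names `FIXED_COUNTER_FLAG` and `rdpmc_fixed_selector` are made up for illustration, and the index range 0-4 is taken from the question text rather than from any particular CPU model.

```python
# Bit 30 of ECX selects the fixed-function counter space on Intel,
# per the question text.
FIXED_COUNTER_FLAG = 1 << 30


def rdpmc_fixed_selector(index, num_fixed=5):
    """Build the ECX value for reading fixed-function counter `index`.

    Assumption: `num_fixed` defaults to 5 to match the 0-4 range quoted
    in the question; the number of counters actually present varies.
    """
    if not 0 <= index < num_fixed:
        raise ValueError("fixed counter index out of range")
    # Flag in bit 30, counter index in the low bits of the same register.
    return FIXED_COUNTER_FLAG | index


if __name__ == "__main__":
    for i in range(3):
        print(i, hex(rdpmc_fixed_selector(i)))
```

The resulting value is what would be loaded into ECX before the instruction; whether an equivalent selector exists on AMD is exactly what the question leaves open.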
2 votes
0 answers
221 views

Memory channel interleaving is a way of laying out a physical address range, enabled in the BIOS, so that all memory channels are used alternately to achieve the best bandwidth and latency. I want ...
Frontier_Setter's user avatar
