541 questions
3
votes
0
answers
150
views
The cost of non-contiguous reads and writes (naive matrix transpose, power-of-2 and other sizes)
I was benchmarking a naive transposition and noticed a very large performance discrepancy between:
a naive operation where we read data contiguously and write with a large stride;
the ...
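For illustration only, the first access pattern in the list above can be sketched as follows. This is a minimal sketch, not the asker's benchmark: it assumes a square n x n matrix stored row-major in a flat list, and the function name `transpose_contiguous_read` is hypothetical. The second case is cut off in the excerpt, so it is not reproduced here.

```python
def transpose_contiguous_read(src, n):
    """Naive transpose of an n x n row-major matrix held in a flat list.

    Reads walk src with stride 1 (contiguous); writes land in dst
    with stride n (non-contiguous).
    """
    dst = [0] * (n * n)
    for i in range(n):          # row of src
        for j in range(n):      # column of src
            # read index i*n + j advances by 1 in the inner loop;
            # write index j*n + i advances by n in the inner loop
            dst[j * n + i] = src[i * n + j]
    return dst


if __name__ == "__main__":
    print(transpose_contiguous_read(list(range(9)), 3))
```

Pure Python will not reproduce the cache effects the question is about; the sketch only makes the two index patterns explicit.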
1
vote
0
answers
32
views
Tracking Per Channel Memory Traffic in AMD Zen 2 (Rome)
I am using perf to profile workloads on my system, and I need to track the memory traffic generated by my workload on each NUMA node. Currently, I only have perf results for LLC cache misses, which ...
1
vote
1
answer
114
views
Cache line sizes for AMD Zen 3 Architecture
I wanted to see if I am correctly interpreting the attached diagram.
It shows the AMD Zen 3's cache lines.
OC Fetch is Opcode Cache,
IC Fetch is Instruction Cache.
I am just unable to make sense of ...
2
votes
0
answers
68
views
Why does perf complain that it cannot open this L1 cache event on Zen 2?
I am trying to read cache events on an AMD Zen 2:
L1d all read accesses
L1d all write accesses
L1d read misses (not shown below)
L1d write misses (not shown below)
According to the perf_event_open(2) ...
0
votes
0
answers
39
views
Can CLGI block virtual interrupts or not?
The AMD SDM implies that CLGI can block vINTR:
Table 15-10 Effect of the GIF on Interrupt Handling
15.21.4 Injecting Virtual (INTR) Interrupts
The processor takes a virtual INTR interrupt if:
V_IRQ and ...
2
votes
0
answers
69
views
How to verify the granularity of memory access interleaving across different channels?
According to AMD's material, access to contiguous physical addresses will be interleaved across all memory channels (if set to NPS1). When a machine has 8 memory channels and the size of memory ...
2
votes
1
answer
195
views
Tracking DRAM traffic in AMD Zen 2 (Rome)
I want to track the number of read/write accesses at each of the Unified Memory Controllers (UMCs) in my AMD EPYC processor (family: 0x17 and model: 0x31). The AMDuProfPcm tool, when used with the -m ...
0
votes
1
answer
48
views
Problems opening FOC motor control app in Vivado 2023.2
I have bought the Kria KD240 Starter Kit to get used to working with drive applications and FOC control. I am following the steps mentioned here but I can't open the Vivado project correctly. When I ...
2
votes
1
answer
108
views
Why does an AMD processor use a sub instruction instead of xor to verify the stack canary?
I've been exploring chapter 12 of the picoCTF primer and noticed a difference between my assembly of the program and picoCTF's at the end of the main function, where the stack canary is being ...
2
votes
1
answer
169
views
SymFromAddr fails on AMD Machine with the error message "Attempt to access Invalid address"
struct StackFrame
{
    DWORD64 address;
    std::string name;
    std::string module;
    std::string filename;
    int line_number;
};
std::vector<StackFrame> GetStackTrace(CONTEXT context)
{
    ...
0
votes
0
answers
145
views
How to debug a HIP/HIPRT application on Windows?
I'm writing a path tracer using HIPRT on Windows but I couldn't find anything to debug my application yet. I'd like to be able to execute my kernels line by line, watch kernel variables, print to ...
1
vote
0
answers
284
views
Why is polars faster on an Intel CPU than on an AMD CPU?
I have two PCs: one has an Intel i7 13700KF with 64GB RAM and the other has an AMD 3970X with the same RAM. Both PCs use SSD storage, and both have Python 3.11 and polars 0.20.5. I run the code below:
df = pl....
0
votes
0
answers
142
views
What's the difference between the "cache_as_ram.S" files in coreboot?
I want to learn how "cache as RAM" works, so I found some asm files in "/src/cpu/intel/car/" from coreboot. But there are four folders containing "cache_as_ram.S". What'...
-1
votes
1
answer
327
views
Why is the frequency of the CPU lower than the Max. Boost Clock?
I am using AMD's EPYC 7713 CPU. According to the specification, its maximum frequency is 3.675GHz. But when I run stress-ng (running only single-threaded CPU loads), its frequency does not exceed 3....
0
votes
1
answer
351
views
Illegal instruction (core dumped) in cv::findHomography
I am getting this error:
Illegal instruction (core dumped)
When calling:
cv::findHomography(query_points, reference_points, cv::RANSAC, homography_ransac_threshold_, h_mask);
This happens only on an AWS ...
2
votes
0
answers
451
views
What does cache bank mean in an AMD CPU?
In AMD's optimization manual, the L1 Data cache is described as follows:
The L1 DC provides multiple access ports using a banked structure. The read ports are shared by three load pipes and victim ...
4
votes
1
answer
591
views
What does L2 poison mean in a CPU?
I have encountered the same problem as this.
What does L2 poison mean?
I'm using an AMD CPU.
0
votes
0
answers
104
views
model.fit() stopping halfway on 1 epoch using tensorflow-directml. What to do?
I am currently using tensorflow-directml to train a model on an AMD GPU (RX 580). The problem is that, upon model.fit(), it seems to be stuck at epoch 1 with no progress. Here's my code and error:
with ...
1
vote
0
answers
113
views
How do different monitoring tools calculate memory bandwidth?
For monitoring memory bandwidth, there is pcm-memory on the Intel platform and AMDuProf on the AMD platform.
How do they calculate memory bandwidth usage? Which PMUs were used?
Is it using 1024 or ...
4
votes
0
answers
91
views
Use perf to see if I'm write bound?
I have a loop that's running slower than I expected. I measure how long it takes per collection it processes and notice it takes twice as long when I use 8 cores (overall 4x faster). There's no data ...
0
votes
1
answer
61
views
How can I use kernel functions in SVM root(execute) mode?
I ran into the following problem: after I initialize the kernel hypervisor (SVM in my case), exit from vmrun, and get into my SvmExitHandler (the dispatcher that manages exit codes), then ...
2
votes
0
answers
418
views
Obtaining SMI_COUNT on an AMD CPU
I am trying to get familiar with AMD's SMM interface. I want to implement a simple task:
Check SMI_COUNT
Trigger SMI
Check SMI_COUNT after trigger
The SMI-interrupt is a rare thing (I believe), so ...
-1
votes
1
answer
285
views
Windows 10 nested virtualization on AMD CPU
I work at a software company, mainly developing on Linux. For Windows development we have a couple of machines that are shared. However, a new project came up, and we need more resources on ...
2
votes
0
answers
76
views
Can rdpmc be used to read the fixed-function counters on AMD?
On Intel, the fixed-function performance counters can be read by setting bit 30 of ecx as well as the index of the counter to read (0-4) in the bottom bits of that same register.
Is something similar ...
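As a rough illustration of the Intel encoding described above, the ecx value can be computed like this. It is only a sketch of the bit arithmetic: the names `FIXED_COUNTER_SELECT` and `rdpmc_ecx_for_fixed_counter` are hypothetical, nothing here executes rdpmc, and it says nothing about whether AMD supports an equivalent.

```python
# Bit 30 of ECX selects the fixed-function counter set on Intel,
# as described in the question above.
FIXED_COUNTER_SELECT = 1 << 30


def rdpmc_ecx_for_fixed_counter(index):
    """Return the ECX value that selects fixed-function counter `index`.

    The index occupies the low bits, so it must not reach bit 30.
    """
    if index < 0 or index >= FIXED_COUNTER_SELECT:
        raise ValueError("counter index must fit below bit 30")
    return FIXED_COUNTER_SELECT | index


if __name__ == "__main__":
    for i in range(3):
        print(i, hex(rdpmc_ecx_for_fixed_counter(i)))
```

Actually reading a counter would still require executing rdpmc from native code with this value in ecx, and the counter must be enabled and accessible from user mode.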
2
votes
0
answers
221
views
At what granularity does memory channel interleaving occur when enabled in BIOS?
Memory channel interleaving is a method of setting a physical address area which can be enabled in BIOS, so that all memory channels are alternately used to achieve best bandwidth and latency.
I want ...