4,716 questions
Advice
1
vote
2
replies
127
views
How the Computer Handles Interrupts
What is the difference between an interrupt and a context switch?
I understand the concept of an interrupt and how it occurs. However, I'm digging deeper into the topic.
I studied Computer ...
0
votes
1
answer
62
views
Cache Allocation Technology in 13th Generation Core i9 13900E Intel CPU [closed]
I am trying to evaluate the impact of Cache Allocation Technology on my CPU. However, when I check whether my CPU supports it, using either lscpu or cpuid -l 0x10, the output is false.
How is this possible?
How ...
1
vote
1
answer
104
views
Is CPU multithreading affected by divergence?
Building on this question here
The term thread divergence is used in CUDA; from my understanding it's a situation where different threads are assigned to do different tasks and this results in a big ...
7
votes
1
answer
222
views
Why are all IMUL µOPs dispatched to Port 1 only (on Haswell), even when multiple IMULs are executed in parallel?
I'm experimenting with the IMUL r64, r64 instruction on an Intel Xeon E5-1620 v3 (Haswell architecture, base clock 3.5 GHz, turbo boost up to 3.6 GHz, Hyper Threading is enabled).
My test loop is ...
0
votes
1
answer
51
views
Fargate CloudWatch CPU Utilisation differs from docker stats
Looking at the CPUUtilized CloudWatch metric for my Fargate service, it's showing max CPU units used as 1040 over the past 4 weeks, using a sampling period of 1 minute. I have 4 vCPUs provisioned to ...
2
votes
0
answers
207
views
Why does floating point division take less than 50% of the latency of integer division and also 10x more latency than usual when underflow occurs?
I am measuring the latency of instructions.
For 64-bit primitives, integer division usually takes about 25 cycles on my 2.3 GHz DigitalOcean vCPU, while floating point division takes about 10 ...
-3
votes
1
answer
108
views
Understanding when a hazard in MIPS occurs
I have a question regarding these two instructions:
lw r2, 10(r1)
lw r1, 10(r2)
Is there a hazard here? Do I need stalls between the two instructions?
I want to know whether any kind of hazard occurs here. I ...
1
vote
0
answers
84
views
popcnt instruction not as fast as loop on Core Ultra 155H [duplicate]
I think the title says it all: I have implemented two popcnt functions, one that counts bits in a loop with shifts and one that uses inline asm with the actual CPU instruction.
This is my C code:
#define ...
1
vote
0
answers
77
views
How to analyze the microarchitecture resource requirements based on the trace generated by program execution?
I'm doing an in-depth CPU microarchitectural resource analysis. I want to know the requirements of my program on processor microarchitectural resources and compare the requirements of different ...
0
votes
0
answers
50
views
XGBoost GPU version not outperforming CPU on small dataset despite parameter tuning – suggestions needed
I'm currently working on a parallel and distributed computing project where I'm comparing the performance of XGBoost running on CPU vs GPU. The goal is to demonstrate how GPU acceleration can improve ...
0
votes
1
answer
166
views
Linux UIO IRQ related periodic CPU usage
I have an Intel Arria 10 SoC FPGA system with 5.4.104-lts Linux built with Yocto 3.3.1 and Poky.
The installed FPGA image does nothing more than generate interrupts to a UIO device 50 times per second.
...
2
votes
1
answer
105
views
Why does VPERM2I128/_mm256_permute2x128_si256 (and also FP variants) not exist in AVX512 instruction set?
It could operate identically on both 256-bit halves of a 512-bit AVX512 register, just like the identical operation on the 128-bit lanes of 256-bit registers in AVX/AVX2. Are there any technical reasons?
0
votes
1
answer
98
views
Execution stages in a superscalar microarchitecture
In this article https://www.lighterra.com/papers/modernmicroprocessors it is stated that (under Multiple issue - Superscalar)
the fetch and decode/dispatch stages must be enhanced so they can decode ...
-4
votes
1
answer
142
views
How SIMD vs SIMT handle divergence [closed]
What exactly happens at the hardware level when a divergence occurs in SIMD and SIMT architectures, and how does each handle the execution of different instruction paths?
I found this question, but ...
1
vote
2
answers
119
views
Why does each DRAM chip have to contribute 8 bits to the 64-bit bus width in parallel, instead of a single chip contributing all 64 bits?
Okay, my question is probably dumb, but I can't find any answers that correct me.
I learned that in DDR4 (let's say the stick has 8 chips) each chip contributes 8 bits in parallel to the 64-bit bus width.
...
0
votes
2
answers
249
views
How to wait until the CPU usage drops below 60% in VBA?
The following code is used to measure CPU usage (%).
Public Sub Macro1()
Dim strComputer As String
Dim objWMIService As Object
Dim colItems As Object
Dim objItem As Object
strComputer = ".&...
0
votes
1
answer
172
views
Get-Counter not working on certain servers to get average CPU Percent Utilization
This is my code:
(Get-Counter '\Processor(_Total)\% Processor Time').CounterSamples.CookedValue
I am trying to get the average CPU utilization with Get-Counter, but every time I try, I get this ...
0
votes
1
answer
101
views
Running test on Rocket core CPU - initializing a global variable to 0 fails, outputs wrong value instead
While benchmarking my Rocket core CPU, I ran into a failing CoreMark run. After some debugging, I narrowed the issue down to the unsuccessful initialization of a global variable to 0. In CoreMark, it will ...
1
vote
1
answer
79
views
Cache Effects in Statically Compiled Binaries: Unexpected Cache Misses
I have a simple Hello World program written in C, which I statically compiled using: gcc -static -fno-pie -o hello{1|2} hello.c.
I expected that executing these two binaries would exhibit cache ...
0
votes
0
answers
227
views
Created TensorFlow Lite XNNPACK delegate for CPU - ('--log-level=1') doesn't work
A simple Python script (Selenium + ChromeDriver):
# import the By class, which allows you to choose how to search for an element
from selenium.webdriver.common.by import By
# initialize the browser ...
0
votes
0
answers
107
views
SDL CPU rendering project, rendering error when resizing window: Window surface is invalid
I was working on a CPU-only rendering project with SDL in C.
I implemented thorough error handling, and I get this error when I try to resize the window: "ERROR: SDL Error in render thread: ...
-1
votes
1
answer
89
views
Pod restart issue in Java-based microservice architecture
There were 2 pods running in my microservice; both of them were restarted, with Kubernetes giving the reason as OOMKilled.
[Image: dashboard screenshot]
(The above dashboard uses the following query->sum(0,...
0
votes
1
answer
115
views
Why is my AI training on GPU a lot slower than on CPU?
I'm currently training my simple prediction AI, but my GPU takes 40 s per epoch while my CPU takes 9 s per epoch.
My CPU is an i7-4720HQ and my GPU is an Nvidia 950M.
This is my code:
`import ...
0
votes
2
answers
91
views
platform-tools\adb.exe - High CPU usage on server (Windows)
I am using ADB in a Java application to monitor Android device status every three seconds. Eight adb commands are used:
adb shell settings get global airplane_mode_on
adb shell settings get system ...
2
votes
1
answer
127
views
Is there a way to get node-level information in Kubernetes pods?
I need low-level information about the node, such as the number of cores, core IDs, and other details that the kubelet has, from within a pod running on that node. How do I get this?