Newest 'cpu' Questions

1 vote

2 replies

127 views

How the Computer Handles Interrupts

What is the difference between an interrupt and a context switch? I understand the concept of an interrupt and how it occurs. However, I'm digging deeper into the topic. I studied Computer ...

Gabriele

11

asked Nov 8 at 19:25

0 votes

1 answer

62 views

Cache Allocation Technology in 13th Generation Core i9 13900E Intel CPU [closed]

I am trying to evaluate the impact of Cache Allocation Technology on my CPU. However, when I use either lscpu or cpuid -l 0x10 to check whether my CPU supports it, the output is false. How is this possible? How ...

Ali Hosseini

1

asked Oct 10 at 12:38

1 vote

1 answer

104 views

Is CPU multithreading affected by divergence?

Building on this question here: the term thread divergence is used in CUDA. From my understanding, it's a situation where different threads are assigned to do different tasks and this results in a big ...

bigcodeszzer

960

asked Sep 18 at 1:37

7 votes

1 answer

222 views

Why are all IMUL µOPs dispatched to Port 1 only (on Haswell), even when multiple IMULs are executed in parallel?

I'm experimenting with the IMUL r64, r64 instruction on an Intel Xeon E5-1620 v3 (Haswell architecture, 3.5 GHz base clock, turbo boost up to 3.6 GHz, Hyper-Threading enabled). My test loop is ...

Andrey Dmitriev

179

asked Sep 12 at 9:26

0 votes

1 answer

51 views

Fargate Cloudwatch CPU Utilisation differs from docker stats

Looking at the CPUUtilized CloudWatch metric for my Fargate service, it shows a maximum of 1040 CPU units used over the past 4 weeks, with a sampling period of 1 minute. I have 4 vCPUs provisioned to ...

Seanf123

1

asked Sep 7 at 17:41

2 votes

0 answers

207 views

Why does floating point division take less than 50% of the latency of integer division and also 10x more latency than usual when underflow occurs?

I am measuring the latency of instructions. For 64-bit primitives, integer division usually takes about 25 cycles each on my 2.3GHz Digital Ocean vCPU, while floating point division takes about 10 ...

Zack Light

362

asked Aug 22 at 5:35

-3 votes

1 answer

108 views

Understanding when a hazard in MIPS occurs

I have a question regarding these two instructions: lw r2, 10(r1) lw r1, 10(r2) Is there a hazard here, and do I need stalls between them? I want to know whether any kind of hazard occurs here. I ...

mer mer

17

asked Jun 28 at 15:34

1 vote

0 answers

84 views

popcnt instruction not as fast as a loop on Core Ultra 155H [duplicate]

I think the title says it all: I have implemented two popcnt functions, one that counts bits in a loop with shifts and one that uses inline asm with the actual CPU instruction. This is my C code: #define ...

newbee.a

10

asked Jun 17 at 10:25

1 vote

0 answers

77 views

How to analyze the microarchitecture resource requirements based on the trace generated by program execution?

I'm doing an in-depth CPU microarchitectural resource analysis. I want to know the requirements of my program on processor microarchitectural resources and compare the requirements of different ...

Gerrie

455

asked May 19 at 12:26

0 votes

0 answers

50 views

XGBoost GPU version not outperforming CPU on small dataset despite parameter tuning – suggestions needed

I'm currently working on a parallel and distributed computing project where I'm comparing the performance of XGBoost running on CPU vs GPU. The goal is to demonstrate how GPU acceleration can improve ...

Mxneeb

19

asked May 2 at 16:17

0 votes

1 answer

166 views

Linux UIO IRQ related periodic CPU usage

I have an Intel Arria 10 SoC FPGA system with 5.4.104-lts Linux built with Yocto 3.3.1 and Poky. The installed FPGA image does nothing more than generate interrupts to a UIO device 50 times a second. ...

yepp

1

asked Apr 17 at 8:29

2 votes

1 answer

105 views

Why does VPERM2I128/_mm256_permute2x128_si256 (and also FP variants) not exist in AVX512 instruction set?

It could operate identically on both 256-bit halves of a 512-bit AVX-512 register, just as AVX/AVX2 instructions operate identically on the 128-bit lanes of 256-bit registers. Is there a technical reason?

Akon

481

asked Apr 13 at 5:03

0 votes

1 answer

98 views

Execution stages in a superscalar microarchitecture

In this article https://www.lighterra.com/papers/modernmicroprocessors it is stated that (under Multiple issue - Superscalar) the fetch and decode/dispatch stages must be enhanced so they can decode ...

Rishi

41

asked Mar 27 at 9:33

-4 votes

1 answer

142 views

How SIMD vs SIMT handle divergence [closed]

What exactly happens at the hardware level when a divergence occurs in SIMD and SIMT architectures, and how does each handle the execution of different instruction paths? I found this question, but ...

Rishi

41

asked Mar 24 at 4:29

1 vote

2 answers

119 views

Why does each DRAM chip have to contribute 8 bits in parallel to the 64-bit bus width, instead of a single chip contributing all 64 bits?

Okay, my question is probably dumb, but I can't find any answers that correct me. I learned that in DDR4 (let's say the stick has 8 chips), each chip contributes 8 bits in parallel to the 64-bit bus width. ...

Rishi

41

asked Mar 21 at 4:18

0 votes

2 answers

249 views

How to wait until the CPU usage drops below 60% in VBA?

The following code is used for measuring CPU % usage. Public Sub Macro1() Dim strComputer As String Dim objWMIService As Object Dim colItems As Object Dim objItem As Object strComputer = ".&...

Kram Kramer

121

asked Mar 10 at 7:44

0 votes

1 answer

172 views

Get-Counter not working on certain servers to get average CPU Percent Utilization

This is my code: (Get-Counter '\Processor(_Total)\% Processor Time').CounterSamples.CookedValue I am trying to get the average CPU utilization with Get-Counter, but every time I try, I get this ...

mimi m

71

asked Feb 28 at 19:39

0 votes

1 answer

101 views

Running a test on a Rocket core CPU: global variable initialized to 0 fails and outputs a wrong value instead

While benchmarking my Rocket core CPU, I encountered a failed CoreMark run. After some debugging, I narrowed the issue down to an unsuccessful zero-initialization of a global variable. In Coremark, it will ...

Jasminy

119

asked Feb 21 at 10:06

1 vote

1 answer

79 views

Cache Effects in Statically Compiled Binaries: Unexpected Cache Misses

I have a simple Hello World program written in C, which I statically compiled using: gcc -static -fno-pie -o hello{1|2} hello.c. I expected that executing these two binaries would exhibit cache ...

Khrn

354

asked Feb 5 at 7:43

0 votes

0 answers

227 views

Created TensorFlow Lite XNNPACK delegate for CPU - ('--log-level=1') doesn't work

A simple Python script (Selenium + ChromeDriver): # import the By class, which allows you to choose how to search for an element from selenium.webdriver.common.by import By # initialize the browser ...

Sergey Saz

1

asked Jan 29 at 14:02

0 votes

0 answers

107 views

SDL CPU rendering project, rendering error when resizing window: Window surface is invalid

I was working on a CPU-only rendering project with SDL in C. I implemented thorough error handling, and I got this error when I tried to resize the window: "ERROR: SDL Error in render thread: ...

Tejas Patil

11

asked Jan 20 at 12:25

-1 votes

1 answer

89 views

Pod restart issue in Java-based microservice architecture

There were 2 pods running in my microservice, and both of them were restarted with the Kubernetes reason OOMKilled (The above dashboard uses the following query->sum(0,...

Yash Arora

1

asked Jan 18 at 16:21

0 votes

1 answer

115 views

Why is my AI training a lot slower on GPU than on CPU?

I'm currently training my simple prediction AI, but my GPU takes 40 s per epoch while my CPU takes 9 s per epoch. My CPU is an i7-4720HQ and my GPU is an Nvidia 950M. This is my code: `import ...

Vio Octavio

1

asked Jan 16 at 15:11

0 votes

2 answers

91 views

platform-tools\adb.exe - High CPU usage on server (Windows)

Using ADB in a Java application to monitor Android device status every three seconds. Eight adb commands are used: adb shell settings get global airplane_mode_on adb shell settings get system ...

rejdrouin

101

asked Jan 1 at 21:41

2 votes

1 answer

127 views

Is there a way to get node level information in kubernetes pods?

I need low-level information about the node, such as the number of cores, core IDs, and other details that are part of the kubelet, from within a pod running on that node. How do I get this?

imawful

135

asked Jan 1 at 15:19
