1 vote
1 answer
52 views

When I run a Docker container on my system (ARM macOS, with Docker Desktop), I see that /sys/devices/system/node is not present: docker run -it ubuntu:24.04 # ls /sys/devices/system clockevents ...
Daniel Porteous
1 vote
0 answers
75 views

I am testing a dual-socket EPYC 9564 CPU. The core-to-core latency within the second socket is very high, even greater than the inter-socket latency. As shown for AMD EPYC 7R13, 48 Cores, ...
wang fuqiang
1 vote
2 answers
102 views

I am using a machine with 2 Xeon CPUs having 16 cores each. There are 2 NUMA domains, one for each CPU. I have an intensive computation that also uses a lot of memory, and everything is multithreaded. ...
PierU
  • 2,737
1 vote
0 answers
104 views

The numactl man page says: --membind=nodes, -m nodes Only allocate memory from nodes. Allocation will fail when there is not enough memory available on these nodes. nodes may be specified as noted ...
smwikipedia
1 vote
0 answers
104 views

I'm running a benchmark in the Linux kernel. I want to make sure all the benchmarked kernel code is stored on the same NUMA node as the CPU that runs the code. I implemented a system call to trigger the ...
sk_buff
  • 101
1 vote
0 answers
101 views

How do I build a NUMA architecture in gem5? I know gem5 supports simulating NUMA architectures, but I did not find the relevant information in the official library, and I didn't find the configuration information ...
hhh1
  • 11
0 votes
0 answers
38 views

Can I create, for example, two buffers on adjacent memory chips, or on chips that are physically closer to the CPU? Or is it implied that RAM physical addresses are always adjacent in space? You could ...
Lem
  • 173
1 vote
0 answers
278 views

I want to calculate the theoretical UPI bandwidth of a dual-socket machine running Linux in order to estimate the maximum remote memory access bandwidth. Theoretically, UPI bandwidth = UPI speed (...
Frontier_Setter
1 vote
1 answer
269 views

I bind memory so that my program runs on node 1. I inserted some print statements in the program to check the currently bound node. I found a function in numa.h: struct bitmask *numa_get_membind But I don't know how ...
김시은
0 votes
1 answer
512 views

I'm trying to do NUMA-aware memory allocation with hwloc and am getting somewhat strange behavior. My goal is to allocate blocks of memory on different NUMA nodes, as I need this for a project. To verify that ...
Daniel
  • 1
1 vote
1 answer
12k views

I'm using TensorFlow in my project, and every time I run my code, I get the following error message: 2023-02-23 13:17:55.003041: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:967]...
nazim elhadi
0 votes
2 answers
98 views

Some Win32 API function documentation (for example this and this) contains the following note: Starting with TBD Release Iron, the behavior of this and other NUMA functions has been modified to ...
Maris B.
  • 2,497
0 votes
1 answer
99 views

I've read about how NUMA works and that memory is pulled in from RAM through L2 and L1 caches. And that there are only two ways to share data: read access from n (n>=0) threads read-write access ...
office-account
1 vote
1 answer
1k views

I'm running an application with multiple threads and it seems Linux is distributing threads among NUMA nodes almost equally. Say my application spawns 4 threads and my machine has 4 sockets. I observe ...
Mohammad Siavashi
1 vote
2 answers
584 views

I have allocated an array in C as follows: void *mem = mmap(NULL, 8192, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0); Imagine this array is initialized and now I need to ...
Mohammad Siavashi
0 votes
1 answer
1k views

I'm writing a NUMA-aware algorithm and need this information for optimal memory placement. It would be nice if you know a solution for the JVM (for example using oshi), but I can't find one even for C/C++
Dave11ar
  • 420
1 vote
2 answers
230 views

I came across this speedup behavior and am finding it hard to explain. The background is as follows: the program invokes the Gaussian elimination method to solve a linear equation within a loop to ...
Sriram G
3 votes
1 answer
1k views

This question is a spin-off of the one posted here: Measuring bandwidth on a ccNUMA system I've written a micro-benchmark for the memory bandwidth on a ccNUMA system with 2x Intel(R) Xeon(R) Platinum ...
Nitin Malapally
0 votes
1 answer
400 views

I'm running a simple kernel which adds two streams of double-precision complex-values. I've parallelized it using OpenMP with custom scheduling: the slice_indices container contains different indices ...
Nitin Malapally
1 vote
0 answers
100 views

On a Linux server with 4 NUMA nodes (128G each), I was trying to allocate 300G of memory with kmalloc_node(2) to specify the starting node for the allocation. Could anyone tell me what is the order of allocation ...
L.H
  • 11
0 votes
1 answer
870 views

Does anyone know the exact meaning of "node size" in the "numactl --hardware" output? I'm asking because I expected this memory value to be fixed, but it changes slightly on some of ...
Farouk Khawaja
2 votes
1 answer
277 views

I have this simple, self-contained example of a very rudimentary two-dimensional stencil application using OpenMP tasks on dynamic arrays to represent an issue that I am having on a problem that is less ...
user151387
1 vote
0 answers
570 views

I'm trying to PInvoke UpdateProcThreadAttribute() with PROC_THREAD_ATTRIBUTE_PREFERRED_NODE attribute, so that I could launch a process on a specific NUMA node. I'm working on Windows Server 2019. I ...
Dan Sagher
1 vote
0 answers
500 views

I have some multithreaded code where the threads spend a significant amount of time in the page fault handler of the kernel (Linux 5.4). This only happens on a two-socket NUMA machine, but not on ...
benjamin-lieser
0 votes
0 answers
232 views

Issue: I ran into a problem with the 'mlock()' API. The first load is fast when locking memory from "/sys/devices/system/node/node0", but it is too slow on node1, taking more than 3s to ...
Charles
