314 questions
1 vote, 1 answer, 52 views
How to enable NUMA nodes in Docker container
When I run a Docker container on my system (ARM macOS with Docker Desktop), I see that /sys/devices/system/node is not present:
docker run -it ubuntu:24.04
# ls /sys/devices/system
clockevents ...
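A quick way to confirm what the container actually sees is to count the nodeN entries under /sys/devices/system/node. The C sketch below only diagnoses; it cannot enable NUMA. As far as I know, that directory is populated by the running kernel (with Docker Desktop, the kernel of its Linux VM, not the container image) and only when that kernel is built with NUMA support. The helper names are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <dirent.h>
#include <string.h>

/* Returns 1 if name looks like "node<digits>", e.g. "node0" or "node12". */
int is_node_entry(const char *name)
{
    if (strncmp(name, "node", 4) != 0 || name[4] == '\0')
        return 0;
    for (const char *p = name + 4; *p; ++p)
        if (!isdigit((unsigned char)*p))
            return 0;
    return 1;
}

/* Counts nodeN directories under dir. Returns -1 if dir cannot be opened,
   which is what happens when /sys/devices/system/node is absent. */
int count_numa_nodes(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;
    int count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL)
        if (is_node_entry(e->d_name))
            ++count;
    closedir(d);
    return count;
}
```

Calling count_numa_nodes("/sys/devices/system/node") inside the container returns -1 in the situation described above, and the number of nodes otherwise.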
1 vote, 0 answers, 75 views
Why is the core-to-core latency of EPYC 4 so poor in NUMA2 mode?
I tested the EPYC 9564 CPU (dual socket): the core-to-core latency of the second socket is very high, even higher than the latency for inter-socket communication. As shown for AMD EPYC 7R13, 48 Cores, ...
1 vote, 2 answers, 102 views
Is it possible to somehow mix static and dynamic loop scheduling?
I am using a machine with 2 Xeon CPUs having 16 cores each. There are 2 NUMA domains, one for each CPU.
I have an intensive computation that also uses a lot of memory, and everything is multithreaded. ...
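One common way to mix the two is to split the iteration space inside a single parallel region: a statically scheduled loop with nowait over the first part, followed by a dynamically scheduled loop over the rest. The sketch below assumes iterations are independent; work(), static_fraction and chunk are hypothetical placeholders to tune. The static part only helps NUMA locality if the data was first touched with the same static schedule.

```c
#include <stddef.h>

/* Hypothetical per-iteration work with uneven cost; stands in for the
   real computation. */
static double work(size_t i)
{
    double x = 0.0;
    for (size_t k = 0; k <= i % 64; ++k)
        x += (double)k;
    return x;
}

/* Runs the first static_fraction of the iterations with a static schedule
   (fixed, contiguous block per thread), then hands out the remainder in
   dynamic chunks so threads that finish early pick up leftover work.
   Without -fopenmp the pragmas are ignored and the loops run serially. */
void hybrid_schedule(double *out, size_t n, double static_fraction, int chunk)
{
    if (static_fraction < 0.0)
        static_fraction = 0.0;
    if (static_fraction > 1.0)
        static_fraction = 1.0;
    size_t split = (size_t)((double)n * static_fraction);

    #pragma omp parallel
    {
        /* nowait: a thread done with its static block moves straight on
           to the dynamic loop instead of waiting at a barrier. */
        #pragma omp for schedule(static) nowait
        for (size_t i = 0; i < split; ++i)
            out[i] = work(i);

        #pragma omp for schedule(dynamic, chunk)
        for (size_t i = split; i < n; ++i)
            out[i] = work(i);
    }
}
```

For example, hybrid_schedule(out, n, 0.8, 16) statically assigns 80% of the iterations and balances the last 20% dynamically in chunks of 16.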
1 vote, 0 answers, 104 views
numactl: Is it possible to use CPU and memory from different NUMA nodes?
The numactl man page says:
--membind=nodes, -m nodes Only allocate memory from nodes. Allocation will fail when there is not enough memory available on these nodes.
nodes may be specified as noted ...
1 vote, 0 answers, 104 views
Is it possible to load Linux kernel code to a specific NUMA node when booting?
I'm doing a benchmark in the Linux kernel. I want to make sure all the benchmarked kernel code is stored in the same NUMA node as the CPU that runs the code. I implemented a system call to trigger the ...
1 vote, 0 answers, 101 views
How to simulate NUMA in gem5?
How do I build a NUMA architecture in gem5? I know gem5 supports simulating NUMA architectures, but I did not find the relevant information in the official library, and I didn't find the configuration information ...
0 votes, 0 answers, 38 views
Is there a NUMA-like mechanism for a DRAM?
Can I, for example, create two buffers on adjacent memory chips, i.e. on chips that are physically closer to the CPU? Or is it implied that RAM physical addresses are always adjacent in space? You could ...
1 vote, 0 answers, 278 views
How to calculate the theoretical max UPI bandwidth of a Linux dual-socket machine?
I want to calculate the theoretical UPI bandwidth of a dual-socket machine running Linux system in order to estimate the max remote memory access bandwidth.
Theoretically, UPI bandwidth = UPI speed (...
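Since the formula in the question is cut off, here is the arithmetic under one stated assumption: each UPI link carries 2 bytes of payload per transfer in each direction, which is how the commonly quoted 20.8 GB/s per direction for a 10.4 GT/s link is obtained. The link speed and the number of links between the sockets depend on the CPU model and should be checked. This is a theoretical peak; measured remote-memory bandwidth will be lower because of protocol overhead.

```c
/* Theoretical peak UPI bandwidth in GB/s.
   gt_per_s:           link speed in GT/s (e.g. 10.4)
   bytes_per_transfer: payload bytes per transfer per direction
                       (assumption: 2)
   links:              number of UPI links between the two sockets
   directions:         1 for one direction, 2 for both combined */
double upi_peak_gbs(double gt_per_s, double bytes_per_transfer,
                    int links, int directions)
{
    return gt_per_s * bytes_per_transfer * links * directions;
}
```

With two 10.4 GT/s links, upi_peak_gbs(10.4, 2.0, 2, 1) gives 41.6 GB/s per direction under that assumption.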
1 vote, 1 answer, 269 views
Return value of struct bitmask *numa_get_membind
I bind the program's memory to node 1.
I inserted some print statements in the program to check the currently bound node.
I found a function from numa.h:
struct bitmask *numa_get_membind
But I couldn't figure out how ...
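numa_get_membind() returns a node mask: bit n is set when memory may be allocated from node n, so a process bound to node 1 should see only bit 1 set. The sketch below decodes such a mask using a stand-in struct with the same two fields as libnuma's struct bitmask (size and maskp); node_mask and its helpers are hypothetical names chosen so the example compiles without <numa.h>.

```c
#include <limits.h>
#include <stddef.h>

/* Stand-in with the same fields as libnuma's struct bitmask:
   size = number of bits, maskp = array of unsigned long words. */
struct node_mask {
    unsigned long size;
    unsigned long *maskp;
};

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Is bit n (node n) set? Equivalent in spirit to numa_bitmask_isbitset. */
int node_mask_isset(const struct node_mask *m, unsigned long n)
{
    if (n >= m->size)
        return 0;
    return (int)((m->maskp[n / WORD_BITS] >> (n % WORD_BITS)) & 1UL);
}

/* Writes the indices of set bits (allowed nodes) into nodes[], up to
   max_out entries, and returns how many were written. */
size_t node_mask_list(const struct node_mask *m, unsigned long *nodes,
                      size_t max_out)
{
    size_t found = 0;
    for (unsigned long n = 0; n < m->size && found < max_out; ++n)
        if (node_mask_isset(m, n))
            nodes[found++] = n;
    return found;
}
```

With libnuma itself, the equivalent check is numa_bitmask_isbitset(mask, n) for n from 0 to numa_max_node().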
0 votes, 1 answer, 512 views
NUMA memory allocation with hwloc
I'm trying to do NUMA aware memory allocation with hwloc and get somewhat strange behavior.
My goal is to allocate blocks of memory on different NUMA nodes, as I need this for a project. To verify that ...
1 vote, 1 answer, 12k views
Error message in TensorFlow: "could not open file to read NUMA node" and missing directory in /sys/bus/pci/devices
I'm using TensorFlow in my project, and every time I run my code, I get the following error message:
2023-02-23 13:17:55.003041: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:967]...
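For context: every PCI device directory in sysfs has a numa_node attribute, and a value of -1 means the device reports no NUMA affinity. The I prefix on the TensorFlow line marks it as an informational log entry. The sketch below reads that attribute for a given PCI address; the negative return codes are conventions of this sketch only.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parses the text of a sysfs numa_node attribute, e.g. "0\n" or "-1\n".
   Returns the node number, -1 if the device reports no NUMA affinity,
   or -2 if the text is not a number. */
int parse_numa_node(const char *text)
{
    char *end;
    long v = strtol(text, &end, 10);
    if (end == text)
        return -2;
    return (int)v;
}

/* Reads /sys/bus/pci/devices/<bdf>/numa_node for a PCI address such as
   "0000:01:00.0". Returns -3 if the file cannot be opened, e.g. when the
   device directory is missing. */
int pci_device_numa_node(const char *bdf)
{
    char path[256];
    char buf[32];
    snprintf(path, sizeof path, "/sys/bus/pci/devices/%s/numa_node", bdf);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -3;
    int node = -2;
    if (fgets(buf, sizeof buf, f) != NULL)
        node = parse_numa_node(buf);
    fclose(f);
    return node;
}
```

"0000:01:00.0" is only an example address; the GPU's actual address can be found with lspci.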
0 votes, 2 answers, 98 views
What is the "TBD Release Iron" and what are the modifications?
Some Win32 API function documentation (for example this and this) contains the following note:
Starting with TBD Release Iron, the behavior of this and other NUMA
functions has been modified to ...
0 votes, 1 answer, 99 views
How granular can multithreaded memory-write access be?
I've read about how NUMA works and that memory is pulled in from RAM through L2 and L1 caches.
And that there are only two ways to share data:
read access from n (n>=0) threads
read-write access ...
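Caches are kept coherent per cache line, so different threads can safely write different bytes, but writes to the same line from several cores make that line bounce between them (false sharing). The sketch below gives each thread its own line; CACHE_LINE = 64 is an assumption (common on x86-64; some ARM cores use 128), and NTHREADS and ITERS are arbitrary.

```c
#include <pthread.h>
#include <stdalign.h>

#define CACHE_LINE 64   /* assumption: 64-byte cache lines */
#define NTHREADS 4
#define ITERS 1000000L

/* One counter per thread, each aligned to its own cache line, so a write
   by one thread does not invalidate the line holding another's counter. */
struct padded_counter {
    alignas(CACHE_LINE) long value;
};

static struct padded_counter counters[NTHREADS];

static void *worker(void *arg)
{
    struct padded_counter *c = arg;
    for (long i = 0; i < ITERS; ++i)
        c->value++;   /* only this thread ever writes this line */
    return NULL;
}

/* Starts up to NTHREADS workers and returns the sum of their counters. */
long run_padded_counters(void)
{
    pthread_t t[NTHREADS];
    int started = 0;
    for (; started < NTHREADS; ++started) {
        counters[started].value = 0;
        if (pthread_create(&t[started], NULL, worker, &counters[started]) != 0)
            break;
    }
    long total = 0;
    for (int i = 0; i < started; ++i) {
        pthread_join(t[i], NULL);
        total += counters[i].value;
    }
    return total;
}
```

Removing the alignas keeps the result correct but typically makes the loop slower, because all four counters then share one or two lines.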
1 vote, 1 answer, 1k views
Why does Linux distribute threads among NUMA nodes almost equally?
I'm running an application with multiple threads and it seems Linux is distributing threads among NUMA nodes almost equally. Say my application spawns 4 threads and my machine has 4 sockets. I observe ...
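Linux's load balancer generally tries to move runnable threads onto idle CPUs, including CPUs on other nodes, which matches the spreading described. If a particular placement is wanted, threads can be pinned explicitly (or the whole process restricted with numactl --cpunodebind). The sketch below pins the calling thread to one CPU with pthread_setaffinity_np, a GNU extension; picking the lowest allowed CPU is only a placeholder for a NUMA-aware choice.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pins the calling thread to a single CPU and returns that CPU's number,
   or -1 on failure. Placeholder policy: the lowest-numbered CPU the thread
   is currently allowed to use. A NUMA-aware program would instead choose
   a CPU on the node that holds its data. */
int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed;
    if (pthread_getaffinity_np(pthread_self(), sizeof allowed, &allowed) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            if (pthread_setaffinity_np(pthread_self(), sizeof one, &one) != 0)
                return -1;
            return cpu;
        }
    }
    return -1;
}
```

Each worker thread would call this (with its own CPU choice) at startup; after that the scheduler no longer moves it.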
1 vote, 2 answers, 584 views
How to migrate array to a new NUMA node in C?
I have allocated an array in C as follows:
void *mem = mmap(NULL, 8192, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
Imagine this array is initialized and now I need to ...
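One way to do this on Linux is the move_pages(2) system call, which takes a list of page addresses and a target node per page (libnuma's numa_move_pages wraps the same call, and mbind with MPOL_MF_MOVE is another route). The sketch below calls it through syscall() so it needs no extra library; migrate_range and MAX_PAGES are hypothetical names, and the 64-page limit only keeps the example allocation-free.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MPOL_MF_MOVE
#define MPOL_MF_MOVE (1 << 1)   /* same value as in <linux/mempolicy.h> */
#endif

#define MAX_PAGES 64   /* arbitrary limit for this sketch */

/* Moves the pages backing [addr, addr + len) to target_node via
   move_pages(2) (pid 0 = this process). status[i] receives the node page i
   is now on, or a negative errno for that page. Returns -1 on failure with
   errno set; otherwise 0 (some kernels return a positive count of pages
   that could not be moved). */
long migrate_range(void *addr, size_t len, int target_node,
                   int status[MAX_PAGES])
{
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t)addr & ~(page - 1);   /* round down */
    uintptr_t end = (uintptr_t)addr + len;
    size_t npages = (size_t)((end - start + page - 1) / page);
    if (len == 0 || npages > MAX_PAGES) {
        errno = EINVAL;
        return -1;
    }

    void *pages[MAX_PAGES];
    int nodes[MAX_PAGES];
    for (size_t i = 0; i < npages; ++i) {
        pages[i] = (void *)(start + i * page);
        nodes[i] = target_node;
    }
    return syscall(SYS_move_pages, 0L, (unsigned long)npages, pages, nodes,
                   status, (long)MPOL_MF_MOVE);
}
```

For the array in the question, migrate_range(mem, 8192, 1, status) would ask for both pages to be moved to node 1; afterwards status[] shows where each page ended up.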
0 votes, 1 answer, 1k views
Is it possible to find out which NUMA system memory bank the current thread belongs to?
I'm writing a NUMA-aware algorithm and need this information for optimal memory placement. A solution for the JVM (for example using oshi) would be nice, but I can't find one even for C/C++.
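On Linux, the getcpu(2) system call reports both the CPU and the NUMA node the calling thread is running on (glibc 2.29 and later also provide a getcpu() wrapper). The sketch calls it through syscall() so it does not depend on the glibc version; current_cpu_and_node is an illustrative name.

```c
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Stores the CPU and NUMA node the calling thread is running on right now.
   Returns 0 on success, -1 on failure. The answer can be stale as soon as
   it returns, since the scheduler may move the thread unless it is pinned. */
int current_cpu_and_node(unsigned *cpu, unsigned *node)
{
    return (int)syscall(SYS_getcpu, cpu, node, NULL);
}
```

This answers only where the thread runs, not where a particular buffer's memory lives; for a stable answer, pin the thread first.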
1 vote, 2 answers, 230 views
Understanding the speedup of an OpenMP program across NUMA nodes
I came across this speedup behavior and I am finding it hard to explain. Following is the background:
Program
Invocation of the Gaussian elimination method to solve linear equations within a loop to ...
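Since the program itself is not shown, this is only a generic illustration of one effect that often matters once a run spans NUMA nodes: under Linux's default policy a page is placed on the node of the thread that first touches it. Initializing data with the same static schedule as the compute loop keeps each thread's pages local, provided threads are pinned (for example with OMP_PROC_BIND=true). The function names are hypothetical.

```c
#include <stddef.h>

/* Parallel first-touch initialization with the same static schedule as
   the compute loop below, so each thread's block of a and b tends to be
   allocated on that thread's node. Without -fopenmp this runs serially. */
void first_touch_init(double *a, double *b, size_t n)
{
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; ++i) {
        a[i] = (double)i;
        b[i] = 0.0;
    }
}

/* Compute loop with the identical static schedule: thread t touches the
   same index range it initialized, so its accesses stay node-local. */
void scale(double *b, const double *a, size_t n, double factor)
{
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; ++i)
        b[i] = factor * a[i];
}
```

If the arrays were instead initialized by a single thread, all their pages would land on that thread's node and the other socket's threads would pay remote-access latency and bandwidth.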
3 votes, 1 answer, 1k views
Explanation for why effective DRAM bandwidth reduces upon adding CPUs
This question is a spin-off of the one posted here: Measuring bandwidth on a ccNUMA system
I've written a micro-benchmark for the memory bandwidth on a ccNUMA system with 2x Intel(R) Xeon(R) Platinum ...
0 votes, 1 answer, 400 views
How to test the problem size scaling performance of code
I'm running a simple kernel which adds two streams of double-precision complex values. I've parallelized it using OpenMP with custom scheduling: the slice_indices container contains different indices ...
1 vote, 0 answers, 100 views
What is the order of memory allocation when demand exceeds a single NUMA node?
With a 4-NUMA-node Linux server (128G each), I was trying to allocate 300G of memory with kmalloc_node(2), specifying the node to start allocating from. Could anyone tell me what is the order of allocation ...
0 votes, 1 answer, 870 views
What is the meaning of "size" in the numactl --hardware output?
Does anyone know the exact meaning of "node size" for "numactl --hardware" output. I'm asking because I expected this memory value to be fixed but it changes slightly on some of ...
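As far as I know, libnuma (which numactl uses) takes a node's size from the MemTotal line of /sys/devices/system/node/nodeN/meminfo, and numactl --hardware prints it in MB; treat that source as an assumption to check against the libnuma code. The sketch below reads that line directly so the value can be watched over time.

```c
#include <stdio.h>

/* Extracts the MemTotal value (in kB) from one line of a node's meminfo
   file, which looks like "Node 0 MemTotal:       32768 kB".
   Returns -1 if the line is not a MemTotal line. */
long long parse_node_memtotal_kb(const char *line)
{
    int node;
    long long kb;
    if (sscanf(line, "Node %d MemTotal: %lld kB", &node, &kb) == 2)
        return kb;
    return -1;
}

/* Reads MemTotal for the given node; returns -1 if unavailable. */
long long node_memtotal_kb(int node)
{
    char path[64];
    char line[256];
    snprintf(path, sizeof path, "/sys/devices/system/node/node%d/meminfo", node);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    long long kb = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        kb = parse_node_memtotal_kb(line);
        if (kb >= 0)
            break;
    }
    fclose(f);
    return kb;
}
```

Logging node_memtotal_kb(0) periodically alongside numactl --hardware would show whether the two move together.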
2 votes, 1 answer, 277 views
How can I realize data local spawning or scheduling of tasks in OpenMP on NUMA CPUs?
I have this simple self-contained example of a very rudimentary 2 dimensional stencil application using OpenMP tasks on dynamic arrays to represent an issue that I am having on a problem that is less ...
1 vote, 0 answers, 570 views
How to PInvoke UpdateProcThreadAttribute with PROC_THREAD_ATTRIBUTE_PREFERRED_NODE attribute
I'm trying to PInvoke UpdateProcThreadAttribute() with PROC_THREAD_ATTRIBUTE_PREFERRED_NODE attribute, so that I could launch a process on a specific NUMA node. I'm working on Windows Server 2019.
I ...
1 vote, 0 answers, 500 views
Can page faults be triggered by NUMA access?
I have some multithreaded code where the threads spend a significant amount time in the page fault handler of the kernel (Linux 5.4).
But this only happens on a two-socket NUMA machine, not on ...
0 votes, 0 answers, 232 views
Is there an mlock issue that makes allocating 1G hugepages so slow?
Issue:
I've run into an issue with the mlock() API. The first load is fast when locking memory from "/sys/devices/system/node/node0", but it is very slow on node1, taking more than 3s to ...