289 questions
18
votes
2
answers
4k
views
Are loads and stores the only instructions that get reordered?
I have read many articles on memory ordering, and all of them only say that a CPU reorders loads and stores.
Does a CPU (I'm specifically interested in an x86 CPU) only reorder loads and stores, and ...
30
votes
3
answers
5k
views
Memory barrier generators
Reading Joseph Albahari's threading tutorial, the following are mentioned as generators of memory barriers:
C#'s lock statement (Monitor.Enter/Monitor.Exit)
All methods on the Interlocked class
...
18
votes
2
answers
4k
views
C++ How is release-and-acquire achieved on x86 only using MOV?
This question is a follow-up/clarification to this:
Does the MOV x86 instruction implement a C++11 memory_order_release atomic store?
This states the MOV assembly instruction is sufficient to ...
39
votes
2
answers
26k
views
Atomicity of loads and stores on x86
8.1.2 Bus Locking
Intel 64 and IA-32 processors provide a LOCK# signal that is asserted
automatically during certain critical memory operations to lock the
system bus or equivalent link. While this ...
24
votes
4
answers
20k
views
When should I use _mm_sfence, _mm_lfence, and _mm_mfence?
I read the "Intel Optimization Guide for Intel Architecture".
However, I still have no idea when I should use
_mm_sfence()
_mm_lfence()
_mm_mfence()
Could anyone explain when these ...
20
votes
3
answers
5k
views
Why is (or isn't?) SFENCE + LFENCE equivalent to MFENCE?
As we know from a previous answer to "Does it make any sense instruction LFENCE in processors x86/x86_64?", we cannot use SFENCE instead of MFENCE for sequential consistency.
An answer there ...
5
votes
2
answers
4k
views
How do mutex lock and unlock functions prevent CPU reordering?
As far as I know, a function call acts as a compiler barrier, but not as a CPU barrier.
This tutorial says the following:
acquiring a lock implies acquire semantics, while releasing a lock
...
58
votes
6
answers
25k
views
Why do we need Thread.MemoryBarrier()?
In "C# 4 in a Nutshell", the author shows that this class can sometimes write 0 without MemoryBarrier, though I can't reproduce this on my Core2Duo:
public class Foo
{
int _answer;
bool _complete;
...
31
votes
5
answers
11k
views
Which is a better write barrier on x86: lock+addl or xchgl?
The Linux kernel uses lock; addl $0,0(%%esp) as a write barrier, while the RE2 library uses xchgl (%0),%0 as a write barrier. What's the difference, and which is better?
Does x86 also require read barrier ...
15
votes
1
answer
1k
views
If I don't use fences, how long could it take a core to see another core's writes?
I have been trying to Google my question but I honestly don't know how to succinctly state the question.
Suppose I have two threads in a multi-core Intel system. These threads are running on the ...
61
votes
2
answers
24k
views
Does it make any sense to use the LFENCE instruction on x86/x86_64 processors?
I often find claims online that LFENCE makes no sense on x86 processors, i.e. it does nothing, so instead of MFENCE we can painlessly use SFENCE, because MFENCE = SFENCE + LFENCE = SFENCE + NOP =...
73
votes
2
answers
32k
views
How do I Understand Read Memory Barriers and Volatile
Some languages provide a volatile modifier that is described as performing a "read memory barrier" prior to reading the memory that backs a variable.
A read memory barrier is commonly described as a ...
7
votes
1
answer
2k
views
Dependent loads reordering in CPU
I have been reading Memory Barriers: A Hardware View For Software Hackers, a very popular article by Paul E. McKenney.
One of the things the paper highlights is that very weakly ordered processors ...
32
votes
4
answers
4k
views
Acquire/release semantics with 4 threads
I am currently reading C++ Concurrency in Action by Anthony Williams. One of his listings shows this code, and he states that the assertion that z != 0 can fire.
#include <atomic>
#include <...
156
votes
5
answers
70k
views
What is a memory fence?
What is meant by using an explicit memory fence?
29
votes
2
answers
13k
views
C++ Memory Barriers for Atomics
I'm a newbie when it comes to this. Could anyone provide a simplified explanation of the differences between the following memory barriers?
The Windows MemoryBarrier();
The fence _mm_mfence();
The ...
81
votes
1
answer
43k
views
When are x86 LFENCE, SFENCE and MFENCE instructions required?
Ok, I have been reading the following Qs from SO regarding x86 CPU fences (LFENCE, SFENCE and MFENCE):
Does it make any sense instruction LFENCE in processors x86/x86_64?
What is the impact SFENCE and ...
26
votes
5
answers
16k
views
Fastest inline-assembly spinlock
I'm writing a multithreaded application in C++, where performance is critical. I need to use a lot of locking while copying small structures between threads; for this I have chosen to use spinlocks.
...
2
votes
1
answer
434
views
Analyzing x86 output generated by the JIT in the context of volatile
I am writing this post in connection with "Deep understanding of volatile in Java"
public class Main {
private int x;
private volatile int g;
public void actor1(){
x = 1;
g = 1;...
19
votes
2
answers
8k
views
Memory Barrier by lock statement
I recently read about memory barriers and the reordering issue, and now I am somewhat confused about it.
Consider the following scenario:
private object _object1 = null;
private object _object2 = ...
8
votes
2
answers
5k
views
How do modern Intel x86 CPUs implement the total order over stores?
x86 guarantees a total order over all stores due to its TSO memory model. My question is if anyone has an idea how this is actually implemented.
I have a good impression how all the 4 fences are ...
53
votes
7
answers
13k
views
Are mutex lock functions sufficient without volatile?
A coworker and I write software for a variety of platforms running on x86, x64, Itanium, PowerPC, and other 10 year old server CPUs.
We just had a discussion about whether mutex functions such as ...
30
votes
2
answers
7k
views
Memory model ordering and visibility?
I tried looking for details on this; I even read the standard on mutexes and atomics... but I still couldn't understand the C++11 memory model visibility guarantees.
From what I understand the very ...
13
votes
3
answers
3k
views
C# volatile variable: Memory fences VS. caching
I have researched the topic for quite some time now, and I think I understand the most important concepts, like the release and acquire memory fences.
However, I haven't found a satisfactory explanation ...
12
votes
2
answers
544
views
GCC reordering up across load with `memory_order_seq_cst`. Is this allowed?
Using a simplified version of a basic seqlock, gcc reorders a non-atomic load up across an atomic load (memory_order_seq_cst) when compiling the code with -O3. This reordering isn't observed when ...