Frequent 'memory-barriers' Questions

18 votes

2 answers

4k views

Are loads and stores the only instructions that gets reordered?

I have read many articles on memory ordering, and all of them only say that a CPU reorders loads and stores. Does a CPU (I'm specifically interested in an x86 CPU) only reorder loads and stores, and ...

James

803

asked May 23, 2018 at 17:57

30 votes

3 answers

5k views

Memory barrier generators

Reading Joseph Albahari's threading tutorial, the following are mentioned as generators of memory barriers: C#'s lock statement (Monitor.Enter/Monitor.Exit) All methods on the Interlocked class ...

Ohad Schneider

38.5k

asked Jul 5, 2011 at 11:22

18 votes

2 answers

4k views

C++ How is release-and-acquire achieved on x86 only using MOV?

This question is a follow-up/clarification to this: Does the MOV x86 instruction implement a C++11 memory_order_release atomic store? This states the MOV assembly instruction is sufficient to ...

user997112

31.1k

asked Feb 20, 2020 at 6:40
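
The release/acquire pattern that question refers to can be sketched in C++ as simple message passing. This is a minimal sketch, not code from the question; all names (`payload`, `ready`, `producer`, `consumer`, `run_message_passing`) are hypothetical. On x86, mainstream compilers typically emit an ordinary MOV for both the release store and the acquire load, because x86's TSO model only allows StoreLoad reordering, which release/acquire does not need to prevent. Weaker architectures need extra instructions or special load/store forms.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared state for the sketch.
int payload = 0;                 // ordinary, non-atomic data
std::atomic<bool> ready{false};  // publication flag

void producer() {
    payload = 42;                                  // 1. write the data
    ready.store(true, std::memory_order_release);  // 2. publish it (typically a plain MOV on x86)
}

int consumer() {
    // Spin until the release store is visible (typically a plain MOV load on x86).
    while (!ready.load(std::memory_order_acquire)) {
    }
    // The acquire load read the value written by the release store, so the
    // write to payload happens-before this read: it must observe 42.
    int value = payload;
    assert(value == 42);
    return value;
}

int run_message_passing() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = 0;
    std::thread reader([&seen] { seen = consumer(); });
    std::thread writer(producer);
    writer.join();
    reader.join();
    return seen;
}
```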

39 votes

2 answers

26k views

Atomicity of loads and stores on x86

8.1.2 Bus Locking Intel 64 and IA-32 processors provide a LOCK# signal that is asserted automatically during certain critical memory operations to lock the system bus or equivalent link. While this ...

Gilgamesz

5,173

asked Jul 18, 2016 at 23:02
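
The LOCK# passage quoted in that question is about atomic read-modify-write operations. A minimal C++ sketch, assuming a hypothetical helper `count_with_atomic_rmw`: several threads increment one `std::atomic<long>` with `fetch_add`, which compilers typically lower to a LOCK-prefixed instruction on x86. (On modern processors the lock is usually applied to the cache line rather than the bus for cacheable, aligned data.) A plain `int` incremented with `++` from several threads would be a data race and could lose updates.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: each of num_threads threads adds 1 to a shared counter iters times.
long count_with_atomic_rmw(int num_threads, int iters) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, iters] {
            for (int i = 0; i < iters; ++i) {
                // Atomic RMW; relaxed is enough because only atomicity is needed,
                // and join() below makes the final value visible to this thread.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter.load();
}
```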

24 votes

4 answers

20k views

When should I use _mm_sfence _mm_lfence and _mm_mfence

I read the "Intel Optimization guide Guide For Intel Architecture". However, I still have no idea when I should use _mm_sfence(), _mm_lfence() and _mm_mfence(). Could anyone explain when these ...

prgbenz

1,199

asked Dec 27, 2010 at 9:35
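
One commonly cited case for `_mm_sfence()` is publishing data written with non-temporal (streaming) stores, which are weakly ordered and not covered by x86's usual TSO rules. A minimal C++ sketch under these assumptions: the intrinsics are used only on an x86-64 target, other targets fall back to plain stores (where the release store alone suffices), and the names `nt_buffer`, `nt_ready`, `fill_and_publish`, `wait_and_sum` and `run_nt_publish` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>  // _mm_stream_si32 (SSE2), _mm_sfence (SSE)
#define USE_NT_STORES 1
#endif

constexpr int kCount = 1024;
int nt_buffer[kCount];              // hypothetical shared buffer
std::atomic<bool> nt_ready{false};  // publication flag

void fill_and_publish() {
    for (int i = 0; i < kCount; ++i) {
#ifdef USE_NT_STORES
        _mm_stream_si32(&nt_buffer[i], i);  // MOVNTI: weakly ordered non-temporal store
#else
        nt_buffer[i] = i;                   // portable fallback on non-x86-64 targets
#endif
    }
#ifdef USE_NT_STORES
    // NT stores may become visible after later ordinary stores; SFENCE orders
    // all earlier stores before the flag store below.
    _mm_sfence();
#endif
    nt_ready.store(true, std::memory_order_release);
}

long wait_and_sum() {
    while (!nt_ready.load(std::memory_order_acquire)) {
    }
    long sum = 0;
    for (int i = 0; i < kCount; ++i) sum += nt_buffer[i];
    return sum;
}

long run_nt_publish() {
    nt_ready.store(false, std::memory_order_relaxed);
    long result = 0;
    std::thread reader([&result] { result = wait_and_sum(); });
    std::thread writer(fill_and_publish);
    writer.join();
    reader.join();
    return result;
}
```

Ordinary (non-NT) stores followed by a release store need no SFENCE on x86; the fence is only there because of the streaming stores.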

20 votes

3 answers

5k views

Why is (or isn't?) SFENCE + LFENCE equivalent to MFENCE?

As we know from a previous answer to "Does it make any sense instruction LFENCE in processors x86/x86_64?", we cannot use SFENCE instead of MFENCE for Sequential Consistency. An answer there ...

Alex

13.3k

asked Dec 23, 2014 at 21:04
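
The StoreLoad reordering at the heart of that question shows up in the classic store-buffer (Dekker-style) litmus test: each thread stores to its own variable, then loads the other one. With `memory_order_seq_cst` the outcome `r1 == 0 && r2 == 0` is forbidden; with relaxed or release/acquire accesses it is allowed, and x86 hardware can produce it because a later load may complete while an earlier store is still in the store buffer. A minimal C++ sketch with hypothetical names (`sb_x`, `sb_y`, `count_both_zero`); compilers typically implement the seq_cst store on x86 as XCHG or as MOV followed by MFENCE.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> sb_x{0}, sb_y{0};
int sb_r1 = 0, sb_r2 = 0;  // read by the main thread only after join()

// Runs the litmus test `trials` times and counts the forbidden outcome.
int count_both_zero(int trials) {
    int both_zero = 0;
    for (int i = 0; i < trials; ++i) {
        sb_x.store(0);
        sb_y.store(0);
        std::thread t1([] {
            sb_x.store(1, std::memory_order_seq_cst);  // store, then load the other flag
            sb_r1 = sb_y.load(std::memory_order_seq_cst);
        });
        std::thread t2([] {
            sb_y.store(1, std::memory_order_seq_cst);
            sb_r2 = sb_x.load(std::memory_order_seq_cst);
        });
        t1.join();
        t2.join();
        if (sb_r1 == 0 && sb_r2 == 0) ++both_zero;  // forbidden under seq_cst
    }
    return both_zero;
}
```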

5 votes

2 answers

4k views

How does a mutex lock and unlock functions prevents CPU reordering?

As far as I know, a function call acts as a compiler barrier, but not as a CPU barrier. This tutorial says the following: acquiring a lock implies acquire semantics, while releasing a lock ...

user8426277

657

asked Jun 20, 2018 at 14:47
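
In C++ terms, `std::mutex::lock()` has acquire semantics and `unlock()` has release semantics, so plain (non-atomic, non-volatile) data modified only under the lock is safely published to the next thread that takes the same mutex; the library supplies whatever compiler and CPU barriers the target needs. A minimal sketch with hypothetical names (`Account`, `deposit`, `run_deposits`):

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

struct Account {
    std::mutex m;
    long balance = 0;  // plain long: no volatile or atomic needed when always accessed under m

    void deposit(long amount) {
        std::lock_guard<std::mutex> guard(m);  // lock(): acquire
        balance += amount;
    }                                          // unlock() in guard's destructor: release
};

// Hypothetical driver: num_threads threads each deposit 1 per_thread times.
long run_deposits(int num_threads, int per_thread) {
    Account acct;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&acct, per_thread] {
            for (int i = 0; i < per_thread; ++i) acct.deposit(1);
        });
    }
    for (auto& w : workers) w.join();
    return acct.balance;
}
```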

58 votes

6 answers

25k views

Why we need Thread.MemoryBarrier()?

In "C# 4 in a Nutshell", the author shows that this class can sometimes write 0 without MemoryBarrier, though I can't reproduce it on my Core2Duo: public class Foo { int _answer; bool _complete; ...

Felipe Pessoto

7,000

asked Aug 24, 2010 at 12:24
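
A C++ analogue of the `Foo` pattern in that question (a sketch, not the book's C# code) uses standalone fences in place of `Thread.MemoryBarrier()`. `Thread.MemoryBarrier()` is a full fence; in the C++ model this publish/observe pattern only needs a release fence before the flag store and an acquire fence after the flag load. The flag must be atomic in C++, since a plain `bool` written and read concurrently is a data race. `A`, `B` and `run_foo` are illustrative names; `B` returning -1 for "not complete yet" is an assumption of this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Foo {
    int answer = 0;                     // plain data
    std::atomic<bool> complete{false};  // flag; atomic to avoid a data race

    void A() {
        answer = 123;
        std::atomic_thread_fence(std::memory_order_release);  // earlier writes stay before the flag store
        complete.store(true, std::memory_order_relaxed);
    }

    // Returns the answer once A() has been observed, or -1 otherwise.
    int B() {
        if (complete.load(std::memory_order_relaxed)) {
            std::atomic_thread_fence(std::memory_order_acquire);  // later reads stay after the flag load
            return answer;
        }
        return -1;
    }
};

int run_foo() {
    Foo foo;
    int result = -1;
    std::thread reader([&] {
        while ((result = foo.B()) == -1) {
        }
    });
    std::thread writer([&] { foo.A(); });
    writer.join();
    reader.join();
    return result;
}
```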

31 votes

5 answers

11k views

Which is a better write barrier on x86: lock+addl or xchgl?

The Linux kernel uses lock; addl $0,0(%%esp) as write barrier, while the RE2 library uses xchgl (%0),%0 as write barrier. What's the difference and which is better? Does x86 also require read barrier ...

Hongli

19k

asked Nov 20, 2010 at 12:15

15 votes

1 answer

1k views

If I don't use fences, how long could it take a core to see another core's writes?

I have been trying to Google my question but I honestly don't know how to succinctly state the question. Suppose I have two threads in a multi-core Intel system. These threads are running on the ...

Cube Fan

153

asked Jul 11, 2018 at 19:12

61 votes

2 answers

24k views

Does it make any sense to use the LFENCE instruction on x86/x86_64 processors?

I often find on the internet that LFENCE makes no sense on x86 processors, i.e. it does nothing, so instead of MFENCE we can painlessly use SFENCE, because MFENCE = SFENCE + LFENCE = SFENCE + NOP =...

Alex

13.3k

asked Dec 1, 2013 at 19:19

73 votes

2 answers

32k views

How do I Understand Read Memory Barriers and Volatile

Some languages provide a volatile modifier that is described as performing a "read memory barrier" prior to reading the memory that backs a variable. A read memory barrier is commonly described as a ...

Jason Kresowaty

16.6k

asked Nov 24, 2009 at 2:39

7 votes

1 answer

2k views

Dependent loads reordering in CPU

I have been reading Memory Barriers: A Hardware View For Software Hackers, a very popular article by Paul E. McKenney. One of the things the paper highlights is that very weakly ordered processors ...

KodeWarrior

3,618

asked Jan 31, 2016 at 15:35

32 votes

4 answers

4k views

Acquire/release semantics with 4 threads

I am currently reading C++ Concurrency in Action by Anthony Williams. One of his listings shows this code, and he states that the assertion that z != 0 can fire. #include <atomic> #include <...

Aryan

638

asked Jan 22, 2018 at 14:31
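
For contrast with the acquire/release version described in that question, here is a sketch of the same four-thread shape using `memory_order_seq_cst` everywhere. Two threads set `fx` and `fy`; two readers wait for one flag and then check the other. Sequential consistency imposes a single total order on all these operations, so whichever store comes first in that order, the reader waiting on the other flag must see both set, and `z` ends up non-zero. With release stores and acquire loads there is no such single order, which is why the assertion in the question can fire. The function and variable names are illustrative, not quoted from the book.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> fx{false}, fy{false};
std::atomic<int> z{0};

void write_x() { fx.store(true, std::memory_order_seq_cst); }
void write_y() { fy.store(true, std::memory_order_seq_cst); }

void read_x_then_y() {
    while (!fx.load(std::memory_order_seq_cst)) {
    }
    if (fy.load(std::memory_order_seq_cst)) ++z;
}

void read_y_then_x() {
    while (!fy.load(std::memory_order_seq_cst)) {
    }
    if (fx.load(std::memory_order_seq_cst)) ++z;
}

// Runs one round of the four threads and returns the final value of z.
int run_four_threads() {
    fx = false;
    fy = false;
    z = 0;
    std::thread a(write_x), b(write_y), c(read_x_then_y), d(read_y_then_x);
    a.join();
    b.join();
    c.join();
    d.join();
    return z.load();
}
```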

156 votes

5 answers

70k views

What is a memory fence?

What is meant by using an explicit memory fence?

yesraaj

48.3k

asked Nov 13, 2008 at 9:30

29 votes

2 answers

13k views

C++ Memory Barriers for Atomics

I'm a newbie when it comes to this. Could anyone provide a simplified explanation of the differences between the following memory barriers? The Windows MemoryBarrier(); The fence _mm_mfence(); The ...

AJG85

16.3k

asked Jan 12, 2012 at 20:22

81 votes

1 answer

43k views

When are x86 LFENCE, SFENCE and MFENCE instructions required?

Ok, I have been reading the following Qs from SO regarding x86 CPU fences (LFENCE, SFENCE and MFENCE): Does it make any sense instruction LFENCE in processors x86/x86_64? What is the impact SFENCE and ...

user997112

31.1k

asked Dec 22, 2014 at 1:40

26 votes

5 answers

16k views

Fastest inline-assembly spinlock

I'm writing a multithreaded application in C++, where performance is critical. I need to use a lot of locking while copying small structures between threads; for this I have chosen to use spinlocks. ...

sigvardsen

1,541

asked Aug 14, 2012 at 19:26
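
A portable alternative to an inline-assembly spinlock is a test-and-test-and-set lock on `std::atomic<bool>`, sketched below. This is a swapped-in technique, not the asker's code; `SpinLock` and `spin_counter` are hypothetical names. `exchange` is an atomic read-modify-write (typically XCHG on x86) with acquire ordering, the inner loop spins on a plain load so waiting cores do not keep pulling the cache line into exclusive state, and `unlock` is a release store. Adding a PAUSE hint (`_mm_pause()`) in the inner loop is a common x86 refinement, omitted here for portability.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

class SpinLock {
public:
    void lock() {
        for (;;) {
            // Try to take the lock: atomic RMW with acquire ordering.
            if (!locked_.exchange(true, std::memory_order_acquire)) return;
            // Wait with read-only loads until the lock looks free, then retry.
            while (locked_.load(std::memory_order_relaxed)) {
            }
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Hypothetical driver: threads increment a plain counter under the spinlock.
long spin_counter(int num_threads, int iters) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;  // protected by the lock
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```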

2 votes

1 answer

434 views

Analyzing of x86 output generated by JIT in the context of volatile

I am writing this post in connection with "Deep understanding of volatile in Java": public class Main { private int x; private volatile int g; public void actor1(){ x = 1; g = 1;...

Gilgamesz

5,173

asked Jul 17, 2017 at 19:02

19 votes

2 answers

8k views

Memory Barrier by lock statement

I read recently about memory barriers and the reordering issue and now I have some confusion about it. Consider the following scenario: private object _object1 = null; private object _object2 = ...

Jalal Said

16.2k

asked May 16, 2010 at 16:16

8 votes

2 answers

5k views

How do modern Intel x86 CPUs implement the total order over stores

x86 guarantees a total order over all stores due to its TSO memory model. My question is whether anyone has an idea how this is actually implemented. I have a good idea of how all 4 fences are ...

pveentjer

11.6k

asked Jun 19, 2020 at 7:31

53 votes

7 answers

13k views

Are mutex lock functions sufficient without volatile?

A coworker and I write software for a variety of platforms running on x86, x64, Itanium, PowerPC, and other 10-year-old server CPUs. We just had a discussion about whether mutex functions such as ...

David

1,033

asked Jul 26, 2011 at 23:10

30 votes

2 answers

7k views

Memory model ordering and visibility?

I tried looking for details on this; I even read the standard on mutexes and atomics... but I still couldn't understand the C++11 memory model visibility guarantees. From what I understand the very ...

NoSenseEtAl

30.9k

asked Sep 18, 2011 at 12:26

13 votes

3 answers

3k views

C# volatile variable: Memory fences VS. caching

So I researched the topic for quite some time now, and I think I understand the most important concepts like the release and acquire memory fences. However, I haven't found a satisfactory explanation ...

domin

1,384

asked Jun 22, 2017 at 7:25

12 votes

2 answers

544 views

GCC reordering up across load with `memory_order_seq_cst`. Is this allowed?

Using a simplified version of a basic seqlock, gcc reorders a non-atomic load up across an atomic load (memory_order_seq_cst) when compiling the code with -O3. This reordering isn't observed when ...

Alejandro

3,082

asked Apr 30, 2016 at 18:03
