In order to guard a code section against repeat or concurrent execution we can use Interlocked functionality. Guarding against repeat execution is necessary for things like Dispose(), and guarding against concurrent execution is a fundamental part of high-performance multithreading.
Originally I used Interlocked.Increment() because it effectively records the total number of calls and thus has greater diagnostic power than the options based on exchange operations. However, I was wondering whether Interlocked.CompareExchange() might be a better option with regard to minimising CPU bus noise and inter-core interference related to inter-core cache update operations.
When guarding against repeat execution there is probably not much difference because the expected number of calls is exactly 1 in non-pathological cases and hence Interlocked.CompareExchange() will need to do a volatile write just like the Interlocked.Exchange() or Interlocked.Increment(). Also, the total number of executions in use cases like Dispose() will be comparatively small, so who cares.
However, things are different when guarding against concurrent execution, like for a non-blocking critical section. In that case there can be huge numbers of unsuccessful execution attempts and for these there should be no volatile write if Interlocked.CompareExchange() is used, only the volatile read that has to be performed in any case. This should give the CAS operation an edge over the other two options. Also, in this case Interlocked.Increment() has the additional disadvantage that even failed acquisition attempts need a volatile undo operation in the form of Interlocked.Decrement().
From this I conclude that it makes sense to use Interlocked.CompareExchange() for guarding against repeat or concurrent execution if there are no other, overriding concerns at play.
Is this reasoning sound or am I overlooking something?
The objective is to choose the best option for the general case, as a coding convention, and also to be aware when further, case-specific analysis may be in order (like doing a volatile read before the CAS as indicated by Ahmed AEK in a comment).
P.S.: I originally cut my teeth in/on Turbo Pascal and Delphi, which did not have anything comparable to Volatile.Write() and offered interlocked operations only as imports from the Win32 API. This may have led to an undue preference for solutions involving interlocked increment and decrement rather than CompareExchange with release via a volatile write.
Concrete examples as requested by multiple comment respondents
Candidate patterns for preventing repeat execution:
if (Interlocked.Increment(ref m_disposed) == 1)
{
    // ... disposal code ...
}
versus (current preference)
if (Interlocked.CompareExchange(ref m_disposed, 1, 0) == 0)
{
    // ... disposal code ...
}
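For completeness, the third option mentioned above, Interlocked.Exchange(), could be sketched as follows. This is only an illustration: the class name GuardedResource and the DisposalRuns counter are hypothetical and exist purely to make the snippet self-contained and observable. Unlike the CAS variant, the exchange writes unconditionally on every call.

```csharp
using System;
using System.Threading;

// Hypothetical resource type, used only to illustrate the guard.
public sealed class GuardedResource : IDisposable
{
    private int m_disposed;   // 0 = live, 1 = disposed

    // Diagnostic only (an assumption for this sketch): how often the body ran.
    public int DisposalRuns;

    public void Dispose()
    {
        // Exchange always stores 1; only the caller that saw the old value 0
        // runs the disposal code.
        if (Interlocked.Exchange(ref m_disposed, 1) == 0)
        {
            DisposalRuns++;
            // ... disposal code ...
        }
    }
}
```

Calling Dispose() any number of times on the same instance should run the disposal body once.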
Candidates for preventing concurrent execution:
var times_entered = Interlocked.Increment(ref m_active);
try
{
    if (times_entered == 1) // no-one else here
    {
        // ... guarded code section ...
    }
}
finally
{
    // undo our own increment on the shared counter, whether or not we got in
    Interlocked.Decrement(ref m_active);
}
versus (current preference, but may need amendment as indicated by Ahmed AEK)
if (Interlocked.CompareExchange(ref m_active, 1, 0) == 0)
{
    try
    {
        // ... guarded code section ...
    }
    finally
    {
        // release only if we actually acquired the section
        Volatile.Write(ref m_active, 0);
    }
}
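A possible shape for the amendment, assuming the suggestion is the usual "test, then test-and-set" idea: do a plain volatile read first and only attempt the CAS when the section looks free, so that failed attempts can be served from a shared cache line. The names NonBlockingSection and TryRun are hypothetical, and whether the extra read pays off would have to be measured for the workload at hand.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper around the m_active flag from the snippets above.
public sealed class NonBlockingSection
{
    private int m_active;   // 0 = free, 1 = taken

    // Runs 'guarded' if nobody else is inside; returns false otherwise.
    public bool TryRun(Action guarded)
    {
        // Cheap check first: if the section looks busy, skip the CAS entirely.
        if (Volatile.Read(ref m_active) != 0)
            return false;

        // The section looked free; the CAS decides who actually gets in.
        if (Interlocked.CompareExchange(ref m_active, 1, 0) != 0)
            return false;

        try
        {
            guarded();
            return true;
        }
        finally
        {
            // Release: only reached by the caller that acquired the section.
            Volatile.Write(ref m_active, 0);
        }
    }
}
```

A caller that finds the section occupied simply gets false back and can move on or retry later; nothing has to be undone after a failed attempt.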
Interlocked.CompareExchange() vs Interlocked.Exchange() vs Interlocked.Increment(): honestly, I can only recommend you check out "Which is faster?" by Eric Lippert. Make a test case yourself and use BenchmarkDotNet to test it. Honestly, I suspect the difference will be minuscule compared to the overhead of the actual work of disposal, the .NET runtime, and whatever else you are doing.
std::atomic_flag's test_and_set, at least on x86 where it's a good choice; I haven't checked ARM, and IDK if that's the right tuning choice for ARM. On modern ARMv8.1 using single-instruction atomics, it's probably like x86 where either option needs the cache line in Exclusive state. Possibly ARM CAS could avoid dirtying it on failure, unlike x86. With old LL/SC, which compilers probably only use on old CPUs, CAS potentially gives you an early-out on false.