Vance Morrison’s excellent MSDN article from a few months back talks about why double checked locking is guaranteed to work on the CLR v2.0, and why it is one of the few safe lock-free mechanisms on the runtime. He also sent an email to the develop.com mailing list a while back explaining why this pattern wasn’t guaranteed to work under the ECMA memory model. We did quite a bit of implementation work and testing to tame the crazy memory model of IA-64 on 2.0. (Note that none of this is in the ECMA specification, so if you’re worried about CLI compatibility, beware.)
These modifications not only enable the double checked locking pattern, but also prevent constructors from publishing the newly allocated object before its state has been initialized, as I mentioned in my PDC presentation on concurrency last year. We accomplish this by ensuring writes have ‘release’ semantics on IA-64, via the `st.rel` instruction. A single `st.rel x` guarantees that any other loads and stores leading up to its execution (in the physical instruction stream) must appear to have occurred to each logical processor at least by the time `x`’s new value becomes visible to another logical processor. Loads can be given ‘acquire’ semantics (via the `ld.acq` instruction), meaning that any other loads and stores that occur after a `ld.acq x` cannot appear to have occurred prior to the load. The 2.0 memory model does not use `ld.acq`s unless you are accessing volatile data (marked with the `volatile` modifier keyword or accessed via the `Thread.VolatileRead` API).
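For reference, here is a minimal sketch of the canonical double checked locking pattern that these release-semantics writes make safe on 2.0 (the class shape mirrors the variant below; it is illustrative, not the only correct formulation):

```csharp
class Singleton
{
    private static object slock = new object();
    private static Singleton instance;

    private Singleton() {}

    public static Singleton Instance
    {
        get
        {
            if (instance == null)       // unsynchronized fast path
            {
                lock (slock)
                {
                    if (instance == null)
                    {
                        // On 2.0, the constructor's writes and this store all
                        // have release semantics, so no other thread can see
                        // the reference before the object's state.
                        instance = new Singleton();
                    }
                }
            }
            return instance;
        }
    }
}
```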
The fact that only volatile loads get acquire semantics can lead to some subtle problems. For example, a slight variant of the double checked lock will not work under our model:
```csharp
class Singleton
{
    private static object slock = new object();
    private static Singleton instance;
    private static bool initialized;

    private Singleton() {}

    public static Singleton Instance
    {
        get
        {
            if (!initialized)
            {
                lock (slock)
                {
                    if (!initialized)
                    {
                        instance = new Singleton();
                        initialized = true;
                    }
                }
            }
            return instance;
        }
    }
}
```
You might have decided to use this pattern to determine whether to initialize a value type, since checking the variable for null isn’t possible. Or, if you had some more complex set of state, perhaps you’d want to use a single Boolean rather than checking, say, 10 separate variables to see whether each has been initialized. Whatever your reasoning, as written the above code is prone to a subtle race condition.
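To make the value-type motivation concrete, here is a hypothetical variant (the `StartupTime` class and `DateTime` payload are invented for illustration) that shares exactly the same flaw:

```csharp
class StartupTime
{
    private static object slock = new object();
    private static DateTime startedAt;  // value type: can't check for null
    private static bool initialized;

    public static DateTime StartedAt
    {
        get
        {
            if (!initialized)
            {
                lock (slock)
                {
                    if (!initialized)
                    {
                        startedAt = DateTime.Now;
                        initialized = true;
                    }
                }
            }
            // Just like the Singleton above, this read can move before the
            // read of initialized, yielding default(DateTime).
            return startedAt;
        }
    }
}
```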
The problem is that neither the read of `initialized` nor the read of `instance` has ‘acquire’ semantics. Thus, `instance` could appear to have been read before `initialized`, e.g. as follows:
| Time | Thread A | Thread B |
|---|---|---|
| 0 | | Reads `instance` as null |
| 1 | Reads `initialized` as false | |
| 2 | Sets `instance` to ref to new obj | |
| 3 | Sets `initialized` to true | |
| 4 | Uses `instance` (initialized) | |
| 5 | | Reads `initialized` as true |
| 6 | | Uses `instance` (null!) |
Thread B ends up returning a null reference. If a caller tried to use it, they might encounter a spurious `NullReferenceException`, the cause of which is incredibly hard to debug. For example:
```csharp
void f()
{
    Singleton s = Singleton.Instance;
    s.DoSomething(); // Boom!
}
```
For this to have happened, Thread B would have had to read `instance` entirely out of order. It might have done so for any number of reasons. If it recently executed some code that pulled `instance` into cache—either directly or due to locality—it isn’t required to invalidate that cache line for non-acquire reads, even though it observed a new write with release semantics; the effect is as if the load moved before the load of `initialized`. Or superscalar execution might perform branch prediction and retrieve the value of `instance`, assuming that `initialized` will turn out to be true, pulling it into cache ahead of the read of `initialized`. Again, because it is a non-acquire read, this is a valid thing to do. If it then reads `initialized` as true, its prediction was actually correct, and it just returns the stale null value that was pre-fetched. It might even be the case that a compiler along the way moved the read, which is also entirely legal under our memory model.
One possible solution is to employ a volatile read for the first read of the `initialized` variable, prohibiting the read of `instance` from moving prior to the read of `initialized`. Control dependency prevents us from having to use a volatile read for both variables. (Note that the flag becomes an `int` below because `Thread.VolatileRead` offers no overload for `bool`.)
```csharp
class Singleton
{
    private static object slock = new object();
    private static Singleton instance;
    private static int initialized;

    private Singleton() {}

    public static Singleton Instance
    {
        get
        {
            if (Thread.VolatileRead(ref initialized) == 0)
            {
                lock (slock)
                {
                    if (initialized == 0)
                    {
                        instance = new Singleton();
                        initialized = 1;
                    }
                }
            }
            return instance;
        }
    }
}
```
You could instead have inserted a call to `Thread.MemoryBarrier`, which is a two-way memory fence, between the if-block and the read of `instance`. But the cost of a full barrier is generally higher than both a `st.rel` and a `ld.acq`, because it affects surrounding instructions and prevents movement in both directions.
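A sketch of that alternative (keeping the plain Boolean flag from the original variant, since no `VolatileRead` overload is needed here):

```csharp
public static Singleton Instance
{
    get
    {
        if (!initialized)    // ordinary (non-volatile) read
        {
            lock (slock)
            {
                if (!initialized)
                {
                    instance = new Singleton();
                    initialized = true;
                }
            }
        }
        // Two-way fence: the read of instance below cannot appear to
        // move before the read of initialized above.
        Thread.MemoryBarrier();
        return instance;
    }
}
```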
The take-away here is not that you must understand the specifics of how cache coherency, speculative execution, and our memory model interact. Rather, it is that once you venture even slightly outside the bounds of the few “blessed” lock-free practices described in Vance’s article, you open yourself up to the worst kind of race conditions. Using locks is a simple way to avoid this pain. And hopefully someday in the future, transactional memory and lock elision techniques will deliver the performance of lock-free code, but without any of the mental illness that such techniques have been proven to cause.