Lock-free code is hard. But it can come in handy in a pinch.
C# 1.0 shipped with the ability to stack allocate data with the stackalloc keyword, much like C++'s alloca. There are restrictions, however, around what you can allocate on the stack: inline arrays of primitive types, or structs that themselves have fields of primitive types (or structs that do, and so on). That's it. C# 2.0 now allows you to embed similar inline arrays inside other value types, even for those that are allocated inside of a reference type on the heap, by using the fixed keyword.
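Here's a minimal sketch of both forms; the type and member names below are mine, purely for illustration (compile with /unsafe):

```csharp
// C# 2.0 fixed-size buffer: 64 bytes stored inline in the struct itself,
// even when the struct ends up inside a heap-allocated object.
unsafe struct InlineBuffer
{
    public fixed byte Data[64];
}

class StackAllocSketch
{
    // Stack allocation of a primitive array; requires an unsafe context.
    static unsafe void SumOnTheStack()
    {
        int* buffer = stackalloc int[16];   // 16 ints on the current stack frame
        int sum = 0;
        for (int i = 0; i < 16; i++)
        {
            buffer[i] = i;
            sum += buffer[i];
        }
    }
}
```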
In managed code, you can pass ByRefs "down the stack." You can't do much aside from that, however, other than use things like ldind and stind on them. And of course, you can cast them to native pointers, store them elsewhere, and so on, but those sorts of (evil) things are unverifiable.
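A small sketch of both the legitimate and the evil case (identifiers are mine, for illustration only):

```csharp
class ByRefSketch
{
    static void Increment(ref int location)
    {
        location++;                     // reads/writes through the ByRef (ldind/stind under the covers)
    }

    static unsafe void Evil(ref int location)
    {
        fixed (int* p = &location)      // turning the ByRef into a native pointer: possible, but unverifiable
        {
            *p = 42;
        }
    }

    static void Main()
    {
        int value = 0;
        Increment(ref value);           // the ByRef only flows "down the stack"
        Evil(ref value);
    }
}
```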
Via DBox and TBray, I stumbled upon Will Continuations Continue?, a great essay about why continuation support in modern VMs is not a good idea after all:
One of the challenges when designing reusable software that employs hidden parallelism – such as a BCL API that parallelizes sorts, for-all style data parallel loops, and so forth – is deciding, without a whole lot of context, whether to run in parallel or not. A leaf-level function call running on a heavily loaded ASP.NET server, for example, probably should not suddenly take over all 16 already-busy CPUs to search an array of 1,000 elements. But if there are 16 idle CPUs and one request to process, doing so could reduce the response latency and make for a happier user, especially for a search of an array of 1,000,000+ elements. In most cases, before such a function "goes parallel," it has to ask: Is it worth it?
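To make that concrete, here's a rough sketch of such a check; the threshold, class, and method names below are invented for illustration and aren't taken from any real BCL implementation:

```csharp
using System;
using System.Threading.Tasks;

static class ParallelSearch
{
    const int Threshold = 100000;   // below this, the overhead of going parallel outweighs the win

    public static int IndexOf(int[] array, int value)
    {
        // Small inputs (or a single-CPU machine) stay sequential.
        if (array.Length < Threshold || Environment.ProcessorCount == 1)
            return Array.IndexOf(array, value);

        // Otherwise, spread the search across the available CPUs.
        int result = -1;
        Parallel.For(0, array.Length, (i, state) =>
        {
            if (array[i] == value)
            {
                result = i;     // last-writer-wins race; fine for a sketch
                state.Stop();
            }
        });
        return result;
    }
}
```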
Databases have utilized parallelism for a long time now to scale up and scale out effectively with continuously improving chip and cluster technologies. Consider a few high-level examples:
Raymond’s recent post talks about queueing user-mode APCs in Win32.
I wrote about torn reads previously, in which, because loads from and stores to data types wider than 32 bits are not actually "atomic" on a 32-bit CPU, obscure magic values are seen in the program from time to time. This isn't as scary as "out of thin air" values, but it can be troublesome nonetheless. I noted that, by using a lock, you can serialize access to the location to ensure safety.
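As a quick sketch of the lock-based approach (the type and member names are illustrative):

```csharp
class Counter
{
    private long _value;                 // 64-bit: a bare read or write is two 32-bit moves on x86
    private readonly object _lock = new object();

    public void Set(long value)
    {
        lock (_lock) { _value = value; } // writer and reader take the same lock...
    }

    public long Get()
    {
        lock (_lock) { return _value; }  // ...so no one ever observes half of an update
    }

    // Alternatively, Interlocked.Read gives an atomic 64-bit read without a full lock.
    public long GetInterlocked()
    {
        return System.Threading.Interlocked.Read(ref _value);
    }
}
```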
The profiler that ships with Visual Studio is great for "real" CPU profiling. But let's face it: there are still some situations where a good ol' stopwatch works just fine (of the System.Diagnostics.Stopwatch variety). For example, when you're trying to do some quick and dirty measurement on a very specific region of code and don't want to deal with the rest of the noise.
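Something like this, where DoInterestingWork is just a placeholder for the region being measured:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class TimingSketch
{
    static void DoInterestingWork()
    {
        Thread.Sleep(100);   // placeholder for the code you actually care about
    }

    static void Main()
    {
        Stopwatch sw = Stopwatch.StartNew();
        DoInterestingWork();
        sw.Stop();
        Console.WriteLine("Elapsed: {0} ms ({1} ticks)", sw.ElapsedMilliseconds, sw.ElapsedTicks);
    }
}
```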
When you perform a wait on the CLR, we make sure it happens in an STA-friendly manner. This entails using msg-waits, such as MsgWaitForMultipleObjectsEx and/or CoWaitForMultipleHandles. Doing so ensures we pick up and dispatch incoming RPC work mid-stack, while the STA isn't necessarily sitting in a top-level message loop. In fact, an STA that temporarily stops pumping can easily lead to temporary or even permanent hangs (i.e. deadlocks), especially in common COM scenarios where reentrant calls across apartments are made (e.g. MTA->STA->MTA->STA). Even where deadlock isn't possible, failing to pump can have a ripple effect across your process, as components wait for other components to complete intensive work.
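For reference, here's roughly what such a pumping wait looks like at the Win32 level; the constants, struct layout, and helper below are my own sketch of the pattern, not the CLR's actual implementation:

```csharp
using System;
using System.Runtime.InteropServices;

static class PumpingWait
{
    const uint QS_ALLINPUT = 0x04FF;
    const uint WAIT_OBJECT_0 = 0;
    const uint INFINITE = 0xFFFFFFFF;
    const uint PM_REMOVE = 1;

    [StructLayout(LayoutKind.Sequential)]
    struct MSG
    {
        public IntPtr hwnd;
        public uint message;
        public IntPtr wParam;
        public IntPtr lParam;
        public uint time;
        public int ptX;
        public int ptY;
    }

    [DllImport("user32.dll")]
    static extern uint MsgWaitForMultipleObjectsEx(
        uint nCount, IntPtr[] pHandles, uint dwMilliseconds, uint dwWakeMask, uint dwFlags);

    [DllImport("user32.dll")]
    static extern bool PeekMessage(out MSG msg, IntPtr hWnd, uint filterMin, uint filterMax, uint remove);

    [DllImport("user32.dll")]
    static extern bool TranslateMessage(ref MSG msg);

    [DllImport("user32.dll")]
    static extern IntPtr DispatchMessage(ref MSG msg);

    public static void WaitAndPump(IntPtr handle)
    {
        IntPtr[] handles = new IntPtr[] { handle };
        while (true)
        {
            // Block until either the handle is signaled or a message arrives.
            uint result = MsgWaitForMultipleObjectsEx(1, handles, INFINITE, QS_ALLINPUT, 0);
            if (result == WAIT_OBJECT_0)
                return;                          // the handle we were waiting on is signaled
            if (result == WAIT_OBJECT_0 + 1)
                PumpMessages();                  // e.g. an incoming cross-apartment COM call: dispatch it
            else
                throw new InvalidOperationException("Wait failed.");
        }
    }

    static void PumpMessages()
    {
        MSG msg;
        while (PeekMessage(out msg, IntPtr.Zero, 0, 0, PM_REMOVE))
        {
            TranslateMessage(ref msg);
            DispatchMessage(ref msg);
        }
    }
}
```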