Joe Duffy's Blog

  • September 17, 2008
  • Building a custom thread pool (series, part 3): incorporating work stealing queues
  • In part 2 of this series, I described a new work stealing queue data structure used for work item management. This structure allows us to push and pop elements into a thread-local work queue without heavy-handed synchronization. Moreover, this distributed a large amount of the scheduling responsibility across the threads (and hence processors). The result is that, for recursively queued work items, scalability is improved and pressure on the typical bottleneck in a thread pool (i.e., the global lock) is alleviated.

  • September 13, 2008
  • On being stateful
  • Most programs are tangled webs of data and control dependencies. For sequential programs, this doesn’t matter much aside from putting constraints on the legal optimizations available to a compiler. But it gets worse. Imperative programs today are also full of side-effect dependencies. Unlike data and control dependence—which most compilers can identify and understand the semantics of (aliasing aside)—side-effect dependencies are hidden and the semantic meaning of them is entirely ad-hoc. These can include scribbling to shared memory, writing to the disk, or printing to the console.

  • August 12, 2008
  • Parallel Extensions is on Mono
  • Miguel de Icaza recently blogged about the addition of Parallel Extensions to the Mono family.

  • August 11, 2008
  • Building a custom thread pool (series, part 2): a work stealing queue
  • The primary reason a traditional thread pool doesn’t scale is that there’s a single work queue protected by a global lock. For obvious reasons, this can easily become a bottleneck. Two primary things contribute heavily to whether the global lock becomes a limiting factor for a particular workload’s throughput:

  • July 29, 2008
  • Building a custom thread pool (series, part 1)
  • This is the first part in a series I am going to do on building a custom thread pool. Not that I’m advocating you do such a thing, but I figured it could be interesting to explore the intricacies involved. We’ll start off really simple:

  • July 20, 2008
  • A bit more formalism as to why CLR's MM is broken on x86/x64
  • Here’s a slightly more formal approach to demonstrating that the CLR MM is improperly implemented for the particular example I showed earlier.

  • July 16, 2008
  • "Loads cannot pass other loads" is a ~myth
  • The adjacent release/acquire problem is well known. As an example, given the program:

  • June 23, 2008
  • Final manuscript for Concurrent Programming on Windows has been submitted
  • I just submitted the final manuscript for Concurrent Programming on Windows to Addison-Wesley.

  • June 13, 2008
  • Volatile reads and writes, and timeliness
  • We had an interesting debate at a Parallel Extensions design meeting yesterday, where I tried to convince everybody that a full fence on SpinLock exit is not a requirement. We currently offer an Exit(bool) overload that accepts a flushReleaseWrite argument. This merely changes the lock release from

  • June 5, 2008
  • A tour of the new Parallel Extensions CTP on Channel9
  • We sat down last week with Charles from Channel9 to discuss the new CTP. Both parts got posted today: