In this essay, I’ll talk about “performance culture.” Performance is one of the key pillars of software engineering, and is something that’s hard to do right, and sometimes even difficult to recognize. As a famous judge once said, “I know it when I see it.” I’ve spoken at length about performance and culture independently before, however the intersection of the two is where things get interesting. Teams who do this well have performance ingrained into nearly all aspects of how the team operates from the start, and are able to proactively deliver loveable customer experiences that crush the competition. There’s no easy cookie-cutter recipe for achieving a good performance culture, however there are certainly some best practices you can follow to plant the requisite seeds into your team. So, let’s go!
Midori was written in an ahead-of-time compiled, type-safe language based on C#. Aside from our microkernel, the whole system was written in it, including drivers, the domain kernel, and all user code. I’ve hinted at a few things along the way and now it’s time to address them head-on. The entire language is a huge space to cover and will take a series of posts. First up? The Error Model. The way errors are communicated and dealt with is fundamental to any language, especially one used to write a reliable operating system. Like many other things we did in Midori, a “whole system” approach was necessary to getting it right, taking several iterations over several years. I regularly hear from old teammates, however, that this is the thing they miss most about programming in Midori. It’s right up there for me too. So, without further ado, let’s start.
In my first Midori post, I described how safety was the foundation of everything we did. I mentioned that we built an operating system out of safe code, and yet stayed competitive with operating systems like Windows and Linux written in C and C++. In many ways, system architecture played a key role, and I will continue discussing how in future posts. But, at the foundation, an optimizing compiler that often eeked out native code performance from otherwise “managed”, type- and memory-safe code, was one of our most important weapons. In this post, I’ll describe some key insights and techniques that were essential to our success.
Midori was built out of many ultra-lightweight, fine-grained processes, connected through strongly typed message passing interfaces. It was common to see programs that’d’ve classically been single, monolithic processes – perhaps with some internal multithreading – expressed instead as dozens of small processes, resulting in natural, safe, and largely automatic parallelism. Synchronous blocking was flat-out disallowed. This meant that literally everything was asynchronous: all file and network IO, all message passing, and any “synchronization” activities like rendezvousing with other asynchronous work. The resulting system was highly concurrent, responsive to user input, and scaled like the dickens. But as you can imagine, it also came with some fascinating challenges.
Last time, we saw how Midori built on a foundation of type, memory, and concurrency safety. This time, we will see how this enabled some novel approaches to security. Namely, it let our system eliminate ambient authority and access control in favor of capabilities woven into the fabric of the system and its code. As with many of our other principles, the guarantees were delivered “by-construction” via the programming language and its type system.
Midori was built on a foundation of three kinds of safety: type, memory, and concurrency-safety. These safeties eliminated whole classes of bugs “by-construction” and delivered significant improvements in areas like reliability, security, and developer productivity. They also fundamentally allowed us to depend on the type system in new and powerful ways, to deliver new abstractions, perform novel compiler optimizations, and more. As I look back, the biggest contribution of our project was proof that an entire operating system and its ecosystem of services, applications, and libraries could indeed be written in safe code, without loss of performance, and with some quantum leaps forward in several important dimensions.
Enough time has passed that I feel safe blogging about my prior project here at Microsoft, “Midori.” In the months to come, I’ll publish a dozen-or-so articles covering the most interesting aspects of this project, and my key take-aways.
There’s one especially important trait I look for in great team members: intellectual honesty. Many other traits typically follow suit, but lacking this foundation can lead to frequent Kobayashi Maru situations. At best, those slow down the team without adding value, and at worst, turn an entire team toxic and ruin its core cultural values. And it can happen quickly.
It seems that I completely blow away and recreate my blog every two years or so. The time has come. I ended up horking my prior setup and have moved everything over to GitHub.
One technique we explored in my team’s language work is something we call “fail-fast.” The idea is that a program fails as quickly as possible after a bug has been detected. This idea isn’t really all that novel, but the ruthless application of it, perhaps, was.
No matter how smart of a leader you are, you’re going to be wrong sometimes. Often even. And you won’t always have the best ideas.
I work in a team where the microkernel is developed in close partnership with the backend code-generator. Where the language is developed in close partnership with the class libraries. Where it’s just as common for a language designer to comment on the best use of the language style in the implementation of the filesystem, as it is for such an exchange in the context of the web browser, as it is for a device driver developer to help influence the overall async programming model’s design to better suit his or her scenarios.
One of the first questions I ask when joining a new team is: Where do code reviews happen?
My team has been designing and implementing a set of “systems programming” extensions to C# over the past 4 years. At long last, I’ll begin sharing our experiences in a series of blog posts.
What I am about to say admittedly flies in the face of common wisdom. But I grow more convinced of it by the day. Put simply, there ought not to be a distinction between software research and software engineering.