Erlang and High-Scale System Software

I’ve been involved with high-scale systems software projects, mostly database engines, for the last 20 years, and I’ve watched the transition from low-level and proprietary languages to C, and then from C to C++. Recently I’ve been thinking a bit about what’s next.

Back in the very early 90s, when I was Lead Architect on IBM DB2, I was dead against C++ usage in the Storage Engine and wouldn’t allow exceptions to be used anywhere in the system. At the time, the quality of C++ compilers was variable, with some being real compilers that were actually fairly well done (I led the IBM RS/6000 C++ team in the late 80s) while others were Cfront-based and pretty weak. No compiler of that era, including the one I worked on, did a good job implementing exceptions. Times change. SQL Server, for example, is 100% C++ and it makes excellent use of exceptions to clean up resources on failure.

The productivity benefits of new programming languages and tools eventually win out. When they get broad use, implementations improve, reducing the performance tax, and eventually even very performance-sensitive system software makes the transition.

I got interested in Java in the mid-90s, and more recently I’ve been using C# quite a bit, partly due to where I work and partly because I actually find the language and surrounding tools impressively good. JITed languages typically don’t perform as well as statically compiled languages, but the advantages completely swamp the minor performance costs. And, as managed language (Java, C#, etc.) implementations improve, the performance tax continues to fall. There is no question in my mind that managed languages will end up being broadly used in even the most performance-critical software systems such as database engines.

Recently, I’ve gotten interested in Erlang as a systems software implementation language. By most measures, it looks to be an unlikely choice for high-scale system software in that it’s interpreted, has a functional subset at its core, and uses message passing rather than shared memory and locks. Basically, it’s just about the opposite of everything you would find in a modern commercial database kernel. So what makes it interesting? The short answer is that all the things that make it an unlikely choice also make it interesting. Servers are becoming increasingly unbalanced, with CPU speeds continuing to outpace memory and network bandwidth. More and more operations are going to be memory and network bound rather than CPU bound, if they aren’t already. Trading some CPU resources to get a more robust implementation that is easier to understand and maintain is a good choice. In addition, CPU speed increases are now coming more from multiple cores than from frequency scaling a single core. Consequently, a language that produces an abundance of parallelism is an asset rather than a problem. Finally, systems software projects like database management systems, operating systems, web servers, IM servers, email systems, etc. are incredibly large and complex. The Erlang model of spawning many lightweight threads that communicate via message passing is going to be less efficient than the more common shared memory and locks solution, but it’s much easier to get correct. Erlang also encourages a “fail fast” programming model. I’ve long argued that this is the only way to get high-scale systems software correct (Designing and Deploying Internet-Scale Services).
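To make the message-passing model a little more concrete, here is a rough sketch in Python rather than Erlang. It assumes that an ordinary thread with a queue can stand in for an Erlang process and its mailbox, which is only an approximation: Python threads are far heavier than Erlang processes. The names counter and run_demo are made up for illustration. The point is that the running total is owned by exactly one thread, so no lock is ever taken around it.

```python
import queue
import threading


def counter(mailbox: queue.Queue) -> None:
    """Own a private total; other threads can only reach it by sending messages."""
    total = 0
    while True:
        msg = mailbox.get()
        if msg[0] == "add":
            total += msg[1]
        elif msg[0] == "get":
            msg[1].put(total)  # reply on the queue supplied by the sender
        elif msg[0] == "stop":
            return


def run_demo(senders: int = 4, per_sender: int = 1000) -> int:
    """Start one counter plus several senders and return the final total."""
    mailbox: queue.Queue = queue.Queue()
    worker = threading.Thread(target=counter, args=(mailbox,))
    worker.start()

    def send_many() -> None:
        for _ in range(per_sender):
            mailbox.put(("add", 1))

    threads = [threading.Thread(target=send_many) for _ in range(senders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Every "add" is already queued, so the "get" is handled after all of them.
    reply: queue.Queue = queue.Queue()
    mailbox.put(("get", reply))
    total = reply.get()

    mailbox.put(("stop",))
    worker.join()
    return total
```

Calling run_demo(4, 1000) has four senders each post 1000 increments to the one thread that owns the counter. The cost is a queue operation per update, which is the efficiency tax mentioned above; the benefit is that there is no shared mutable state to get wrong.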

Certainly Erlang brings a tax, as have other new languages that we have adopted over the years. But it also brings some of what we need badly right now. For example, the fail fast programming model is the right one and, when combined with synchronous state redundancy, is how most high-scale systems should be written. Erlang also encourages the production of a very large number of threads, which can be a good thing on very high core count servers. Message passing rather than shared memory with locks, and fail fast with operation restart, significantly increase the probability of the software system working correctly through unexpected events.
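As a small illustration of fail fast with operation restart, here is a minimal Python sketch. It is not how Erlang implements supervision; the names TransientFault, make_flaky_sum, and run_with_restart are hypothetical, and the fault is simulated. It also does not model state redundancy. The operation makes no attempt to recover locally: it raises as soon as something is wrong, and a separate caller decides whether to run it again from a clean start.

```python
class TransientFault(Exception):
    """Simulated unexpected failure inside an operation."""


def make_flaky_sum(fail_times: int):
    """Build a hypothetical operation that crashes on its first `fail_times` calls."""
    calls = {"n": 0}

    def operation(items):
        calls["n"] += 1
        if calls["n"] <= fail_times:
            # Fail fast: raise immediately rather than limp along in a bad state.
            raise TransientFault(f"simulated fault on call {calls['n']}")
        return sum(items)

    return operation


def run_with_restart(operation, request, max_restarts: int = 3):
    """Run the operation; on a crash, restart it with the original request.

    Returns (result, restarts_used). Once max_restarts is exhausted, the last
    exception is re-raised so the failure moves up a level.
    """
    restarts = 0
    while True:
        try:
            return operation(request), restarts
        except Exception:
            if restarts == max_restarts:
                raise
            restarts += 1
```

With op = make_flaky_sum(2), run_with_restart(op, [1, 2, 3]) crashes twice and succeeds on the third run. An operation that keeps failing past max_restarts is not retried forever; the exception propagates to whoever is supervising the caller.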

From my perspective, the syntax of Erlang is less than beautiful but all the advantages above make up for most of that.

The Concurrency and Coordination Runtime (CCR) is a .NET runtime that implements some of the features I mention above for languages like C#. George Chrysanthakopoulos, Microsoft CCR Architect, reports that MySpace is using it: MySpace.com using the CCR (Sriram Krishnan pointed me to this one).

It appears that Erlang usage is ramping up fairly quickly right now. Naturally, since it was developed there, Erlang is used by many Ericsson projects including the AXD301 ATM Switch and the AXE line of switches. The AXD series includes over 850k lines of Erlang. However, outside of Ericsson some very interesting examples are emerging. Amazon’s SimpleDB is written in Erlang (Amazon SimpleDB is built on Erlang and What You Need To Know About Amazon SimpleDB). The recently (and quietly) released Facebook Chat application uses Erlang as well (Dare Obasanjo sent that one my way). CouchDB is written in Erlang as well (CouchDB: Thinking beyond the RDBMS). Some more Erlang applications from the Erlang FAQ:

Is it time for a new server-side implementation language?

–jrh

James Hamilton, Windows Live Platform Services
Bldg RedW-D/2072, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
JamesRH@microsoft.com

H:mvdirona.com | W:research.microsoft.com/~jamesrh | blog:http://perspectives.mvdirona.com
