Just about exactly one year ago, I posted a summary and the slides from an excellent Butler Lampson talk: The Uses of Computers: What’s Past is Merely Prologue. It’s time for another installment. Butler was at SOSP 2009 a few weeks back and Marvin Theimer caught up with him for a wide-ranging discussion on distributed systems.

With Butler’s permission, what follows are Marvin’s notes from the discussion.

Contrast cars with airplanes: when the fancy electronic systems fail, you can (most of the time) pull a car over to the side of the road and safely get out, whereas an airplane will crash and burn.

Systems that behave like cars vs. airplanes:

· It’s like a car if you can reboot it

· What’s the scope of the damage if you reboot it?

Ensuring that critical sub-systems keep functioning:

· Layered approach with lower layers being simpler and able to cut off the higher layers and still keep functioning

· Bottom layers need to be simple enough to reason about or perhaps even formally verify

· Be skeptical about designing systems that gracefully degrade as they approach their “melting points”. Nice in theory, but not likely to be feasible in practice in most cases.

· Have “firewalls” that partition your system into independent modules so that damage is contained.

· Firewalls have “blast doors” that automatically come down in case of alarms going off. Under normal circumstances the blast doors are up and you have efficient, optimized interaction between modules. When alarms go off the blast doors go down. The system must be able to work in degraded mode with the blast doors down.

· You need to continually test your system for how it behaves with the blast doors down to ensure that “critical functioning” is still achievable despite system evolution and environment evolution. Problem is that testing is expensive, so there is a trade-off between extensive testing and cost. Essentially you can’t test everything. This is part of the reason why the lowest levels of the system need to be simple enough to formally reason about their behavior.

o Dave Gifford’s story about a bank that had diesel backup generators for when the power utility failed. They religiously tested firing up the backup generators. However, when a prolonged power outage actually occurred, they discovered that the generators failed after half an hour because their lubricating oil failed. No one had thought to test running on backup power for more than a few minutes.
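The blast-door idea above maps closely onto what is now often called the circuit-breaker pattern. Here is a minimal sketch in Python; the class name, thresholds, and fallback wiring are all made up for illustration:

```python
import time

class BlastDoor:
    """Circuit-breaker-style 'blast door' between two modules.

    While the door is up, calls pass through on the optimized path.
    After repeated failures the door comes down and calls are routed
    to a simpler degraded-mode fallback until a cooldown expires.
    """

    def __init__(self, call, fallback, max_failures=3, cooldown_secs=30.0):
        self.call = call                  # normal, optimized interaction
        self.fallback = fallback          # simple degraded-mode behavior
        self.max_failures = max_failures
        self.cooldown_secs = cooldown_secs
        self.failures = 0
        self.opened_at = None             # time the door came down, if it did

    def __call__(self, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown_secs:
                return self.fallback(*args, **kwargs)   # door is down
            self.opened_at = None                       # try raising the door
            self.failures = 0
        try:
            result = self.call(*args, **kwargs)
            self.failures = 0
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()       # alarm: door comes down
            return self.fallback(*args, **kwargs)
```

The continual-testing point translates directly: a test can force `opened_at` to a recent timestamp and verify that the system still achieves its critical functioning on the fallback path alone.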

Low-level multicast is bad because you can’t reason about the consequences of its use. Better to have application-level multicast where you can explicitly control what’s going on.

RPC conundrum:

· People have moved back from RPC to async messages because of the performance issues of synchronous RPC.

· By doing so they are reintroducing concurrency issues into their programs.

A possible solution:

· Constrain your system (if you can) to enable the definition of a small number of interaction patterns that hide the concurrency and asynchrony.

· Your experts employ async messages to implement those higher-level interaction patterns.

· The app developers only use the simpler, higher-level abstractions.

· Be happy with an 80% solution – which you might achieve – and don’t expect to be able to handle all interactions this way.
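One such interaction pattern is scatter-gather: fan a request out to several services concurrently and collect the replies. A sketch of how the experts’ async plumbing can be hidden behind a plain synchronous call (the `_send` stub and service names are hypothetical stand-ins for real messaging code):

```python
import asyncio

# Expert-level plumbing: the async messaging is confined to this module.
async def _send(service, request):
    # Stand-in for a real async message send/receive round trip.
    await asyncio.sleep(0)
    return f"{service}:{request}"

def scatter_gather(services, request, timeout=1.0):
    """Higher-level interaction pattern exposed to app developers.

    Sends the same request to several services concurrently and returns
    the replies as a dict. Callers never see tasks, futures, or callbacks.
    """
    async def _run():
        replies = await asyncio.wait_for(
            asyncio.gather(*(_send(s, request) for s in services)),
            timeout,
        )
        return dict(zip(services, replies))
    return asyncio.run(_run())
```

App code stays sequential and simple: `scatter_gather(["users", "orders"], "ping")` blocks until all replies arrive or the timeout fires, and the concurrency never leaks into the caller.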

Partitioned, primary-key scale-out approach is essentially mimicking the OLTP model of transactional DBs. You are giving up certain kinds of join operators in exchange for scale-out and the app developer is essentially still programming the simple ACID DB model.

· Need appropriate patterns/framework for updating multiple objects in a non-transactional manner.

· Standard approach: update one object and transactionally write a message in a message queue for the other object. Transactional update to other object is done asynchronously to the first object update. Need compensation code for when things go wrong.

· An interesting approach for simplifying the compensation problem: try to turn it into a garbage collection problem. Background tasks look for to-do messages that haven’t been executed and figure out how to bring the system back into “compliance”. You need this code for your fsck case anyway.
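The standard approach above is often called the transactional-outbox pattern. A sketch using SQLite in place of a real partitioned store (the table and column names are made up for illustration): the debit and the pending message commit in one local transaction, and a rerunnable background drain applies the message, which is also exactly the “garbage collection” pass that restores compliance after a crash.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER);
    CREATE TABLE outbox   (seq INTEGER PRIMARY KEY AUTOINCREMENT,
                           target TEXT, delta INTEGER, done INTEGER DEFAULT 0);
    INSERT INTO accounts VALUES ('A', 100), ('B', 0);
""")

def transfer(src, dst, amount):
    # One local transaction: debit src AND record the pending credit.
    with db:
        db.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                   (amount, src))
        db.execute("INSERT INTO outbox (target, delta) VALUES (?, ?)",
                   (dst, amount))

def drain_outbox():
    # Background task: apply pending messages. It is safe to rerun, so it
    # doubles as the fsck-style sweep that finds unexecuted to-do messages.
    pending = db.execute(
        "SELECT seq, target, delta FROM outbox WHERE done = 0").fetchall()
    with db:
        for seq, target, delta in pending:
            db.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                       (delta, target))
            db.execute("UPDATE outbox SET done = 1 WHERE seq = ?", (seq,))

transfer('A', 'B', 30)   # B's credit is applied asynchronously by the drain
drain_outbox()
```

Between `transfer` and `drain_outbox` the system is temporarily out of compliance (money debited but not yet credited), which is precisely the window the compensation/GC machinery must cover.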

WARNING: don’t over-engineer your system. Lots of interesting ideas here; you’ll be tempted to over-generalize and make things too complicated. “Ordinary designers get it wrong 99% of the time; really smart designers get it wrong 50% of the time.”

Thanks to Butler Lampson and Marvin Theimer for making this summary available.


James Hamilton

e: jrh@mvdirona.com

w: http://www.mvdirona.com

b: http://blog.mvdirona.com / http://perspectives.mvdirona.com