Embarrassingly Distributed Cloud Services

Ken Church, Albert Greenberg, and I just finished On Delivering Embarrassingly Distributed Cloud Services, which has been accepted for presentation at ACM Hotnets 2008 in Calgary, Alberta, on October 6th and 7th. This paper followed from the discussion and debate around a blog entry that Ken and I did some time back, Diseconomies of scale, where we argued that the industry trend towards mega-datacenters needs to be questioned and, in many cases, is simply not cost effective.

There are times when mega-datacenters do make sense. Very large data analysis jobs and large, multi-server workloads with considerable inter-node communications traffic run best against large central data stores. MapReduce jobs are the classic example of this sort of workload. However, we argue that other types of workloads actually run better in distributed micro-datacenters. Highly partitionable applications with light inter-partition traffic can be better hosted in distributed micro-datacenters. Highly interactive applications such as Google Docs need to be close to their users, since network round trip latencies can make them frustrating to use. We collectively refer to applications that can be partitioned effectively and run close to the edge (the users) as Embarrassingly Distributed. Essentially, these are the easy applications when it comes to running them close to the edge.

In the paper, we argue that the class of applications that are embarrassingly distributed, and therefore run well on distributed micro-datacenters, is large, and we go on to show that distributed micro-datacenters can offer considerable advantage over mega-centers. Essentially the point is that you can run many applications over distributed micro-datacenters and, if you can, you should.

Micro-datacenters are made possible by containerization, which I wrote about in a 2007 Conference on Innovative Data Systems Research paper: Architecture for Modular Data Centers. When that paper was published, Rackable Systems had just shipped their first containerized design and Sun Microsystems had announced Blackbox but it wasn’t yet shipping. Two years later, containerized designs are offered by most of the major datacenter server vendors:

· IBM Scalable modular data center

· Rackable ICE Cube™ Modular Data Center

· Sun Modular Datacenter S20 (project Blackbox)

· Dell Insight

· Verari Forest Container Solution

Microsoft recently announced the first containerized data center in Chicago: First Containerized Data Center Announcement. The Chicago announcement is a mega-center but it does show that containerized designs are now ready for primetime.

Mega-datacenters remain useful and aren’t going away any time soon but, in On Delivering Embarrassingly Distributed Cloud Services, we argue that distributed micro-datacenters are appropriate for many workloads and can reduce costs, improve the quality of service, and increase the speed of deployment.


James Hamilton, Data Center Futures
Bldg 99/2428, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |

H:mvdirona.com | W:research.microsoft.com/~jamesrh | blog:http://perspectives.mvdirona.com

2 comments on “Embarrassingly Distributed Cloud Services”
  1. Thanks for the comments Jason. In this model, since we’re only putting in 12kw/condo, we only have 1 or 2 racks of space consumed in each condo. The rest of each unit is empty so we propose walling the servers and renting the rest.


  2. Jason Dai says:

    Hi James,

    I think your paper is incredibly insightful and I really enjoyed it. I do have one question that needs more clarifications though. In table 2 of your paper, you estimated the annual income for the condo farm by renting the condos for $8.1M/year. I’m wondering what exactly you mean by "renting the condos". In my idea, we are going to use those condos to host the servers, instead of renting them to others.

