Ken Church, Albert Greenberg, and I just finished On Delivering Embarrassingly Distributed Cloud Services which has been accepted for presentation at ACM Hotnets 2008 in Calgary, Alberta October 6th and 7th. This paper followed from the discussion and debate around a blog entry that Ken and I did some time back: Diseconomies of scale where we argue that the industry trend towards mega-datacenters needs to be questioned and, in many cases, is simply not cost effective.
There are times when Mega-datacenters do makes sense. Very large data analysis jobs and large, multi-server workloads with considerable inter-node communications traffice run best against large central data stores. MapReduce jobs are the classic example of this sort of workload. However, we argue that other types of workloads actually run better in distributed micro-datacenters. Highly partitionable applications with light inter-partition traffic can be better hosted in distributed micro-datacenters. Highly interactive applications such as Google Docs need to be close to their users. Network round trip latencies can make highly interactive applications frustrating to use. We collectively refer to applications can be partitioned effectively and run close to the edge (the users) as Embarrassingly Distributed. Essentially, these are the easy applications when it comes to running them close to the edge.
In the paper, we argue that the class of applications that are embarrassingly distributed and therefore run well on distributed micro-datacenters is large and we are go on to show that distributed micro-datacenters can offer considerable advantage over mega-centers. Essentially the point is that you can run many applications over distributed micr-datacenters and, if you can, you should.
Micro-datacenters are made possible by containerization that I wrote about in a 2007 Conference on Innovative Data Research Paper: Architecture for Modular Data Centers. When that paper was published Rackable Systems had just shipped their first containerized design and Sun Microsystems had announced Black Box but it wasn’t yet shipping. Two years later, containerized designs are offered by most of the major datacenter server vendors:
Microsoft recently announced the first containerized data center in Chicago: First Containerized Data Center Announcement. The Chicago announcement is a mega-center but it does show that containerized designs are now ready for primetime.
Mega-datacenters remain useful and aren’t going away any time soon but, in Delivering Embarrassingly Distributed Cloud Services, we argue that distributed micro-datacenters are appropriate for many workloads and can reduce costs, improve the quality of service, and increase the speed of deployment.