The pace of innovation in data center design has been accelerating rapidly over the last 5 years, driven by the mega-service operators. In fact, I believe we have seen more infrastructure innovation in the last 5 years than we did in the previous 15. Most very large service operators have teams of experts focused on server design, data center power distribution and redundancy, mechanical designs, real estate acquisition, and network hardware and protocols. But, much of this advanced work is unpublished and practiced at a scale that is hard to duplicate in a research setting.
At low scale, with only a data center or two, it would be crazy to have all these full-time engineers and specialists focused on infrastructural improvements and expansion. But, at high scale, with 10s of data centers, it would be crazy not to invest deeply in advancing the state of the art.
Looking specifically at cloud services, the difference between an unsuccessful cloud service and a profitable, self-sustaining business is the cost of the infrastructure. With continued innovation driving down infrastructure costs, there is investment capital available, services can be added and improved, and value can be passed on to customers through price reductions. Amazon Web Services, for example, has had 11 price reductions in 4 years. I don’t recall that happening in my first 20 years working on enterprise software. It really is an exciting time in our industry.
Facebook is a big business operating at high scale, and they also have elected to invest in advanced infrastructure designs. Jonathan Heiliger and the Facebook infrastructure team have hired an excellent group of engineers over the past couple of years and are now bringing these designs to life in their new Prineville, Oregon facility. I had the opportunity to visit this datacenter 6 weeks back, just before it started taking production load. I had an excellent visit, got to catch up with some old friends, meet some new ones, and tour an impressive facility. I saw an unusually large number of elegant designs, ranging from one of the cleanest mechanical systems I’ve come across, through three-phase 480VAC delivered directly to the rack and a low-voltage direct current distributed uninterruptible power supply system, all the way to custom server designs. But, what made this trip really unusual is that I’m actually able to talk about what I saw.
In fact, more than allowing me to talk about it, Facebook has decided to release most of the technical details surrounding these designs publicly. In the past, I’ve seen some super interesting but top secret facilities, and I’ve seen some public but not particularly advanced data centers. To my knowledge, this is the first time an industry-leading design has been documented in detail and released publicly.
The set of specifications Facebook is releasing is worth reading, so I’m posting links to them below. I encourage you to go through these in as much detail as you choose. In addition, I’ll post summary notes over the next couple of days explaining the aspects of the design I found most interesting and commenting on the pros and cons of some of the approaches employed.
The specifications:
· Battery Cabinet (Distributed UPS)
· Server Chassis and Triplet Hardware
My commendations to the specification authors Harry Li, Pierluigi Sarti, Steve Furuta, and Jay Park, and to the rest of the Facebook infrastructure team, for releasing this work publicly and for doing so in sufficient detail that others can build upon it. Well done.
–jrh
Update:
· Open Compute Web Site: http://opencompute.org/
· Live Blog of the Announcement: http://www.insidefacebook.com/2011/04/07/live-blogging-facebooks-open-compute-project/
James Hamilton
b: http://blog.mvdirona.com / http://perspectives.mvdirona.com
Generally, higher voltage close to the load is more efficient for a conductor of a given size. So, 277VAC is a better choice than 48VDC in the common case where the servers are not running on the UPS. High-voltage DC, usually around 400VDC, avoids this problem and has been successfully tested at LLNL (joint project with Intel).
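A quick back-of-the-envelope sketch of why higher voltage wins for a fixed conductor (an idealized resistive-loss model that ignores conversion efficiency and power factor, so treat the numbers as illustrative only): for a delivered power P and conductor resistance R,

\[ P_{loss} = I^2 R = \left(\frac{P}{V}\right)^2 R \qquad\Rightarrow\qquad \frac{P_{loss}(277\,V)}{P_{loss}(48\,V)} = \left(\frac{48}{277}\right)^2 \approx 0.03 \]

so distributing at 277VAC incurs roughly 1/30th the conductor loss of 48VDC for the same load on the same copper.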
The combination of regulatory discomfort driven by lack of familiarity with high-voltage DC, and the fact that the gains over a good high-voltage AC design are small, makes jumping to DC wholesale somewhat unappealing at this point.
–jrh
The dual-input (277VAC/48VDC) server PSUs intrigue me. I wonder how the tradeoffs compare between using 48VDC only for the 90 seconds of backup versus using 48VDC as main power to the servers, converted at the battery rack level. Presumably two conversions would be less efficient than converting 277VAC to 12.5VDC directly at the server?
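As a rough illustration of why the single-conversion path likely wins (the stage efficiencies here are assumed round numbers, not figures from the Facebook specs):

\[ \eta_{single} \approx 0.95, \qquad \eta_{double} \approx 0.95 \times 0.95 \approx 0.90 \]

so if each conversion stage were about 95% efficient, running main power through a battery-rack conversion plus a second server-level conversion would give up roughly five points of efficiency versus converting 277VAC to 12.5VDC once at the server, before counting the higher conductor losses of distributing at 48V.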
The challenge faced by the franchising model is that it’s super hard to control quality, but it can be done. Interesting suggestion.
–jrh
It’s great news, but what’s missing is all the operational knowledge that really goes into making a datacenter efficient and reliable to run.
This is one of the reasons I think AWS should move towards a franchising model. Just like McDonald’s hands every one of its operators a binder, trains them up, and lets them loose on their street, I think if Amazon wanted to expand at a much higher rate and lower risk than today, they could take all their operational know-how, put it in (big) ‘binders’, and sell franchises to people all over the world. That could relatively quickly put AWS "availability zones" in practically every relevant regulatory environment or population center, compared to building them out a few at a time on their own dime.
My 2c :-)
This particular announcement didn’t include networking gear. However, there is a lot going on across the industry. My general take on networking is here: http://perspectives.mvdirona.com/2010/10/31/DatacenterNetworksAreInMyWay.aspx
–jrh
Any innovation in the networking stack? The server + openstack part is cool, but we seem to be standing pat on networks, switches, routers, etc.