Occasionally I come across a noteworthy datacenter design that is worth covering. Late last year a very interesting Japanese facility was brought to my attention by Mikio Uzawa an IT consultant who authors the Agile Cat blog. I know Mikio because he occasionally translates Perspectives articles for publication in Japan.

Mikio pointed me to the Ishikari Datacenter in Ishikari City, Hokkaido Japan. Phase I of this facility was just completed in November 2011. This facility is interesting for a variety of reasons but the design features I found most interesting are: 1) High voltage direct current power distribution, 2) whole building ductless cooling, and 3) aggressive free air cooling.
High Voltage Direct Current Power Distribution
I first came across the use of direct current when Annabel Pratt took me through the joint work Intel was doing with Lawrence Berkeley National Lab on datacenter HVDC distribution (Evaluation of Direct Current Distribution in Data Centers to Improve Energy Efficiency). In this approach they distribute 400V direct current rather than the more conventional 208V to 240V alternating current used in most facilities today.
High voltage direct current work in datacenters has been around for around a decade and it is in extensive test at many facilities world-wide. Many companies are 100% focused on HVDC design consulting with Validus being one of the better known.
The savings potential of HVDC are often shown to be very exciting with numbers beyond 30% frequently quoted. But the marketing material I’ve gone through in detail compare excellent HVDC designs with very poor AC designs. Predictably the savings are around 30%. Unfortunately, the difference between good AC and bad AC designs are also around 30% :-).
When I look closely at HVDC distribution, I see slight improvements in efficiency at around 3 to 5%, somewhat higher costs of equipment since it is less broadly used, less equipment availability and longer delivery times, and somewhat more complex jurisdictional issues with permitting and other approvals taking longer in some regions. Nonetheless, the picture continues to improve, the industry as a whole continues to learn, and I think there is a good chance that high voltage DC distribution will end up becoming a more common choice in modern datacenters.
The Ishikari facility is a high voltage DC distribution design. I’m looking forward to learning more about this aspect of the facility and watching how the system performs.

Whole Building Ductless Cooling
Air handling ducts costs money and restrict flow so why not recognize that the entire purpose of a datacenter shell is to keep the equipment dry and secure and to transport heat. Instead of installing extensive duct work, just treat the entire building as a very large air duct.
Perhaps the nicest mechanical design I’ve come across based upon ductless cooling is the Facebook Prineville facility. In this design, they use the entire second floor of the building for air handling and the lower floor for the server rooms.

The Ishikari design shares many design aspects with the Intel Jones Farms facility where the IT equipment is on the second floor and the electrical equipment is on the first.

Aggressive Free-Air Cooling
Looking at the air flow diagram above, you can see that the Ishikari Datacenter is making good use of the datacenter friendly climate of Japan and aggressively using free-air cooling. Free-air cooling, often called air side economization, is one of the most effective ways of driving down datacenter costs and substantially increasing overall efficiency. It’s good to see this design point spreading rapidly.
More information is available at: http://ishikari.sakura.ad.jp/index_eng.html
Some datacenter designs I’ve covered in the past:
· Facebook Prineville Mechanical Design
· Facebook Prineville UPS & Power Supply
· Example of Efficient Mechanical Design
· 46MW with Water Cooling at a PUE of 1.10
· Yahoo! Compute Coop Design
· Microsoft Gen 4 Modular Data Centers
James Hamilton
e: jrh@mvdirona.com
w: http://www.mvdirona.com
b: http://blog.mvdirona.com / http://perspectives.mvdirona.com
As rescue and relief operations continue in response to the serious flooding in Thailand the focus has correctly been on human health and safety. Early reports estimated 317 fatalities, 700,000 homes and 14,000 factories impacted with over 660,000 not able to work. Good coverage mostly from the Bangkok Post is available at Newley.com authored by a reporter in the regoin. For example: http://newley.com/2011/11/02/thailand-flooding-update-november-2-2011-front-page-of-todays-bangkok-post/.
The floods are far from over and, as we look beyond the immediate problem in country, the impact on the technology world is expected to continue for just over a year even if the floods do recede in 3 to 4 weeks as expected. Disk drives are particularly hard hit with Digitimes Research reporting that the flood will create a 12% HDD supply gap in the 4th quarter of 2011 and the gap may increase into 2012. Digitimes estimates the 4Q11 hard disk drive shortage to reach 19 million units.

Western Digital was hit the hardest by the floods with Tim Leyden, WD COO describing the situation in the last investor quarterly report as:
The flooded buildings in Thailand include our HDD assembly, test and slider facilities where a substantial majority of our slider fabrication capacity resides. In parallel with the internal slider shortages resulting from the above disruption, we are also experiencing other shortages on component parts from vendors located in several Thai industrial parks that have already been inundated by the floods, or have been affected by protective plant shutdowns. We are evaluating the situation on a continuous basis, but in order to get these facilities back up and running, we need the water level to stabilize, after which point it will take some period of time for the floods to recede. We are assessing our options so that we can safely begin working to accelerate the water removal and either extract and transfer the equipment to clean-rooms in other locations or prepare it for operation on-site. As a result of these activities, at this point in time, we estimate that our regular capacity and possibly our suppliers capacity will be significantly constrained for several quarters.
Toshiba reports Impact of the Floods in Thailand they were seriously impacted as well:
Location: Navanakorn Industrial Estate Zone, Pathumtani, Thailand Main Product: Hard Disk Drive
· Damage status: The water is 2 meters high on the site and the surrounding area and more than 1 meter deep in the buildings. Facilities are damaged but no employees have been injured in the factory.
· Alternative sites: We have started alternative production at other factories, but the production volume will be limited by available capacity.
· Operation: All the employees have been evacuated from the industrial zone, at the order of the Thai government. With the water at its current level, we anticipate a long-term shutdown. The date of resumption of operation is unpredictable.
Because the hard disk supply chain is heavily represented in this region, many hard disk manufacturers with unaffected plants will still lose capacity. Noble Financial Equity Research made the following 4th quarter shipped volume estimates:

Continuing with data from Noble Financial Equity Research:
· Due to the effects of flooding, we do not expect the industry to return to normalcy for 3 to 4 quarters
· We see only 120M drives shipped this quarter versus the TAM (total addressable market) of 175M to 180M units
· Due to lack of channel and finished goods inventory, the supply shortfall in the March quarter is also expected to be sever despite higher expected drive shipments and component availability
· By shifting production out of Asian plants, critical component supplier Nidec believes it can ramp to an output of 170 drive motors by the March quarter
· We see significantly higher drive and component prices persisting into the summer months of 2012
· Seagate will be the principal beneficiary of the supply shortage and higher pricing
· We believe Hutchinson (drive suspension manufacturer) will be able to rapidly ramp its US assembly operations and higher suspension prices will offset the reduced business from Western Digital
James Hamilton
e: jrh@mvdirona.com
w: http://www.mvdirona.com
b: http://blog.mvdirona.com / http://perspectives.mvdirona.com
From the Last Bastion of Mainframe Computing Perspectives post:
The networking equipment world looks just like mainframe computing ecosystem did 40 years ago. A small number of players produce vertically integrated solutions where the ASICs (the central processing unit responsible for high speed data packet switching), the hardware design, the hardware manufacture, and the entire software stack are stack are single sourced and vertically integrated. Just as you couldn’t run IBM MVS on a Burrows computer, you can’t run Cisco IOS on Juniper equipment. When networking gear is purchased, it’s packaged as a single sourced, vertically integrated stack. In contrast, in the commodity server world, starting at the most basic component, CPUs are multi-sourced. We can get CPUs from AMD and Intel. Compatible servers built from either Intel or AMD CPUs are available from HP, Dell, IBM, SGI, ZT Systems, Silicon Mechanics, and many others. Any of these servers can support both proprietary and open source operating systems. The commodity server world is open and multi-sourced at every layer in the stack.
Last week the Open Network Summit was hosted at Stanford University. This conference focused on Software Defined Networks in general and Openflow specifically. Software defined networking separates out the router control plane responsible for what is in the routing table from the data plane that makes network packet routing decisions on the basis of what is actually in the routing table. Historically, both operations have been implemented monolithically in each router. SDN, separates these functions allowing networking equipment to compete in how efficiently they route packets on the basis of instructions from a separate SDN control plane.

In the words of OpenFlow founder Nick Mckeown, Software Defined Networks (SDN), will: 1) empower network owners/operators, 2) increase the pace of network innovation, 3) diversify the supply chain, and 4) build a robust foundation for future networking innovation.
This conference was a bit of a coming of age for software defined networking for a couple of reasons. First, an excellent measure of relevance is who showed up to speak at the conference. From academia, attendees included Scott Shenker (Berkeley), Nick McKeown (Stanford), and Jennifer Rexford (Princeton). From industry most major networking companies were represented by senior attendees including Dave Ward (Juniper), Dave Meyer (Cisco), Ken Duda (Arista), Mallik Tatipamula (Ericsson), Geng Lin (Dell), Samrat Ganguly (NEC), and Charles Clark (HP). And some of the speakers from major networking user companies included: Stephen Stuart (Google), Albert Greenberg (Microsoft), Stuart Elby (Verizon), Rainer Weidmann (Deutsche Telekom), and Igor Gashinsky (Yahoo!). The full speaker list is up at: http://opennetsummit.org/speakers.html.
The second data point in support of SDN really coming of age was Dave Meyer, Cisco Distinguished Engineer, saying during his talk that Cisco was “doing Openflow”. I’ve always joked that Cisco would rather go bankrupt than support Openflow so this one definitely caught my interest. Since I wasn’t in attendance myself during Dave’s talk I checked in with him personally. He corrected that it wasn’t a product announcement. They have Openflow running on Cisco gear but “no product plans have been announced at this time”. Still exciting progress and hat’s off for Cisco for taking the first step. Good to see.
If you want a good summary of what is Software Defined Networking, perhaps the best description were the slides that Nick presented at the conference: http://mvdirona.com/jrh/TalksAndPapers/NickMckeown_ON%20Summit%20NickM%2010%202011.pdf.
If you are interested in what Cisco’s Dave Meyer presented at the summit, I’ve posted his slides here: http://mvdirona.com/jrh/TalksAndPapers/DavidMeyer_openflow_and_sdn_for_enterprises.pdf.
Other related postings I’ve made:
· Datacenter Networks are in my Way
· Stanford Clean Slate CTO Summit
· Changes in Networking Systems
· Software Load Balancing Using Software Defined Networking
Congratulations to the Stanford team for hosting a great conference and in helping to drive software defined networking from a great academic idea to what is rapidly becoming a supported option industry-wide.
--jrh
James Hamilton
e: jrh@mvdirona.com
w: http://www.mvdirona.com
b: http://blog.mvdirona.com / http://perspectives.mvdirona.com
Great things are happening in the networking market. We’re transitioning from vertically integrated mainframe-like economics to a model similar to what we have in the server world. In the server ecosystems, we have Intel, AMD and others competing to provide the CPU. At the layer above, we have ZT Systems, HP, Dell DCS, SGI, IBM, and many others building servers based upon whichever CPU the customer choses to use. Above that layer, we have wide variety of open sources and proprietary software stacks that run on servers from any of the providers which include any of the CPU providers silicon. There is competition at all layers in the stack.

The networking world is finally heading down the right path with competing merchant silicone providers for the core data plane processing engines used in networking gear (Application Specific Integrated Circuits or ASICs). These ASICs are supplied by Broadcom, Fulcrum, Marvel, and others. A given ASIC will be built into switch gear by many competitors. Above that layer, there are open source networking protocol stacks (Quagga and Xorp) as well as ASIC independent commercial protocol stacks (IP Infusion and Ariscent) and also proprietary stacks.
This is all good news. In fact, things are moving so fast we’re already starting to see some consolidation in the industry with Broadcom acquiring Dune Networks in November 2009 and just last week Intel acquired Fulcrum Microsystems. The Intel acquisition of Fulcrum is a potentially a good thing for the industry in that Fulcrum may now have even more investment capital. However, Fulcrum were hardly starving before having been through five rounds of venture funding netting a booming $102M. Broadcom remains the biggest player in the networking merchant silicon market with a market capitalization of $19.1B.
Hopefully this major acquisition will even further ramp up the pace of innovation and competition in the networking world. I suspect it will even though it feels early for this market to be consolidating.
Related press articles:
· Intel press release
· The Register
· CNET
Related blogs:
· Andy Bechtolsheim: The March to Merchant Silicon in 10Gbe Cloud Networking
· Datacenter Networks are in My Way
--jrh
James Hamilton
e: jrh@mvdirona.com
w: http://www.mvdirona.com
b: http://blog.mvdirona.com / http://perspectives.mvdirona.com
This note looks at the Open Compute Project distributed Uninterruptable Power Supply (UPS) and server Power Supply Unit (PSU). This is the last in a series of notes looking at the Open Compute Project. Previous articles include:
· Open Compute Project
· Open Compute Server Design
· Open Compute Mechanical Design
The open compute uses a semi-distributed uninterruptable power supply (UPS) system. Most data centers use central UPS systems where large the UPS is part of the central power distribution system. In this design, the UPS is in the 480 3 phase part of the central power distribution system prior to the step down to 208VAC. Typical capacities range from 750kVA to 1,000kVA. An alternative approach is a distributed UPS like that used by a previous generation Google server design.

In a distributed UPS, each server has its own 12VDC battery to serve as backup power. This design has the advantage of being very reliable with the batter directly connected to the server 12V rail. Another important advantage is the small fault containment zone (small “blast radius”) where a UPS failure will only impact a single server. With a central UPS, a failure could drop the load on 100 racks of servers or more. But, there are some downsides of distributed UPS. The first is that batteries are stored with the servers. Batteries take up considerable space, can emit corrosive gasses, don’t operate well at high temperature, and require chargers and battery monitoring circuits. As much as I like aspects of the distributed UPS design, it’s hard to cost effectively and, consequently, is very uncommon.
The Open Compute UPS design is a semi-distributed approach where each UPS is on the floor with servers but rather than having 1 UPS per server (distributed UPS) or 1 UPS per order 100 racks (central UPS with roughly 4,000 servers), they have 1 UPS per 6 racks (180 servers).

In this design the battery rack is the central rack flanked by two triple racks of servers. Like the server racks, the UPS is delivered 480VAC 3 phase directly. At the top of the battery rack, they have control circuitry, circuit breakers, and rectifiers to charge the battery banks.
What’s somewhat unusual in the output stage of the UPS doesn’t include inverters to convert the direct current back to the alternating current required by a standard server PSU. Instead the UPS output is 48V direct current which is delivered directly to the three racks on either side of the UPS. This has the upside of avoiding the final invert stage which increases efficiency. There is a cost to avoiding converting back to AC. The most important downside is they need to effectively have two server power supplies where one accepts 277VAC and the other accepts 48VDC from the UPS. The second disadvantage is using 48V distribution is inefficient over longer distances due to conductor losses at high amperage.
The problem with power distribution efficiency is partially mitigated by keeping the UPS close to servers where the 6 racks its feeds are on either side of the UPS so the distances are actually quite short. And, since the UPS is only used during short periods of time between the start of a power failure and the generators taking over, the efficiency of distribution is actually not that important a factor. The second issue remains, each server power supply is effectively two independent PSUs.

The server PSU looks fairly conventional in that it’s a single box. But, included in the single box, is two independent PSUs and some control circuitry. This has the downside of forcing the use of a custom, non-commodity power supply. Lower volume components tend to cost more. However, the power supply is a small part of the cost of a server so this additional cost won’t have a substantially negative impact. And, it’s a nice, reliable design with a small fault containment zone which I really like.
The Open Compute UPS and power distribution system avoids one level of power conversion common in most data centers, delivers somewhat higher voltages (277VAC rather than 208VAC) close to the load, and has the advantage of a small fault zone.
James Hamilton
e: jrh@mvdirona.com
w: http://www.mvdirona.com
b: http://blog.mvdirona.com / http://perspectives.mvdirona.com
Google cordially invites you to participate in a European Summit on sustainable Data Centres. This event will focus on energy-efficiency best practices that can be applied to multi-MW custom-designed facilities, office closets, and everything in between. Google and other industry leaders will present case studies that highlight easy, cost-effective practices to enhance the energy performance of Data Centres.
The summit will also include a dedicated session on cooling. Presenters will detail climate-specific implementations of free cooling as well as novel ways to utilise locally -available opportunities. We will also debate climate-independent PUE targets.
The agenda includes presentations and panel discussions featuring Amazon, DeepGreen, eBay, Google, IBM, Microsoft, Norman Disney & Young, PlusServer, Telecity Group, The Green Grid, UK's Chartered Institute for IT, UBS and others.
Attendance is free. However, space is limited and we therefore encourage you to register online at your earliest convenience. Your participation will be confirmed.
We look forward to seeing you and your colleagues in Zurich!
Contact:datacentersummit2011@google.com
Last Thursday Facebook announced the Open Compute Project where they released pictures and specifications for their Prineville Oregon datacenter and the servers and infrastructure that will populate that facility. In my last blog, Open Compute Mechanical System Design I walked through the mechanical system in some detail. In this posting, we’ll have a closer look at the Facebook Freedom Server design.
Chassis Design:
The first thing you’ll notice when looking at the Facebook chassis design is there are only 30 servers per rack. They are challenging one of the strongest held beliefs in the industry that is density is the primary design goal and more density is good. I 100% agree with Facebook and have long argued that density is a false god. See my rant Why Blade Servers aren’t the Answer to all Questions for more on this one. Density isn’t a bad thing but paying more to get denser designs that cost more to cool is usually a mistake. This is what I’ve referred to in the past as the Blade Server Tax.
When you look closer at the Facebook design, you’ll note that the servers are more than 1 Rack Unit (RU) high but less than 2 RU. They choose a non-standard 1.5RU server pitch. The argument is that 1RU server fans are incredibly inefficient. Going with 60mm fans (fit in 1.5RU) dramatically increases their efficiency but moving further up to 2RU isn’t notably better. So, on that observation, they went with 60mm fans and a 1.5RU server pitch.

I completely agree that optimizing for density is a mistake and that 1RU fans should be avoided at all costs so, generally, I like this design point. One improvement worth considering is to move the fans out of the server chassis entirely and go with very large fans on the back of the rack. This allows a small gain in fan efficiency by going with larger still fans and allows a denser server configuration without loss of efficiency or additional cost. Density without cost is a fine thing and, in this case, I suspect 40 to 80 servers per rack could be delivered without loss of efficiency or additional cost so would be worth considering.

The next thing you’ll notice when studying the chassis above is that there is no server case. All the components are exposed for easy service and excellent air flow. And, upon more careful inspection, you’ll note that all components are snap in and can be serviced without tools. Highlights:
· 1.5 RU pitch
· 1.2 MM stamped pre-plated steel
· Neat, integrated cable management
· 4 rear mounted 60mm fans
· Tool-less design with snap plungers holding all components
· 100% front cable access
Motherboards:
The Open Compute project supports two motherboard designs where 1 uses an Intel processors and the other uses AMD.
Intel Motherboard:

AMD Motherboard:

Note that these boards are both 12V only designs.
Power Supply:
The power supply (PSU) is an usual design in two dimensions: 1) it is a single output voltage 12v design and 2) it’s actually two independent power supplies in a single box. Single voltage supplies are getting more common but commodity server power supplies still usually deliver 12V, 5V, and 3.3V. Even though processors and memory require somewhere between 1 and 2 volts depending upon the technology, both typically are fed by the 12V power rail through a Voltage Regulator Down (VRD) or Voltage Regulator Module (VRM). The Open Compute approach is to use deliver 12V only to the board and to produce all other required voltages via an Voltage Regulator Module on the mother board. This simplifies the power supply design somewhat and they avoid cabling by having the motherboard connecting directly to the server PSU.

The Open Compute Power Supply is has two power sources. The primary source is 277V alternating current (AC) and the backup power source is 48V direct current (DC). The output voltage from both supplies is the same 12V DC power rail that is delivered to the motherboard.
Essentially this supply is two independent PSUs with a single output rail. The choice of 277VAC is unusual with most high-scale data centers run on 208VAC. But 277 allows one power conversion stage to be avoided and is therefore more power efficient.
Most data centers have mid-voltage transformers(typically in the 13.2kv range but it can vary widely by location). This voltage is stepped down to 480V three phase power in North America and 400V 3 phase in much of the rest of the world. The 480VAC 3p power is then stepped down to 208VAC for delivery to the servers.
The trick that Facebook is employing in their datacenter power distribution system is to avoid one power conversion by not doing the 480VAC to 208VAC conversion. Instead, they exploit the fact that each phase of 480 3p power is 277VAC between the phase and neutral. This avoids a power transformation step which improves overall efficiency. The negatives of this approach are 1) commodity power supplies can’t be used (277VAC is beyond the range of commodity PSUs) and 2) the load on each of the three phases need to be balanced. Generally, this is a good design tradeoff where the increase in efficiency justifies the additional cost and complexity.
An alternative but very similar approach that I like even better is to step down mid-voltage to 400VAC 3p and then play the same phase to neutral trick described above. This technique still has the advantage of avoiding 1 layer of power transformation. What is different is the resultant phase to neutral voltage delivered to the servers is 230VAC which allows commodity power supplies to be used. The disadvantage of this design is that the mid-voltage to 400VAC 3p transformer is not in common use in North America. However this is a common transformer in other parts of the world so they are still fairly easily attainable.
Clearly, any design that avoids a power transformation stage is a substantial improvement over most current distribution systems. The ability to use commodity server power supplies unchanged makes the 400 3p to neutral trick look slightly better than the 480VAC 3p approach but all designs need to be considered in the larger context in which they operate. Since the Facebook power redundancy system requires the server PSU to accept both a primary alternating current input and a backup 48VDC input, special purpose build supplies need to be used. Since a custom PSU is needed for other reasons, going with 277VAC as the primary voltage makes perfect sense.
Overall a very efficient and elegant design that I’ve enjoyed studying. Thanks to Amir Michael of the Facebook hardware design team for the detail and pictures.
--jrh
James Hamilton
e: jrh@mvdirona.com
w: http://www.mvdirona.com
b: http://blog.mvdirona.com / http://perspectives.mvdirona.com
Last week Facebook announced the Open Compute Project (Perspectives, Facebook). I linked to the detailed specs in my general notes on Perspectives and said I would follow up with more detail on key components and design decisions I thought were particularly noteworthy. In this post we’ll go through the mechanical design in detail.
As long time readers of this blog will know, PUE has many issues (PUE is still broken and I still use it) and is mercilessly gamed in marketing literature (PUE and tPUE). The Facebook published literature predicts that this center will deliver a PUE of 1.07. Ignoring the power requirements of the mechanical systems, just delivering power to the servers at a 7% loss is a considerable challenge. I’m slightly skeptical of this number as a fully loaded, annual PUE number but I can say with confidence that it is one of the nicest mechanical designs I have come across. Hats off to Jay Park and the rest of the Facebook DC design team.
Perhaps the best way to understand the mechanical design is to first walk through the mechanical system diagram and then step through the actual deployment in pictures.

In the mechanical system diagram you will first note there is no process based cooling (no air conditioning) and no chilled water loop. It’s a 100% air cooled with all IT equipment cooling via outside air which is particularly effective with the favorable weather conditions experienced in central Oregon.
The outside air is pulled in from outside, mixed with a controlled volume of return air to avoid over-cooling in winter, filtered, evaporative cooled, and run through a fan wall before being returned to the cold aisles below.
Air Intake
The cooling system takes up the entire second floor of the facility with the IT equipment and office space on the first floor. In the picture below, you’ll see the building air intake on the left running the full width of the facility. This intake area in the picture is effectively “outside” so includes water drains in the floor. In the picture to the right towards the top, you’ll see the air path to the remainder of the air handling system. To the right at the bottom, you’ll see the white sheet metal sections covering the passage where hot exhaust air is brought up from the hot aisle below to be mixed with outside air to prevent over-cooling on cold days.

Mixing Plenum & Filtration:
In the next picture, you can see the next part of the full building air handling system. On the left the louvers at the top are bringing in outside air from the outside air intake room. On the left at the bottom, there are thermostatically controlled louvers allowing hot exhaust air to come up from the IT equipment hot aisle. The hot and outside air are mixed in this full room plenum before passing through the filtration wall visible on the right side of the picture.
The facility grounds are still being worked upon so the filtration system includes an extra set of disposable paper filters to increase the life of the more aggressive filtration media not visible behind.

Misting Evaporative Cooling
In the next picture below you’ll see the temperature and humidity controlled high-pressure water misting system again running the full width of the facility. Most evaporative cooling systems used in data center applications use wet media. This system used high pressure water with stainless steel nozzles to produce an atomized mist. These designs produce excellent cooling (high delta-T) but are prone to calcification and maintenance without aggressive filtration which brings some expense. But they are very effective.
Just beyond the misters, you’ll see what looks like filtration media. This media is there to ensure no airborne water makes it out of the cooling room.

Exhaust System
In the final picture below, you’ll see we have now got to the other side of the building where the exhaust fans are found. We bring air in on one side, filter, cool it, and pump it, and then just before the exhaust fans visible in the picture, huge openings in the floor alow treated, cooled air to be brough down to the IT equipment cold aisle below.
The exhaust fans visible in the picture control pressure by exhausting air not needed back outside.

The Facebook facility has considerable similarity to the design EcoCooling mechanical design I posted last Monday (Example of Efficient Mechanical System Design). Both approaches use the entire building as the air ducting system and both designs use evaporative cooling. The notable differences are: 1) EcoCooling design is based upon wet media whereas Facebook is using a high pressure water misting system, and 2) the EcoCooling design is running the mechanical systems beside the compute floor whereas the Facebook design uses the second floor above the IT rooms for all mechanical system.
The most interesting aspects of the Facebook mechanical design: 1) full building ducting with huge plenum areas, 2) no-process based cooling, 3) mist-based evaporative cooling, 4) large, efficient impellers with variable frequency drive, and 4) full wall, low-resistance filtration.

I’ve been saying for years that mechanical systems are where the largest opportunities for improvement lie and are the area where most innovation is most required. The Facebook Prineville Facility is most of the way there and one of the most efficient mechanical system designs I’ve come across. Very elegant and very efficient.
--jrh
James Hamilton
e: jrh@mvdirona.com
w: http://www.mvdirona.com
b: http://blog.mvdirona.com / http://perspectives.mvdirona.com
The pace of innovation in data center design has been rapidly accelerating over the last 5 years driven by the mega-service operators. In fact, I believe we have seen more infrastructure innovation in the last 5 years than we did in the previous 15. Most very large service operators have teams of experts focused on server design, data center power distribution and redundancy, mechanical designs, real estate acquisition, and network hardware and protocols. But, much of this advanced work is unpublished and practiced at a scale that is hard to duplicate in a research setting.
At low scale with only a data or center or two, it would be crazy to have all these full time engineers and specialist focused on infrastructural improvements and expansion. But, at high scale with 10s of data centers, it would be crazy not to invest deeply in advancing the state of the art.
Looking specifically at cloud services, the difference between an unsuccessful cloud service and a profitable, self-sustaining business is the cost of the infrastructure. With continued innovation driving down infrastructure costs, there is investment capital available, services can be added and improved, and value can be passed on to customers through price reductions. Amazon Web Services, for example, has had 11 price reductions in 4 years. I don’t recall that happening in my first 20 years working on enterprise software. It really is an exciting time in our industry.
Facebook is a big business operating at high scale and they also have elected to invest in advanced infrastructure designs. Jonathan Heiliger and the Facebook infrastructure team have hired an excellent group of engineers over the past couple of years and are now bringing these designs to life in their new Prineville Oregon facility. I had the opportunity to visit this datacenter 6 weeks back just before it started taking production load. I had an excellent visit, got to catch up with some old friends, meet some new ones, and tour an impressive facility. I saw an unusually large number of elegant designs ranging from one of the cleanest mechanical systems I’ve come across, three phase 480VAC directly to the rack, a low voltage direct current distributed uninterruptable power supply system, all the way through to custom server designs. But, what made this trip really unusual is that I’m actually able to talk about what I saw.
In fact, more than allowing me to talk about it, Facebook has decided to release most of the technical details surrounding these designs publically. In the past, I’ve seen some super interesting but top secret facilities and I’ve seen some public but not particularly advanced data centers. To my knowledge, this is the first time an industry leading design has been documented in detail and released publically.

The set of specifications Facebook is releasing are worth reading so I’m posting links to all below. I encourage you to go through these in as much detail as you chose. In addition, I’ll also post summary notes over the next couple of days explain aspects of the design I found most interesting or commenting upon the pros and cons of some of the approaches employed.
The specifications:
· Data Center Design
· Intel Motherboard
· AMD Motherboard
· Battery Cabinet (Distributed UPS)
· Server PSU
· Server Chassis and Triplet Hardware
My commendations to the specification authors Harry Li , Pierluigi Sarti, Steve Furuta, Jay Park and to the rest of the Facebook infrastructure team for releasing this work publically and for doing so in sufficient detail that others can build upon it. Well done.
--jrh
Update:
· Open Compute Web Site: http://opencompute.org/
· Live Blog of the Announcement: http://www.insidefacebook.com/2011/04/07/live-blogging-facebooks-open-compute-project/
James Hamilton
e: jrh@mvdirona.com
w: http://www.mvdirona.com
b: http://blog.mvdirona.com / http://perspectives.mvdirona.com
|