Last Thursday, Facebook announced the Open Compute Project and released pictures and specifications for their Prineville, Oregon datacenter and for the servers and infrastructure that will populate that facility. In my last blog entry, Open Compute Mechanical System Design, I walked through the mechanical system in some detail. In this posting, we’ll take a closer look at the Facebook Freedom server design.
The first thing you’ll notice when looking at the Facebook chassis design is that there are only 30 servers per rack. They are challenging one of the most strongly held beliefs in the industry: that density is the primary design goal and that more density is always better. I 100% agree with Facebook and have long argued that density is a false god. See my rant Why Blade Servers aren’t the Answer to all Questions for more on this one. Density isn’t a bad thing, but paying more for denser designs that then cost more to cool is usually a mistake. This is what I’ve referred to in the past as the Blade Server Tax.
When you look closer at the Facebook design, you’ll note that the servers are more than 1 Rack Unit (RU) high but less than 2RU. They chose a non-standard 1.5RU server pitch. Their argument is that the small fans that fit in a 1RU server are incredibly inefficient. Moving to 60mm fans, which fit in 1.5RU, dramatically increases fan efficiency, but going further to 2RU isn’t notably better. Based on that observation, they went with 60mm fans and a 1.5RU server pitch.
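To see why pitch and fan size go together, here is a small Python sketch of the arithmetic. It assumes the standard 44.45mm (1.75 inch) rack unit, a hypothetical 4mm allowance for sheet metal and mounting, and a list of common square fan sizes; the allowance and fan list are illustrative assumptions, not values from the Open Compute specification.

```python
# Sketch: which common square fan sizes fit a given server pitch?
# Assumptions (not from the Open Compute spec): 1RU = 44.45 mm,
# a hypothetical 4 mm allowance for sheet metal and mounting,
# and a list of common square fan frame sizes.

RACK_UNIT_MM = 44.45          # 1.75 inches per rack unit
CLEARANCE_MM = 4.0            # hypothetical allowance, not a spec value
COMMON_FAN_SIZES_MM = [40, 60, 80, 92, 120]


def chassis_height_mm(pitch_ru: float) -> float:
    """Height available to a server occupying `pitch_ru` rack units."""
    return pitch_ru * RACK_UNIT_MM


def largest_fan_mm(pitch_ru: float) -> int:
    """Largest common fan size that fits within the pitch after clearance."""
    usable = chassis_height_mm(pitch_ru) - CLEARANCE_MM
    fitting = [size for size in COMMON_FAN_SIZES_MM if size <= usable]
    if not fitting:
        raise ValueError(f"no common fan fits in {pitch_ru}RU")
    return max(fitting)


if __name__ == "__main__":
    for pitch in (1.0, 1.5, 2.0):
        print(f"{pitch}RU: {chassis_height_mm(pitch):.1f} mm -> "
              f"{largest_fan_mm(pitch)} mm fan")
```

Under these assumptions, 1RU leaves room only for 40mm fans, 1.5RU admits 60mm fans, and 2RU admits 80mm fans.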
I completely agree that optimizing for density is a mistake and that 1RU fans should be avoided at all costs, so I generally like this design point. One improvement worth considering is to move the fans out of the server chassis entirely and use very large fans on the back of the rack. Larger still fans yield a small further gain in fan efficiency, and taking the fans out of each server allows a denser server configuration. Density without cost is a fine thing and, in this case, I suspect 40 to 80 servers per rack could be delivered without loss of efficiency or additional cost, so the approach would be worth considering.
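A quick sketch of what 40 to 80 servers per rack would imply for server pitch. It assumes, hypothetically, that the rack's server space is the 45RU implied by 30 servers at 1.5RU, and it ignores switches and other rack gear; exact fractions avoid floating-point rounding noise.

```python
import math
from fractions import Fraction

# Hypothetical: treat the rack's server space as the 45RU implied by
# 30 servers at a 1.5RU pitch (ignores switches and other rack gear).
SERVER_SPACE_RU = Fraction(30) * Fraction(3, 2)


def servers_per_rack(pitch_ru: Fraction) -> int:
    """How many servers of the given pitch fit in the server space."""
    return math.floor(SERVER_SPACE_RU / pitch_ru)


def pitch_for(servers: int) -> Fraction:
    """Largest pitch (in RU) that still fits `servers` in the server space."""
    return SERVER_SPACE_RU / servers


if __name__ == "__main__":
    for n in (30, 40, 80):
        print(f"{n} servers -> pitch <= {pitch_for(n)} RU")
```

With the fans moved to the rack, 40 servers corresponds to a pitch of 9/8RU and 80 servers to 9/16RU in that same space, so per-server fan size no longer drives the pitch.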
The next thing you’ll notice when studying the chassis above is that there is no server case. All the components are exposed for easy service and excellent airflow. On more careful inspection, you’ll also note that all components snap in and can be serviced without tools. Highlights:
· 1.5 RU pitch
· 1.2mm stamped pre-plated steel
· Neat, integrated cable management
· 4 rear-mounted 60mm fans
· Tool-less design with snap plungers holding all components
· 100% front cable access
The Open Compute Project supports two motherboard designs: one uses Intel processors and the other uses AMD processors.
Note that both boards are 12V-only designs.
The power supply (PSU) is an unusual design in two respects: 1) it has a single 12V output and 2) it is actually two independent power supplies in a single box. Single-voltage supplies are becoming more common, but commodity server power supplies still usually deliver 12V, 5V, and 3.3V. Even though processors and memory require somewhere between 1 and 2 volts depending upon the technology, both are typically fed from the 12V power rail through a Voltage Regulator Down (VRD) or Voltage Regulator Module (VRM). The Open Compute approach is to deliver only 12V to the board and to produce all other required voltages with voltage regulators on the motherboard. This simplifies the power supply design somewhat, and cabling is avoided by having the motherboard connect directly to the server PSU.
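One way to see why the 1 to 2V rails are generated on the board rather than delivered from the PSU is to compare currents. The sketch below uses hypothetical loads and an assumed single 90% regulator efficiency, neither taken from the Open Compute motherboard specifications; it shows that the low-voltage rails carry many times the current of the 12V feed, which is one reason they are regulated close to the load.

```python
from dataclasses import dataclass

# Hypothetical board loads and an assumed VRM efficiency; illustrative
# numbers only, not values from the Open Compute motherboard specs.
BOARD_INPUT_V = 12.0
VRM_EFFICIENCY = 0.90


@dataclass
class Load:
    name: str
    volts: float   # rail voltage produced on the motherboard
    watts: float   # power consumed at that rail


def rail_current(load: Load) -> float:
    """Current the on-board regulator must deliver at the low-voltage rail."""
    return load.watts / load.volts


def input_current_12v(loads: list[Load]) -> float:
    """Total current drawn from the single 12V PSU output, including VRM losses."""
    input_watts = sum(load.watts for load in loads) / VRM_EFFICIENCY
    return input_watts / BOARD_INPUT_V


# Hypothetical example: one CPU and its memory.
example_loads = [Load("cpu", 1.1, 95.0), Load("dimms", 1.5, 30.0)]

if __name__ == "__main__":
    for load in example_loads:
        print(f"{load.name}: {rail_current(load):.1f} A at {load.volts} V")
    print(f"12V input: {input_current_12v(example_loads):.1f} A")
```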
The Open Compute power supply has two power inputs. The primary source is 277V alternating current (AC) and the backup source is 48V direct current (DC). Both supplies produce the same 12V DC output rail that is delivered to the motherboard.
Essentially, this supply is two independent PSUs with a single output rail. The choice of 277VAC is unusual, since most high-scale data centers run on 208VAC. But 277VAC allows one power conversion stage to be avoided and is therefore more efficient.
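The primary/backup relationship can be sketched as a source-selection function feeding the shared 12V rail. The voltage windows here are hypothetical placeholders; the actual thresholds and transfer behavior of the Open Compute PSU are defined by its specification and are not reproduced here.

```python
from enum import Enum

# Hypothetical acceptance windows, not the Open Compute PSU thresholds.
AC_MIN_V, AC_MAX_V = 240.0, 305.0   # assumed window around 277VAC nominal
DC_MIN_V, DC_MAX_V = 42.0, 56.0     # assumed window around 48VDC nominal


class Source(Enum):
    AC_PRIMARY = "277VAC"
    DC_BACKUP = "48VDC"
    NONE = "none"


def select_source(ac_volts: float, dc_volts: float) -> Source:
    """Pick which internal supply feeds the shared 12V output rail.

    The 277VAC primary is preferred whenever it is inside its window;
    otherwise the 48VDC backup is used if it is inside its window.
    """
    if AC_MIN_V <= ac_volts <= AC_MAX_V:
        return Source.AC_PRIMARY
    if DC_MIN_V <= dc_volts <= DC_MAX_V:
        return Source.DC_BACKUP
    return Source.NONE
```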
Most data centers have mid-voltage transformers (the incoming mid-voltage supply is typically in the 13.2kV range, but it can vary widely by location). This voltage is stepped down to 480V three-phase (3p) power in North America and to 400V 3p in much of the rest of the world. In North America, the 480VAC 3p power is then stepped down again to 208VAC for delivery to the servers.
The trick that Facebook is employing in their datacenter power distribution system is to avoid one power conversion by skipping the 480VAC to 208VAC transformation. Instead, they exploit the fact that each phase of 480VAC 3p power is 277VAC between phase and neutral. Removing that transformation step improves overall efficiency. The negatives of this approach are 1) commodity power supplies can’t be used (277VAC is beyond the input range of commodity PSUs) and 2) the load on each of the three phases needs to be balanced. Generally, this is a good design tradeoff where the increase in efficiency justifies the additional cost and complexity.
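Both the phase-to-neutral arithmetic and the balancing requirement fit in a few lines of Python. For a balanced wye-connected three-phase system, the phase-to-neutral voltage is the line-to-line voltage divided by √3, so 480VAC gives about 277VAC. The balancing function is a simple greedy heuristic chosen for illustration (largest load first onto the least-loaded phase), not Facebook's method, and the example loads are hypothetical.

```python
import math


def phase_to_neutral(line_to_line_volts: float) -> float:
    """Phase-to-neutral voltage of a balanced three-phase wye system."""
    return line_to_line_volts / math.sqrt(3)


def assign_to_phases(loads_watts: list[float]) -> list[list[float]]:
    """Greedy sketch: place each load, largest first, on the least-loaded phase."""
    phases: list[list[float]] = [[], [], []]
    for load in sorted(loads_watts, reverse=True):
        lightest = min(phases, key=sum)
        lightest.append(load)
    return phases


def imbalance(phases: list[list[float]]) -> float:
    """Spread between the most and least loaded phases, in watts."""
    totals = [sum(p) for p in phases]
    return max(totals) - min(totals)


if __name__ == "__main__":
    print(f"480V 3p -> {phase_to_neutral(480):.1f}V phase-to-neutral")
    phases = assign_to_phases([300.0] * 30)   # 30 hypothetical 300W servers
    print([len(p) for p in phases], imbalance(phases))
```

Thirty identical servers spread evenly at ten per phase; uneven loads leave some residual imbalance that has to be managed.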
An alternative but very similar approach that I like even better is to step mid-voltage down to 400VAC 3p and then play the same phase-to-neutral trick described above. This technique still has the advantage of avoiding one layer of power transformation. The difference is that the resulting phase-to-neutral voltage delivered to the servers is 230VAC, which allows commodity power supplies to be used. The disadvantage of this design is that mid-voltage to 400VAC 3p transformers are not in common use in North America. However, they are common in other parts of the world, so they are still fairly easy to obtain.
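A sketch of why 400VAC 3p works with commodity supplies while 480VAC 3p does not. It assumes a typical universal-input range of 90 to 264VAC for commodity server PSUs and a hypothetical ±10% tolerance around nominal; the datasheet of any specific supply is the real authority.

```python
import math

# Assumed universal-input range typical of commodity server PSUs.
COMMODITY_MIN_VAC, COMMODITY_MAX_VAC = 90.0, 264.0


def phase_to_neutral(line_to_line_volts: float) -> float:
    """Phase-to-neutral voltage of a balanced three-phase wye system."""
    return line_to_line_volts / math.sqrt(3)


def commodity_psu_ok(line_to_line_volts: float, tolerance: float = 0.10) -> bool:
    """True if the phase-to-neutral voltage, +/- tolerance, stays in range."""
    nominal = phase_to_neutral(line_to_line_volts)
    low, high = nominal * (1 - tolerance), nominal * (1 + tolerance)
    return COMMODITY_MIN_VAC <= low and high <= COMMODITY_MAX_VAC


if __name__ == "__main__":
    for vll in (400, 480):
        print(f"{vll}V 3p -> {phase_to_neutral(vll):.1f}V, "
              f"commodity PSU ok: {commodity_psu_ok(vll)}")
```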
Clearly, any design that avoids a power transformation stage is a substantial improvement over most current distribution systems. The ability to use commodity server power supplies unchanged makes the 400VAC 3p phase-to-neutral approach look slightly better than the 480VAC 3p approach, but every design needs to be considered in the larger context in which it operates. Since the Facebook power redundancy system requires the server PSU to accept both a primary alternating current input and a backup 48VDC input, special-purpose supplies have to be built anyway. Given that a custom PSU is needed for other reasons, going with 277VAC as the primary voltage makes perfect sense.
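To make the value of removing a stage concrete: conversion stages in series multiply their efficiencies, so eliminating one stage removes its loss factor entirely. The per-stage efficiencies below are hypothetical and chosen only for illustration, and the sketch ignores UPS and distribution wiring losses.

```python
from math import prod

# Hypothetical efficiencies for illustration only; real values depend on
# the specific equipment and how heavily it is loaded.
TRANSFORMER_EFF = 0.98
PSU_EFF = 0.94


def chain_efficiency(stage_effs: list[float]) -> float:
    """Fraction of input power delivered after every stage in series."""
    return prod(stage_effs)


# Mid-voltage -> 480VAC, 480VAC -> 208VAC, then the server PSU.
conventional = chain_efficiency([TRANSFORMER_EFF, TRANSFORMER_EFF, PSU_EFF])

# Mid-voltage -> 480VAC (or 400VAC), then the PSU fed phase-to-neutral.
one_fewer_stage = chain_efficiency([TRANSFORMER_EFF, PSU_EFF])

if __name__ == "__main__":
    print(f"conventional:    {conventional:.4f}")
    print(f"one fewer stage: {one_fewer_stage:.4f}")
    print(f"extra power delivered per unit input: "
          f"{one_fewer_stage - conventional:.4f}")
```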
Overall, this is a very efficient and elegant design that I’ve enjoyed studying. Thanks to Amir Michael of the Facebook hardware design team for the detail and pictures.