This note describes a conversation I’ve had multiple times with data center owners. It concludes that blade servers frequently don’t help and sometimes hurt, that easy data center power utilization improvements are available without paying the blade server premium, and that enterprise data center owners tend to buy gadgets from the big suppliers rather than think through overall data center design. We’ll dig into each.
In talking to data center owners, I’ve learned a lot, but every once in a while I come across a point that just doesn’t make sense. My favorite example is server density. I’ve talked to many DC owners (and I’ll bet I’ll hear from many more after this note) who have just purchased blade servers. The conversation always goes the same way: “We just went with blades and now have 25+kW racks.” I ask if their data center has open floor space, and it almost always does. We’ll come back to that. Hmmm, I’m thinking. They now have much higher power density racks, bought at a higher purchase cost, in order to get more computing per square foot, but the data center already has open floor space (since almost all well designed centers are power and cooling bound rather than floor space bound). Why?
Earlier, we observed that most well designed data centers are power and cooling bound rather than space bound. Why is that? There is actually very little choice. Here’s the math: power and cooling make up roughly 70% of the cost of the data center, while the shell (the building) is just over 10%. As a designer, you need to design a data center to last for 15 years. Who knows what power density (usually expressed in W/sq ft) will be needed 15 years from today? It depends upon the server technology, the storage ratio, and many other factors. The only thing we know for sure is that we don’t know, and almost any choice will inevitably be wrong. So a designer is going to end up with either too much power and cooling or too much floor space. One or the other will be wasted no matter what. Wasting floor space is a 10% mistake, whereas stranding power and cooling is a 70% mistake. This 10% number applies to large scale data centers of over 10MW that are not in the center of New York; we’ll come back to that. Any designer who strands power and cooling by running out of floor space should have been fired years ago. Most avoid this by providing more floor space than any reasonable usage would need, and that’s why most data centers have vast open spaces. It’s insurance against the expensive mistake of stranding power.
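The asymmetry is easy to put in numbers. The sketch below uses the 70% and 10% cost shares from the paragraph above and assumes, as a simplification, that the cost of each subsystem scales linearly with the capacity built; the 30% forecast miss is a hypothetical value, not a measured one.

```python
# Approximate shares of total build cost, from the figures in the text
# (large facilities of over 10MW, not in an expensive city).
POWER_AND_COOLING_SHARE = 0.70
SHELL_SHARE = 0.10


def overbuild_cost(overbuild_fraction: float, share: float) -> float:
    """Fraction of total build cost spent on capacity that ends up unused.

    Simplifying assumption: a subsystem's cost scales linearly with its capacity.
    overbuild_fraction: how far the forecast overshot real need (0.3 = 30% too much).
    share: that subsystem's share of total build cost.
    """
    return overbuild_fraction * share


if __name__ == "__main__":
    miss = 0.30  # hypothetical: the 15-year density forecast is off by 30%
    print(f"Unused floor space:         {overbuild_cost(miss, SHELL_SHARE):.0%} of build cost")
    print(f"Stranded power and cooling: {overbuild_cost(miss, POWER_AND_COOLING_SHARE):.0%} of build cost")
```

Whatever the size of the miss, the same forecast error costs seven times as much when it lands on power and cooling instead of on the shell.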
There are rare exceptions to this rule that well designed data centers are power and cooling limited rather than floor space limited. But in the common case, a DC owner has just paid the blade server premium to get yet more unused data center floor space. They were power and cooling limited before and, with the addition of higher density servers, are even more so now. With no gain visible yet, the conversation then swings over to efficiency. When talking about the amazing efficiency of the new racks, we usually talk about PUE. PUE is Power Usage Effectiveness, and it’s actually simpler than it sounds: it’s the total power that comes into the data center divided by the power delivered to the critical load (the servers themselves). As an example, a PUE of 1.7 means that for every watt delivered to the load, 0.7 W is lost in power distribution and cooling. Some data centers, especially those that have accreted over time rather than having been designed as a whole, can be as bad as 3.0, but achieving numbers this bad takes work and focus, so we’ll stick with 1.7 as a baseline.
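The PUE definition above reduces to one division. This minimal sketch computes it and the per-watt overhead it implies; the 1,700 kW and 1,000 kW facility figures are hypothetical, chosen to reproduce the 1.7 baseline.

```python
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Power Usage Effectiveness: total power into the facility / power delivered to the IT load."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_load_kw


def overhead_watts_per_it_watt(pue_value: float) -> float:
    """Watts lost in power distribution and cooling for every watt that reaches the servers."""
    return pue_value - 1.0


if __name__ == "__main__":
    # Hypothetical facility: 1,700 kW at the utility meter, 1,000 kW reaching the servers.
    p = pue(1700.0, 1000.0)
    print(f"PUE = {p:.2f}, overhead = {overhead_watts_per_it_watt(p):.2f} W per IT watt")
```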
So, in this conversation about the efficiency of blade servers, we hear that PUE improved from 1.7 to 1.4. That sounds like a fantastic deal and, if true, that kind of efficiency gain will more than pay for the blade premium and is also good for society. That would be good news all around, but let’s dig deeper. I first congratulate them on the excellent PUE and ask if they had data center cooling problems when the new blade racks were first installed. Usually they experienced exactly that and eventually bought water cooled racks from APC, Rittal, or others. Some purchased blade racks with back-of-rack water cooling, like the nicely designed IBM iDataPlex. But the story is always the same: they purchased blade servers and, at the same time, moved to water cooling at the rack. New generation servers can be more efficient than the previous generation, and better cooling designs are more efficient whether or not blade servers are part of the equation. Turning the servers over onto their sides didn’t make them more efficient.
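To see why a move from 1.7 to 1.4 is worth chasing, with or without blades, here is a sketch at constant IT load. Overhead drops from 0.7 W to 0.4 W per IT watt, so total facility power falls by 0.3/1.7, a bit under 18%. The 1,000 kW IT load is a hypothetical figure.

```python
HOURS_PER_YEAR = 8760


def facility_power_kw(it_load_kw: float, pue_value: float) -> float:
    """Total power drawn at the utility feed for a given IT load and PUE."""
    return it_load_kw * pue_value


def savings_fraction(pue_before: float, pue_after: float) -> float:
    """Fraction of total facility power saved when PUE improves at constant IT load."""
    return (pue_before - pue_after) / pue_before


if __name__ == "__main__":
    it_kw = 1000.0  # hypothetical IT load
    saved_kw = facility_power_kw(it_kw, 1.7) - facility_power_kw(it_kw, 1.4)
    print(f"facility power saved: {saved_kw:.0f} kW "
          f"({savings_fraction(1.7, 1.4):.1%} of the total)")
    print(f"energy saved per year: {saved_kw * HOURS_PER_YEAR:,.0f} kWh")
```

Nothing in this calculation depends on the server form factor. The saving comes entirely from the lower PUE, whatever produced it.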
The key part of that PUE improvement is that they replaced the inefficiency of conventional data center cooling with water at the racks. Here’s an example of a medium to large scale deployment that went with blades and water cooled racks: One Datacenter to Rule Them All. There is nothing magical about water-at-the-rack cooling designs. Many other approaches yield similar or even better efficiency. The important factor is that they used something other than the most common data center cooling system design, which is amazingly inefficient as deployed in most centers. Conventional data centers typically move air from a water cooled CRAC unit through a narrow raised floor choked with cabling. The air comes up into the cold aisle through perforated tiles. In some aisles there are too many perforated tiles and in others too few. Sometimes someone on the ops staff has put a perforated tile into the hot aisle to “cool things down” or to make it more habitable. This innocent decision unfortunately reduces cooling efficiency greatly. The cool air that comes up into the cold aisle is pulled through the servers to cool them, but some spills over the top of the rack and some around the ends. Some goes through open rack positions without blanking panels. All these flows that don’t go through the servers reduce cooling system efficiency. After flowing through the servers, the air rises to the ceiling and returns to the CRAC. Moving air that distance, with so many paths that don’t go through the servers, is inefficient. If you move the water directly to the rack in what I call a CRAC-at-the-Rack design, the overall cooling design can be made much more efficient, mostly by avoiding all these not-through-the-server air paths and the expense of pumping air long distances. It’s mostly not the blades that are more efficient; it’s the cooling system redesign required as a side effect of deploying the high power density servers.
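One rough way to see what bypass air costs: if a fraction of the supplied air never passes through a server, the CRAC has to move proportionally more air to deliver the same flow through the servers. This sketch applies the fan affinity (cube) law, under which fan power scales with the cube of airflow. It treats the air path as unchanged, which is a simplification, and the bypass fractions are hypothetical.

```python
def required_supply_flow(server_flow: float, bypass_fraction: float) -> float:
    """Air the CRAC must move so that server_flow actually passes through the servers."""
    if not 0.0 <= bypass_fraction < 1.0:
        raise ValueError("bypass_fraction must be in [0, 1)")
    return server_flow / (1.0 - bypass_fraction)


def relative_fan_power(bypass_fraction: float) -> float:
    """Fan power relative to a design where all supplied air goes through servers.

    Simplifying assumption: fan affinity (cube) law on an unchanged air path,
    so power scales with the cube of the supplied airflow.
    """
    return required_supply_flow(1.0, bypass_fraction) ** 3


if __name__ == "__main__":
    for bypass in (0.05, 0.15, 0.30):  # hypothetical bypass fractions
        print(f"bypass {bypass:.0%}: fan power x{relative_fan_power(bypass):.2f}")
```

Because of the cube, even moderate bypass costs disproportionately more fan energy. That is one reason shortening the air path and eliminating bypass, as CRAC-at-the-Rack designs do, pays off.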
Rather than moving to blades and paying the blade premium, just changing the cooling system design to avoid the problems in the previous paragraph will yield big efficiency improvements.
Why are some data centers in expensive locations? Sometimes there is good reason: for a very small number of applications, the communications latency to low cost real estate is too high. But for most data centers, being in an expensive location is simply a design mistake. Many times it’s to allow easy access to the data center, but you shouldn’t need to be in the data center frequently. In fact, if people are in the DC frequently, you are almost assured to have mistakes and outages. Placing DCs in hard to get to locations substantially reduces costs and improves reliability. For those few who need to have them located in New York, Tokyo, London, etc., there aren’t very many of you and you all know who you are. The remainder are spending too much. Remember my first law of data centers: if you have a window to see in, you are almost certainly paying too much for servers, network gear, etc. Keep it cheap and ugly.
What about data centers that are out of cooling capacity but can’t use all their power or floor space? It’s bad design to strand power, and it simply shouldn’t happen. We know that for every watt we bring into the building, we need to get it back out again. It has got to go somewhere. If the cooling system isn’t designed to dissipate the power being brought into the building, it’s bad design.
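The energy balance in the paragraph above sets a hard ceiling: the load you can actually run is the smaller of the power you can deliver and the heat you can remove. This sketch makes that explicit. It uses the standard conversion of 1 ton of refrigeration = 12,000 BTU/h ≈ 3.517 kW, treats all delivered power as heat that the cooling plant must reject (a simplification), and uses hypothetical facility figures.

```python
KW_PER_REFRIGERATION_TON = 3.517  # 1 ton of refrigeration = 12,000 BTU/h


def usable_load_kw(power_kw: float, cooling_tons: float) -> float:
    """Load that can actually run: every watt delivered must also be removed as heat."""
    cooling_kw = cooling_tons * KW_PER_REFRIGERATION_TON
    return min(power_kw, cooling_kw)


def stranded_power_kw(power_kw: float, cooling_tons: float) -> float:
    """Provisioned power that can't be used because the cooling can't remove its heat."""
    return power_kw - usable_load_kw(power_kw, cooling_tons)


if __name__ == "__main__":
    # Hypothetical facility: 10 MW of power but only 2,500 tons of cooling.
    print(f"stranded: {stranded_power_kw(10_000, 2_500):,.1f} kW")
```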
A more common cooling system problem is that someone brought a 30kW rack into the data center, and an otherwise fine cooling system that is appropriately sized overall can’t manage that hot spot. This isn’t bad data center design, but it does raise a question: why is a 30kW rack a good idea? We’re now back to asking “why” on the blade server question. High power density drives more expensive cooling, so unless you are getting measurable value from extremely high power density, don’t pay for it.
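Airflow shows why a single dense rack becomes a hot spot. For standard sea-level air, the common sensible-heat approximation is q [BTU/h] ≈ 1.08 × CFM × ΔT [°F]. With 1 W ≈ 3.412 BTU/h, the airflow a rack needs scales linearly with its power. The 20°F temperature rise and the 8kW comparison rack are hypothetical values.

```python
BTU_PER_HOUR_PER_WATT = 3.412
SENSIBLE_HEAT_FACTOR = 1.08  # BTU/h per CFM per deg F, standard sea-level air


def rack_airflow_cfm(rack_watts: float, delta_t_f: float) -> float:
    """Airflow (CFM) needed to carry rack_watts of heat at a given air temperature rise (deg F)."""
    if delta_t_f <= 0:
        raise ValueError("temperature rise must be positive")
    return rack_watts * BTU_PER_HOUR_PER_WATT / (SENSIBLE_HEAT_FACTOR * delta_t_f)


if __name__ == "__main__":
    for watts in (8_000, 30_000):  # hypothetical typical rack vs. dense blade rack
        print(f"{watts / 1000:.0f} kW rack: {rack_airflow_cfm(watts, 20):,.0f} CFM at a 20F rise")
```

All of that air has to arrive at one rack position. A raised floor sized for the room average can supply it only if cold air is taken from somewhere else, which is exactly the hot-spot problem described above.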
Summary so far: blade servers allow for very high power density, but they cost more than commodity, low power density servers. Why buy blades? They save space, and there are legitimate reasons to locate data centers where floor space is expensive. For those, more density is good. However, very few data center owners with expensive locations are able to credibly explain why all their servers NEED to be there. Many data centers are in poorly chosen locations driven by excessively manual procedures and the human need to see and touch that for which you paid over 100 million dollars. Put your servers where humans don’t want to be. Don’t worry, attrition won’t go up. Servers really don’t care about lifestyle, how good the schools are, and related quality of life issues.
We’ve talked about the increased efficiency possible with blades by bringing water cooling directly to the rack, but this really has nothing to do with blades. Any DC designer can employ this technique, or a myriad of other mechanical designs, and substantially improve their data center’s cooling efficiency. For those choosing modular data centers like the Rackable Ice Cube, you get the efficiency of water at the rack as a side effect of the design. See Architecture for Modular Data Centers for more on container-based approaches and First Containerized Data Center Announced for information on the Microsoft modular DC deployment in Chicago.
We’ve talked about the high heat density of blade servers and argued that increased heat density increases operational or capital cooling expense and usually both. Generally, don’t buy increased density unless there is a tangible gain from it that actually offsets the cooling cost penalty. Basically, do the math. And then check it. And then make sure that there isn’t some cheaper way to get the same gain.
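“Do the math” can be as simple as this sketch. Density pays only if the value of the floor space it frees exceeds the blade premium plus the added cooling cost. Note the key point from earlier: in a center that already has open floor, the marginal value of the freed space is roughly zero. The class name and every dollar figure below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DensityCase:
    """Hypothetical inputs for judging whether a denser (e.g. blade) option pays off."""
    blade_premium: float       # extra purchase cost of the dense option ($)
    floor_sq_ft_saved: float   # floor space freed by going dense (sq ft)
    value_per_sq_ft: float     # marginal value of that space ($/sq ft); ~0 if the floor is already empty
    extra_cooling_cost: float  # added capital plus operating cooling cost over the same period ($)

    def net_gain(self) -> float:
        return (self.floor_sq_ft_saved * self.value_per_sq_ft
                - self.blade_premium - self.extra_cooling_cost)

    def worth_it(self) -> bool:
        return self.net_gain() > 0


if __name__ == "__main__":
    open_floor = DensityCase(100_000, 500, 0, 20_000)          # space was going unused anyway
    expensive_city = DensityCase(100_000, 500, 400, 20_000)    # space is genuinely scarce
    for name, case in (("open floor", open_floor), ("expensive city", expensive_city)):
        print(f"{name}: net gain ${case.net_gain():,.0f}, worth it: {case.worth_it()}")
```

Then check the inputs, especially the cooling cost, and ask whether something cheaper than the blade premium delivers the same gain.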
There are many good reasons to want higher density racks. One good one is that you are using very high speed, low latency communications between servers in the cluster. I know of examples of this from the HPC world, but I’ve not found them in many commercial data centers. Another reason to go dense is that floor space is expensive. We’ve argued above that a very small number of centers need to be located in expensive locations due to wide-area communications delays but, again, these are rare. The vast majority of folks buying high density blade servers aren’t able to articulate why they are buying them in a way that stands up to scrutiny. For these usage patterns, blades are not the best price/performance solution. In fact, that’s why the world’s largest data center operator, Google, doesn’t use blade servers. When you are deploying tens of thousands of servers a month, all that matters is work done per dollar. And, at today’s price points, blade servers do not yet make sense for these high scale, high efficiency deployments.
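“Work done per dollar” can be sketched as useful work per year divided by yearly cost. The yearly cost below is amortized purchase price plus facility energy, where the server’s draw is multiplied by PUE. The model deliberately ignores floor space, network, and staffing. All prices, power draws, and the amortization period are hypothetical, and both options are assumed to do the same work.

```python
HOURS_PER_YEAR = 8760


def annual_cost(capex: float, amortization_years: float, server_kw: float,
                pue_value: float, price_per_kwh: float) -> float:
    """Yearly cost of one server: amortized purchase price plus facility energy."""
    energy_kwh = server_kw * pue_value * HOURS_PER_YEAR
    return capex / amortization_years + energy_kwh * price_per_kwh


def work_per_dollar(work_per_year: float, cost_per_year: float) -> float:
    """Useful work delivered per dollar of yearly cost."""
    return work_per_year / cost_per_year


if __name__ == "__main__":
    # Hypothetical options with equal work per server; only the purchase price differs.
    commodity = annual_cost(capex=3000, amortization_years=3, server_kw=0.30,
                            pue_value=1.5, price_per_kwh=0.07)
    blade = annual_cost(capex=3600, amortization_years=3, server_kw=0.30,
                        pue_value=1.5, price_per_kwh=0.07)
    print(f"commodity: {work_per_dollar(1.0, commodity):.6f} work units per $")
    print(f"blade:     {work_per_dollar(1.0, blade):.6f} work units per $")
```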
I’m not saying that there aren’t good reasons to buy high density server designs. I’ve seen many. What I’m arguing is that many folks who purchase blades don’t need them. The arguments explaining the higher value often don’t stand up to scrutiny. Many experience cooling problems after purchasing blade racks. Some see increased cooling efficiency but, upon digging more deeply, you’ll find they made cooling system design changes after installation, and these excellent design changes could have been deployed without paying the blade premium. In short, many data center purchases don’t get the “work done per dollar” scrutiny that they should.
Density is fine but don’t pay a premium for it unless there is a measurable gain and make sure that the gain can’t be achieved by cheaper means.
James Hamilton, Data Center Futures
Bldg 99/2428, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 | JamesRH@microsoft.com
H:mvdirona.com | W:research.microsoft.com/~jamesrh | blog:http://perspectives.mvdirona.com
Your power distribution design mentioned above sounds interesting. One level of conversion from high voltage to load would be excellent. Heck, I would settle for two :-). Can you send me a diagram or quick sketch of the approach you have been using?
Excellent analysis. I’ve watched the lemming-like march of server density ("Over the top, fellas!") and wondered why most folks fail to grasp the full picture.
I’m struggling to get my customers to overcome similar blindness on the power side. By simply rearranging the key elements, UPS equipment can be utilized in a way that reduces power system conversion losses by 50%. Basically it involves putting the UPS on the Medium-Voltage side and only having one conversion step between the incoming utility and the servers. This type of system is actually well proven in Europe and Asia, but seems like "Man-From-Mars" stuff to most data center guys in the USA.
Keep up the good work.
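The point about conversion steps in the comment above comes down to multiplication. The efficiencies of power conversion stages in series multiply, so every stage removed raises end-to-end efficiency. This sketch shows the arithmetic only. The stage efficiencies are hypothetical placeholders, not measurements of any particular design, and the code does not attempt to verify the 50% figure quoted above.

```python
from math import prod


def chain_efficiency(stage_efficiencies: list[float]) -> float:
    """End-to-end efficiency of power conversion stages in series (efficiencies multiply)."""
    return prod(stage_efficiencies)


def loss_fraction(stage_efficiencies: list[float]) -> float:
    """Fraction of incoming power lost across all conversion stages."""
    return 1.0 - chain_efficiency(stage_efficiencies)


if __name__ == "__main__":
    many_steps = [0.99, 0.94, 0.98, 0.99]  # hypothetical multi-stage path
    one_step = [0.97]                      # hypothetical single conversion
    print(f"multi-stage loss: {loss_fraction(many_steps):.1%}")
    print(f"single-step loss: {loss_fraction(one_step):.1%}")
```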