Earlier in the week, there was an EE Times posting, Server Makers get Googled, and a follow-up post from GigaOM, How Google Is Influencing Server Design. I’ve long been an advocate of making industry leading server designs more available to smaller data center operators since, in aggregate, they are bigger power consumers and have more leverage as a group. The key design aspects brought up in these two articles:
· Higher data center temperatures
· 12V-only power supplies
· Two servers on a board
An earlier article from The Register back in October, Google Demanding Intel’s Hottest Chips, sourced an ex-Google employee who clearly wasn’t involved with Google’s data center or server design teams. The details are often incorrect but the article brought up two more issues of interest:
· High temperature processors
· Containerized data center design.
Let’s look at each of these five issues in more detail.
Higher Data Center Temperatures: A 1.7 PUE data center is a good solid design: not even close to industry leading but better than most small scale data centers. By definition, a 1.7 PUE facility delivers 59% of total data center power draw to the IT load (the servers, networking gear, storage, etc.). From Where Does the Power Go and What to do About It we know that the losses in power distribution are around 8%. By subtraction, we have 33% of all power delivered to a data center consumed by cooling. Broadly speaking, there are two big ways to address giving up 1/3 of all the power consumed by a data center in cooling. The first is to invest in more efficient mechanical systems and the second is simply to do less cooling: essentially, to run the data center hotter.
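The subtraction above can be written out as a few lines of Python. This is only a sketch of the arithmetic: the function name and structure are mine, and the 8% distribution loss is simply the figure quoted above rather than something measured here.

```python
def power_breakdown(pue, distribution_loss=0.08):
    """Split total facility power into IT load, distribution loss, and cooling.

    pue: total facility power divided by IT power.
    distribution_loss: fraction of total power lost in power distribution
    (0.08 is the figure quoted in the text, treated here as an assumption).
    """
    it_fraction = 1.0 / pue  # PUE is total power over IT power
    cooling_fraction = 1.0 - it_fraction - distribution_loss
    return it_fraction, distribution_loss, cooling_fraction


it, dist, cooling = power_breakdown(1.7)
print(f"IT load: {it:.0%}, distribution: {dist:.0%}, cooling: {cooling:.0%}")
# prints: IT load: 59%, distribution: 8%, cooling: 33%
```

Trying other PUE values in the same function shows how quickly the cooling share shrinks as PUE improves, given the same distribution loss.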
Running the data center hotter is a technique with huge upside potential and it’s good to see the industry starting to rally around this approach. In a recent story (http://www.datacenterknowledge.com/archives/2008/10/14/google-raise-your-data-center-temperature/) by Data Center Knowledge, Google recommends operating data centers at higher temperatures than the norm. “The guidance we give to data center operators is to raise the thermostat,” Google energy program manager Erik Teetzel told Data Center Knowledge. “Many data centers operate at 70 [Fahrenheit] degrees or below. We’d recommend looking at going to 80 [Fahrenheit] degrees.”
Generally, there are two limiting factors to raising DC temperatures: 1) server component failure points, and 2) the precision of temperature control. We’ll discuss the component failure point more below in “high temperature processors”. Precision of temperature control is potentially more important in that it limits how close we can safely get to the component failure point. If the data center has very accurate control, say +/-2C, then we can run within 5C and certainly within 10C of the component failure point. If there is wide variance throughout the center, say +/-20C, then much more headroom must be maintained.
Temperature management accuracy reduces risk and risk reduction allows higher data center temperatures.
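To make the relationship between control precision and headroom concrete, here is a minimal sketch. It assumes a very simple model in which the setpoint must sit below the component failure point by the control variance plus a fixed safety margin. The 60C failure point and 3C margin are hypothetical numbers chosen only for illustration, not measured values.

```python
def max_setpoint_c(failure_point_c, control_variance_c, safety_margin_c):
    """Highest nominal setpoint under a simple additive headroom model.

    The worst-case temperature is assumed to be setpoint + control variance,
    and it must stay a safety margin below the component failure point.
    """
    return failure_point_c - control_variance_c - safety_margin_c


FAILURE_POINT_C = 60.0  # hypothetical component failure point
SAFETY_MARGIN_C = 3.0   # hypothetical fixed margin

tight = max_setpoint_c(FAILURE_POINT_C, 2.0, SAFETY_MARGIN_C)   # +/-2C control
loose = max_setpoint_c(FAILURE_POINT_C, 20.0, SAFETY_MARGIN_C)  # +/-20C control

print(f"+/-2C control:  setpoint up to {tight:.0f}C")
print(f"+/-20C control: setpoint up to {loose:.0f}C")
```

Under these assumed numbers, tight control allows a setpoint within 5C of the failure point, while wide variance forces the setpoint 23C below it.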
12V-only Power Supplies: Most server power supplies are a disaster in two dimensions: 1) incredibly inefficient at rated load, and 2) much worse at less than rated load. Server power supplies are starting to get the attention they deserve but it’s still easy to find a supply that is only 80% efficient. Good supplies run in the 90 to 95% range but customers weren’t insisting on high efficiency supplies so they weren’t being used. This is beginning to change and server vendors typically offer high efficiency supplies either by default or as an extra cost option.
As important as it is to have an efficient power supply at the server rated load, it’s VERY rare to have a server operate at anything approaching maximum rated load. Server utilizations are usually below 30% and often as poor as 10% to 15%. At these lower loads, power supply efficiency is often much lower than the quoted efficiency at full load. There are two cures to this problem: 1) flatten the power supply efficiency curves so that at low load they are much nearer to the efficiency at high load, and 2) move the peak efficiency down to the likely server operating load. The former is happening broadly. I’ve not seen anyone doing the latter but it’s a simple, easy to implement concept.
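The effect of the efficiency curve at low load can be sketched numerically. Both curves below are hypothetical, invented only to contrast a supply that falls off at low load with a flattened one. They are not data from any real power supply, and the 20% load point and 500W rating are likewise assumptions.

```python
def efficiency_at(curve, load_fraction):
    """Linearly interpolate efficiency from (load_fraction, efficiency) points."""
    points = sorted(curve)
    if load_fraction <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if load_fraction <= x1:
            return y0 + (y1 - y0) * (load_fraction - x0) / (x1 - x0)
    return points[-1][1]


def wasted_watts(rated_w, load_fraction, curve):
    """Power lost in the supply while delivering load_fraction of rated output."""
    delivered_w = rated_w * load_fraction
    wall_w = delivered_w / efficiency_at(curve, load_fraction)
    return wall_w - delivered_w


# Hypothetical curves: (fraction of rated load, efficiency)
PEAKED = [(0.10, 0.65), (0.20, 0.75), (0.50, 0.85), (1.00, 0.80)]
FLATTENED = [(0.10, 0.85), (0.20, 0.88), (0.50, 0.90), (1.00, 0.90)]

RATED_W = 500.0  # hypothetical supply rating
for name, curve in (("peaked", PEAKED), ("flattened", FLATTENED)):
    loss = wasted_watts(RATED_W, 0.20, curve)
    print(f"{name}: {loss:.1f} W lost at 20% of rated load")
```

With these made-up curves, the flattened supply wastes well under half as much power at light load, which is where most servers spend their time.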
Server fans, CPUs, and memory all run off the 12V power supply rails in most server designs. Direct attached storage uses both the 12V and 3.3V rails. Standardizing the supply to simply produce 12V and using high efficiency voltage regulators close to the component loads is a good design for two reasons: 1) 12V-only supplies are slightly simpler and simplicity allows more effort to be invested in efficiency, and 2) bringing 12V close to the components minimizes the within-the-server power distribution losses. IBM has done exactly this with their data center optimized iDataPlex servers.
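One way to read the second reason is through resistive loss: for a fixed power delivered over a fixed conduction path, current scales as 1/V and the loss as I²R, so distributing at 12V and converting near the load wastes far less than distributing at a lower voltage. The sketch below assumes a hypothetical 100W load and a hypothetical 5 milliohm path. Only the ratio between the two cases is independent of those choices.

```python
def distribution_loss_w(load_w, rail_v, path_resistance_ohm):
    """Resistive (I^2 * R) loss when delivering load_w at rail_v over a path."""
    current_a = load_w / rail_v
    return current_a ** 2 * path_resistance_ohm


LOAD_W = 100.0     # hypothetical component load
PATH_OHM = 0.005   # hypothetical board and cable resistance

loss_12v = distribution_loss_w(LOAD_W, 12.0, PATH_OHM)
loss_3v3 = distribution_loss_w(LOAD_W, 3.3, PATH_OHM)

print(f"12V rail:  {loss_12v:.2f} W lost")
print(f"3.3V rail: {loss_3v3:.2f} W lost")
print(f"ratio:     {loss_3v3 / loss_12v:.1f}x")
```

The ratio is (12 / 3.3)², roughly 13x, regardless of the assumed load or resistance.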
Two Servers on a Board: Increasing server density by a factor of 2 is good but, generally, density is not the biggest problem in a data center (see Why Blade Servers aren’t the Answer to All Questions). I am more excited by designs that lower costs by sharing components and so this is arguably a good thing even if you don’t care all that much about server density.
I just finished some joint work with Rackable Systems focusing on maximizing work done per dollar and work done per joule on server workloads. This work shows improvements of over 3x over existing server designs on both metrics. And, as a side effect of working hard on minimizing costs, the design also happens to be very dense with 6 servers per rack unit all sharing a single power supply. This work will be published at the Conference on Innovative Data Systems Research this month and I’ll post it here as well.
GigaOM had an interesting post reporting that Microsoft is getting server vendors to standardize on their components: http://gigaom.com/2008/12/03/microsoft-reveals-fourth-gen-datacenter-design/. They also report that Google’s custom server design is beginning to influence server suppliers: http://gigaom.com/2008/12/29/how-google-is-influencing-server-design/. It’s good to see data center optimized designs beginning to be available for all customers rather than just high scale purchasers.
High Temperature Processors:
The Register’s Google Demanding Intel’s Hottest Chips? reports:
When purchasing server processors directly from Intel, Google has insisted on a guarantee that the chips can operate at temperatures five degrees centigrade higher than their standard qualification, according to a former Google employee. This allowed the search giant to maintain higher temperatures within its data centers, the ex-employee says, and save millions of dollars each year in cooling costs.
Predictably, Intel denies this. And logic suggests that it’s probably not 100% accurate exactly as reported. Processors are not even close to the most sensitive component in a server. Memory is less heat tolerant than processors. Disk drives are less heat tolerant than memory. Batteries are less heat tolerant than disks. In short, processors aren’t the primary limiting factor in any server design I’ve looked at. However, as argued above, raising data center temperature will yield huge gains and part of achieving these gains is better cooling designs and more heat tolerant parts.
In this case, I strongly suspect that Google has asked all its component suppliers to step up to supporting higher data center ambient temperatures but I doubt that Intel is sorting for temperature resistance and giving Google special parts. As a supplier, I suspect they are signed up to “share the risk” of higher DC temps with Google but I doubt they are supplying special parts.
Raising DC temperatures is 100% the right approach and I would love to see the industry cooperate to achieve 40C data center temperatures. It’ll be good for the environment and good for the pocketbook.
Containerized Data Center Design: Also in Google Demanding Intel’s Hottest Chips? The Register talks about Google’s work in containerized data centers, mentioning the Google Will-Power Project. Years ago there was super secret work at Google to build containerized data centers and a patent was filed. Will Whitted is the patent holder and hence the name, Will-Power. However, Will reported in a San Francisco Chronicle article, O Googlers, Where Art Thou?, that the Google project was canceled years ago. It’s conceivable that Google has quietly continued the work but our industry is small, secrets are not held particularly well given the number of suppliers involved, and this one has been quiet. I suspect Google didn’t continue with the modular designs. However, Microsoft has invested in modular data centers based upon containers in Chicago, First Containerized Data Center Announcement, and the new fourth generation design covered by GigaOM in Microsoft Reveals Fourth-Gen Datacenter Design, my posting Microsoft Generation 4 Modular Data Centers, and the detailed Microsoft posting by Manos, Belady, and Costello: Our Vision for Generation 4 Modular Data Centers – One Way of Getting it Just Right.
I’ve been a strong proponent of containerized data centers (Architecture for a Modular Data Center) so it’s good to see this progress at putting modular designs into production.
Thanks to Greg Linden for pointing these articles out to me. Greg’s blog, Geeking with Greg, is one of my favorites.
Amazon Web Services