Sunday, June 14, 2009

I like Power Usage Effectiveness as a coarse measure of data center infrastructure efficiency. It gives us a way of speaking about the efficiency of the data center power distribution and mechanical equipment without having to qualify the discussion on the basis of the servers and storage used, utilization levels, or other issues not directly related to data center design. But there are clear problems with the PUE metric. Any single metric that attempts to reduce a complex system to a single number is going to fail to model important details and is going to be easy to game. PUE suffers from some of both. Nonetheless, I find it useful.

 

In what follows, I give an overview of PUE, talk about some of the issues I have with it as currently defined, and then propose some improvements in PUE measurement using a metric called tPUE.

 

What is PUE?

PUE is defined in Christian Belady’s Green Grid Data Center Power Efficiency Metrics: PUE and DCiE. It’s a simple metric, and that simplicity is both part of why it’s useful and the source of some of its flaws. PUE is defined to be

 

                                PUE = Total Facility Power / IT Equipment Power

 

Total Facility Power is defined to be “power as measured at the utility meter”. IT Equipment Power is defined as “the load associated with all of the IT equipment”. Stated simply, PUE is the ratio of the power delivered to the facility to the power actually delivered to the servers, storage, and networking gear. It tells us how much of the power actually gets to the IT equipment, with the rest being lost in the infrastructure. These infrastructure losses include power distribution (switch gear, uninterruptible power supplies, Power Distribution Units, Remote Power Panels, etc.) and mechanical systems (Computer Room Air Handlers/Computer Room Air Conditioners, cooling water pumps, air moving equipment outside of the servers, chillers, etc.). The inverse of PUE is called Data Center Infrastructure Efficiency (DCiE):

 

                                DCiE = IT Equipment Power / Total Facility Power * 100%

 

So, if we have a PUE of 1.7 that’s a DCiE of 59%.  In this example, the data center infrastructure is dissipating 41% of the power and the IT equipment the remaining 59%.
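As a quick check of the arithmetic above, here is a minimal Python sketch of the two definitions. The function names and the sample kW readings are illustrative assumptions, not something taken from the Green Grid document.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power over IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def dcie(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Data Center Infrastructure Efficiency in percent (the inverse of PUE)."""
    return 100.0 / pue(total_facility_kw, it_equipment_kw)


# Hypothetical readings: 1,700 kW at the utility meter, 1,000 kW reaching IT gear.
total_kw, it_kw = 1700.0, 1000.0
print(f"PUE  = {pue(total_kw, it_kw):.2f}")    # PUE  = 1.70
print(f"DCiE = {dcie(total_kw, it_kw):.0f}%")  # DCiE = 59%
```

The remaining 41% is what the power distribution and mechanical systems dissipate.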

 

This is useful to know in that it allows us to compare different infrastructure designs and understand their relative value. Unfortunately, where money is spent, we often see metrics games and this is no exception. Let’s look at some of the issues with PUE and then propose a partial solution.

 

Issues with PUE

Total Facility Power: The first issue is the definition of total facility power. The original Green Grid document defines total facility power as “power as measured at the utility meter”. This sounds fairly complete at first blush but it’s not nearly tight enough. Many smaller facilities meter at 480VAC but some facilities meter at mid-voltage (around 13.2kVAC in North America). And a few facilities meter at high voltage (~115kVAC in North America). Still others purchase and provide the land for the 115kVAC to 13.2kVAC step down transformer layer but still meter at mid-voltage.

 

Some UPSs are installed at medium voltage whereas others are at low voltage (480VAC). Clearly the UPS has to be part of the infrastructure overhead.

 

The implication of the above observations is that some PUE numbers include the losses on two voltage conversion layers getting down to 480VAC, some include one conversion, and some don’t include any of them. This muddies the water considerably, makes small facilities look somewhat better than they should, and is just another opportunity to inflate numbers beyond what the facility can actually produce.
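To make the size of this effect concrete, the sketch below takes one hypothetical facility and reports its PUE as seen from three different metering points. The transformer efficiencies and load figures are assumptions chosen for illustration only; real values depend on the transformer model and its loading.

```python
# Assumed, illustrative efficiencies; not measured values.
HV_TO_MV_EFF = 0.995  # ~115kVAC -> ~13.2kVAC step down (assumption)
MV_TO_LV_EFF = 0.985  # ~13.2kVAC -> 480VAC step down (assumption)

it_kw = 1000.0             # power delivered to the IT equipment
power_at_480v_kw = 1500.0  # facility draw as seen by a meter on the 480VAC side

# The same facility, metered one or two conversion layers further upstream.
power_at_mv_kw = power_at_480v_kw / MV_TO_LV_EFF
power_at_hv_kw = power_at_mv_kw / HV_TO_MV_EFF

reported_pue = {
    "480VAC meter (no conversions counted)": power_at_480v_kw / it_kw,
    "13.2kVAC meter (one conversion counted)": power_at_mv_kw / it_kw,
    "115kVAC meter (two conversions counted)": power_at_hv_kw / it_kw,
}
for meter, value in reported_pue.items():
    print(f"{meter}: PUE = {value:.3f}")
```

Nothing about the facility changes between the three lines of output; only the location of the meter does.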

 

Container Game: Many modular data centers are built upon containers that take 480VAC as input. I’ve seen modular data center suppliers that chose to call the connection to the container “IT equipment” which means the normal conversion from 480VAC to 208VAC (or sometimes even to 110VAC) is not included. This seriously skews the metric but the negative impact is even worse on the mechanical side. The containers often have the CRAH or CRAC units in the container. This means that large parts of the mechanical infrastructure are being included under “IT load” and this makes these containers look artificially good. Ironically, the container designs I’m referring to here actually are pretty good. They really don’t need to play metrics games but it is happening so read the fine print.

 

Infrastructure/Server Blur: Many rack-based modular designs use large rack-level fans rather than multiple inefficient fans in the server. For example, the Rackable CloudRack C2 (SGI is still Rackable to me :)) moves the fans out of the servers and puts them at the rack level. This is a wonderful design that is much more efficient than tiny 1RU fans. Normally the server fans are included as “IT load” but in these modern designs that move fans out of the servers, they’re considered infrastructure load.

 

In extreme cases, fan power can be upwards of 100W (please don’t buy these servers). This means a data center running more efficient servers may have to report a worse (higher) PUE number. We don’t want to push the industry in the wrong direction. Here’s one more. The IT load normally includes the server Power Supply Unit (PSU) but in many designs such as IBM iDataPlex the individual PSUs are moved out of the server and placed at the rack level. Again, this is a good design and one we’re going to see a lot more of but it takes losses that were previously IT load and makes them infrastructure load. PUE doesn’t measure the right thing in these cases.
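The fan case is easy to see with numbers. The sketch below compares two hypothetical 1,000-server facilities under conventional PUE accounting; every wattage is an assumption made up for illustration, and the function name is mine.

```python
def facility(servers, server_w, in_server_fan_w, rack_fan_w_per_server, other_infra_kw):
    """Return (total_kw, pue) under conventional PUE accounting."""
    # In-server fans are metered as part of the server, so they count as IT load.
    it_kw = servers * (server_w + in_server_fan_w) / 1000.0
    # Rack-level fans sit outside the server, so they count as infrastructure.
    infra_kw = other_infra_kw + servers * rack_fan_w_per_server / 1000.0
    total_kw = it_kw + infra_kw
    return total_kw, total_kw / it_kw


# Hypothetical: 210W of non-fan server load, 40W of small in-server fans,
# versus 15W per server of larger rack-level fans doing the same job.
fans_in_server = facility(1000, 210.0, 40.0, 0.0, 100.0)
fans_at_rack = facility(1000, 210.0, 0.0, 15.0, 100.0)

for name, (total_kw, pue) in [("fans in server", fans_in_server),
                              ("fans at rack", fans_at_rack)]:
    print(f"{name}: total = {total_kw:.0f} kW, PUE = {pue:.2f}")
```

Under these assumptions the rack-level design draws less total power yet reports the higher PUE, which is exactly the wrong incentive.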

 

PUE less than 1.0: The Green Grid document says that “the PUE can range from 1.0 to infinity” and goes on to say “… a PUE value approaching 1.0 would indicate 100% efficiency (i.e. all power used by IT equipment only)”. In practice, this is approximately true. But PUEs better than 1.0 are absolutely possible and even a good idea. Let’s use an example to better understand this. I’ll use a 1.2 PUE facility in this case. Some facilities are already exceeding this PUE and there is no controversy on whether it’s achievable.

 

Our example 1.2 PUE facility is dissipating roughly 17% of the total facility power in power distribution and cooling. Some of this heat may be in transformers outside the building but we know for sure that all the servers are inside, which is to say that at least 83% of the dissipated heat will be inside the shell. Let’s assume that we can recover 30% of this heat and use it for commercial gain. For example, we might use the waste heat to warm crops and allow tomatoes or other high value crops to be grown in climates that would not normally favor them. Or we can use the heat as part of the process to grow algae for bio-diesel. If we can transport this low grade heat and net only 30% of the original value, we can achieve a 0.90 PUE: for every 1.0 unit of IT power the facility draws 1.2 units, and crediting back 0.3 units leaves (1.2 - 0.3) / 1.0 = 0.90. That is to say if we are only 30% effective at monetizing the low-grade waste heat, we can achieve a better than 1.0 PUE.
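The heat recovery arithmetic can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes the recovered fraction is applied to the heat dissipated by the IT equipment inside the shell, which is the reading that reproduces the 0.90 figure.

```python
def net_pue(pue: float, recovered_fraction_of_it_heat: float) -> float:
    """PUE after crediting recovered waste heat against total facility power.

    Assumption: the recovered fraction applies to the IT equipment's heat
    (the portion known to be inside the building shell).
    """
    it_power = 1.0                       # normalize to one unit of IT power
    total_power = pue * it_power         # what the facility draws
    recovered = recovered_fraction_of_it_heat * it_power
    return (total_power - recovered) / it_power


print(f"Infrastructure share of total: {1 - 1 / 1.2:.1%}")   # 16.7%
print(f"IT share of total:             {1 / 1.2:.1%}")        # 83.3%
print(f"Net PUE with 30% recovery:     {net_pue(1.2, 0.30):.2f}")  # 0.90
```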

 

PUEs of less than 1.0 are possible and I would love to rally the industry around achieving a less than 1.0 PUE. In the database world years ago, we rallied around achieving 1,000 transactions per second. The High Performance Transaction Systems conference was originally conceived with a goal of achieving this (at the time) incredible result. 1,000 TPS was eclipsed decades ago but HPTS remains a fantastic conference. We need to do the same with PUE and aim to get below 1.0 before 2015. A PUE less than 1.0 is hard but it can and will be done.

 

tPUE Defined

Christian Belady, the editor of the Green Grid document, is well aware of the issues I raise above. He proposes that, over the long haul, it be replaced by the Data Center Productivity (DCP) index. DCP is defined as:

 

                                DCP = Useful Work / Total Facility Power

 

I love the approach but the challenge is defining “useful work” in a general way. How do we come up with a measure of useful work that spans all interesting workloads over all host operating systems? Some workloads use floating point and some don’t. Some use special purpose ASICs and some run on general purpose hardware. Some software is efficient and some is very poorly written. I think the goal is the right one but there never will be a way to measure it in a fully general way. We might be able to define DCP for a given workload type but I can’t see a way to use it to speak about infrastructure efficiency in a fully general way.

 

Instead I propose tPUE, which is a modification of PUE that mitigates some of the issues above. Admittedly it is more complex than PUE but it has the advantage of equalizing different infrastructure designs and allows comparison across workload types. Using tPUE, an HPC facility can compare how they are doing against commercial data processing facilities.

 

tPUE standardizes where the total facility power is to be measured from and precisely where the IT equipment starts and what portions of the load are infrastructure vs server. With tPUE we attempt to remove some of the negative incentive to the blurring of the lines between IT equipment and infrastructure. Generally, this blurring is a very good thing. 1RU fans are incredibly inefficient so replacing them with large rack or container level impellers is a good thing. Multiple central PSUs can be more efficient and so moving the PSU from the server out to the module or rack again is a good thing. We want a metric that measures the efficiency of these changes correctly. PUE, as currently designed, will actually show a negative “gain” in both examples.

 

We define tPUE as:

 

                                tPUE = Total Facility Power / Productive IT Equipment Power

 

This is almost identical to PUE. It’s the next level of definitions that are important. The tPUE definition of “Total Facility Power” is fairly simple. It’s the power delivered at the medium voltage (~13.2kVAC) source prior to any UPS or power conditioning. Most big facilities take delivery at this voltage level or higher. Smaller facilities may get 480VAC delivered, in which case this number is harder to get. We solve the problem by using a transformer manufacturer specified number if measurement is not possible. Fortunately, the efficiency numbers for high voltage transformers are accurately specified by manufacturers.

 

For tPUE the facility power must actually be measured at medium voltage if possible. If not possible, it is permissible to measure at low voltage (480VAC in North America and 400VAC in many other geographies) as long as the efficiency loss of the medium voltage transformer(s) is included. Of course, all measurements must be before UPS or any form of power conditioning. This definition permits using a non-measured, manufacturer-specified efficiency number for the medium-to-low voltage transformer but it does ensure that all measurements are using medium voltage as the baseline.

 

The tPUE definition of “Productive IT Equipment Power” is somewhat more complex. PUE measures IT load as the power delivered to the IT equipment. But in high-scale data centers, IT equipment is breaking the rules. Some servers have fans inside and some use the infrastructure fans. Some have no PSU and are delivered 12VDC by the infrastructure whereas most still have some form of PSU. tPUE “charges” all fans and all power conversions to the infrastructure component. I define “Productive IT Equipment Power” to be all power delivered to semiconductors (memory, CPU, northbridge, southbridge, NICs), disks, ASICs, FPGAs, etc. Essentially we’re moving the PSU losses, the voltage regulator down (VRD) and/or voltage regulator module (VRM) losses, and cooling fans from “IT load” to infrastructure. In this definition, infrastructure losses unambiguously include all power conversions, UPS, switch gear, and other losses in distribution. And they include all cooling costs whether they be in the server or not.
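Here is a rough sketch of how the two accountings differ, using per-server watts. The function name and all of the numbers are assumptions for illustration; the point is only which bucket each watt lands in.

```python
def pue_and_tpue(productive_w, conversion_loss_w, in_server_fan_w, facility_overhead_w):
    """Return (PUE, tPUE) from per-server watts.

    productive_w:        power reaching CPU, memory, disks, NICs, etc.
    conversion_loss_w:   PSU and VRD/VRM losses inside the server
    in_server_fan_w:     fans inside the server chassis
    facility_overhead_w: everything outside the server (UPS, distribution,
                         cooling, and any rack-level fans)
    """
    it_w = productive_w + conversion_loss_w + in_server_fan_w  # PUE's "IT load"
    total_w = it_w + facility_overhead_w
    return total_w / it_w, total_w / productive_w


# Hypothetical design A: 25W of fans inside each server.
pue_a, tpue_a = pue_and_tpue(170.0, 55.0, 25.0, 100.0)
# Hypothetical design B: fans moved to the rack, costing 10W per server there.
pue_b, tpue_b = pue_and_tpue(170.0, 55.0, 0.0, 110.0)

print(f"A: PUE = {pue_a:.2f}, tPUE = {tpue_a:.2f}")
print(f"B: PUE = {pue_b:.2f}, tPUE = {tpue_b:.2f}")
```

With these made-up numbers, design B uses less total power, looks worse under PUE, and looks better under tPUE, which is the behavior the definition is aiming for.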

 

The hard part is how to measure tPUE. The metric achieves our goal of being comparable since everyone would be using the same definitions, and it doesn’t penalize innovative designs that blur the conventional lines between server and infrastructure. I would argue we have a better metric but the challenge will be how to measure it. Will data center operators be able to measure it, track improvements in their facilities, and understand how they compare with others?

 

We’ve discussed how to measure total facility power. The short summary is it must be measured prior to all UPS and power conditioning at medium voltage. If high voltage is delivered directly to your facility, you should measure after the first step down transformer. If your facility is delivered low voltage, then ask your power supplier (whether it be the utility, the colo-facility owner, or your company’s infrastructure group) for the efficiency of the medium to low step down transformer at your average load. Add this value in mathematically by dividing the low voltage measurement by the transformer efficiency. This is not perfect but it is better than where we are right now when we look at a PUE.
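A minimal sketch of that adjustment follows. The function name, its parameters, and the 98.5% efficiency in the example are my own assumptions; use the figure your power supplier or transformer manufacturer gives you.

```python
def facility_power_at_medium_voltage(measured_kw, metered_at, mv_to_lv_efficiency=None):
    """Refer a facility power reading back to the medium voltage baseline.

    metered_at: "medium" if measured at medium voltage (or just after the
                high-to-medium step down), "low" if measured at 480VAC/400VAC.
    mv_to_lv_efficiency: supplier-specified efficiency of the medium-to-low
                step down transformer at average load (needed for "low").
    """
    if measured_kw < 0:
        raise ValueError("measured power cannot be negative")
    if metered_at == "medium":
        return measured_kw
    if metered_at == "low":
        if mv_to_lv_efficiency is None or not (0.0 < mv_to_lv_efficiency <= 1.0):
            raise ValueError("a transformer efficiency in (0, 1] is required")
        # Power drawn upstream of the transformer is larger by 1/efficiency.
        return measured_kw / mv_to_lv_efficiency
    raise ValueError("metered_at must be 'medium' or 'low'")


# Hypothetical: 1,500 kW read on the 480VAC side, transformer quoted at 98.5%.
print(f"{facility_power_at_medium_voltage(1500.0, 'low', 0.985):.1f} kW at medium voltage")
```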

 

At the low voltage end where we are delivering “productive IT equipment power” we’re also forced to use estimates alongside our measurements. What we want to measure is the power delivered to individual components: memory, CPU, etc. Our goal is to get power after the last conversion and this is quite difficult since VRDs are often on the board near the component they are supplying. Given that non-destructive power measurement at this level is not easy, we use an inductive ammeter on each conductor delivering power to the board. Then we get the VRD efficiencies from the system manufacturer (you should be asking for these anyway; they are an important factor in server efficiency). In this case, we often can only get efficiency at rated power and the actual efficiency of the VRD will be less in your usage. Nonetheless, we use this single efficiency number since it at least is an approximation and more detailed data is either unavailable or very difficult to obtain. We don’t include fan power (server fans typically run on a 12 volt rail). Essentially what we are doing is taking the definition of IT Equipment load used by the PUE definition and subtracting off VRD, PSU, and fan losses. These measurements need to be taken at full server load.
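The estimate described above might be sketched as follows. This is a simplification under stated assumptions: the function name and readings are hypothetical, fans are assumed to draw from the measured board conductors, and the single manufacturer-supplied VRD efficiency is applied to everything that remains, even though some components may not sit behind a VRD.

```python
def productive_power_w(rail_readings, vrd_efficiency, fan_w):
    """Estimate Productive IT Equipment Power for one server, in watts.

    rail_readings:  list of (volts, amps) pairs, one per conductor feeding
                    the board, taken downstream of the PSU at full load
    vrd_efficiency: single manufacturer-specified VRD efficiency (0 to 1)
    fan_w:          fan power drawn from the measured conductors
    """
    if not (0.0 < vrd_efficiency <= 1.0):
        raise ValueError("vrd_efficiency must be in (0, 1]")
    board_input_w = sum(volts * amps for volts, amps in rail_readings)
    # Fans are charged to infrastructure, so remove them before applying
    # the VRD efficiency to the remaining power.
    return (board_input_w - fan_w) * vrd_efficiency


# Hypothetical clamp-meter readings on the 12V, 5V, and 3.3V conductors.
readings = [(12.0, 15.0), (5.0, 4.0), (3.3, 3.0)]
estimate = productive_power_w(readings, vrd_efficiency=0.85, fan_w=20.0)
print(f"Estimated productive power: {estimate:.1f} W")
```

Because PSU output is what is being measured here, PSU losses are already excluded; only the VRD and fan adjustments remain.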

 

The measurements above are not as precise as we might like but I argue the techniques will produce a much more accurate picture of infrastructure efficiency than the current PUE definitions and yet these metrics are both measurable and workload independent.

 

Summary:

We have defined tPUE to be:

 

                                tPUE = Total Facility Power / Productive IT Equipment Power

 

We defined total facility power to be measured before all UPS and power conditioning at medium voltage.  And we defined Productive IT Equipment Power to be server power not including PSU, VRD and other conversion losses nor including fan or cooling power consumption.

 

Please consider helping to evangelize tPUE and use tPUE. And, for you folks designing and building commercial servers, if you can help by measuring the Productive IT Equipment Power for one or more of your SKUs, I would love to publish your results.  If you can supply Productive IT Equipment Power measurement for one of your newer servers, I’ll publish it here with a picture of the server.

 

Let’s make achieving a tPUE<1.0 the new infrastructure rallying cry.

 

                                                                --jrh

 

James Hamilton, Amazon Web Services

1200, 12th Ave. S., Seattle, WA, 98144
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
james@amazon.com  

H:mvdirona.com | W:mvdirona.com/jrh/work  | blog:http://perspectives.mvdirona.com

 

Posted Sunday, June 14, 2009 4:53:34 PM (Pacific Standard Time, UTC-08:00) in Hardware

Comments [9]

Sunday, June 14, 2009 9:13:09 PM (Pacific Standard Time, UTC-08:00)
James:

I have a very different thought with all these optimizations. With PUE or tPUE or DCE we are trying to provide the efficiency between IT and its corresponding facilities overhead. It is a good thing but in my opinion our prime focus should be someplace else. The prime optimization should be on the server compute platform. Most of the large data centers run their server platform at 10% to 15% average utilization. So the question is: are we better off improving server utilization as the primary goal or improving the energy overhead? For example, would we rather have a 50,000 server data center with a PUE of 1.2 or a 25,000 server data center with a PUE of 1.4?

I think we must optimize PUE but the primary focus should be on reducing IT footprint by better application design or virtualization kind of technology. The Googles of the world take a lot of credit by publishing that their data centers are running at a PUE of 1.2 or better. What would be even better is if they published the average utilization of their ½ million servers. The industry must first optimize for server utilization and then for overhead. At the end of the day, server optimization has a higher multiplier. :-)

Regards

Jawaid Ekram
Jawaid Ekram
Monday, June 15, 2009 6:04:58 AM (Pacific Standard Time, UTC-08:00)
Hi Jawaid. I agree that server utilization is amongst the biggest problems and that’s one of the reasons why I’m a huge believer in cloud computing. Large populations make it easier to drive much higher utilizations.

So I’m with you in priority but it would make no sense to work on utilization and not make progress on infrastructure efficiency and server design. We need to and easily can keep improving infrastructure efficiency while improving utilization.

James Hamilton
jrh@mvdirona.com
Monday, June 15, 2009 8:27:29 AM (Pacific Standard Time, UTC-08:00)
James, with the current PUE we have people all over the place and disagreeing.

With the productive work factor it will be worse as it goes directly into a core revenue generating portion that companies will not divulge. So will it be reported properly?

I like your suggestion on working over a common base unit for the utility portion; by reflecting everything back to a MV point you get some commonality. FYI: Electrical engineers running short circuit studies are more than used to that process.

I think some effort should be placed into first addressing the components that make up the PUE Total Power portion (HVAC, losses, miscellaneous and lighting) as they relate to a critical kW, and adjusting these to real estate use and water consumption (we have done work on this and HP has done work on this). They are all related and maybe they are not a big % in the overall scheme, but they are capital intensive, they impact site selection, they impact the building, and they impact our incoming streams and the waste.

Potentially a PUE of 1.1 for a 50K data center in Phoenix that will have 10 MW of critical IT, announced today!! That might not be realistic if there is visibility into the factors above, as it will give you a good glimpse on the making.

There is a large % of the industry that is not an enterprise at the level of Microsoft, Google, Yahoo or Amazon as a business, and even those that are enterprises are at different stages of maturity within their business model.

david
Monday, June 15, 2009 2:20:09 PM (Pacific Standard Time, UTC-08:00)
I agree David. The computing done by enterprises other than the big four you mentioned clearly dominates. No question.

I'm super interested in continuing to go after power that doesn't productively serve customers. That's what's driving my interest in tPUE. I believe if you shine a light on a problem, things quickly get better.

James Hamilton
jrh@mvdirona.com
Wednesday, June 24, 2009 7:05:48 AM (Pacific Standard Time, UTC-08:00)
James,

Great comments, thanks for raising awareness about PUE and its issues. I love the idea of a "PUE Unity Conference" dedicated to lowering measured and reported power efficiencies toward the 1.0 level. Maybe The Green Grid will take up that challenge (though I'm not speaking in an official capacity here.)

While there's no doubt about PUE's shortfalls and questions, one of PUE's virtues is simplicity, both in the collection of data and the basic meaning of the calculation. Simplicity translates to lower cost, ease of use, and wider adoption. I'd argue the coarse nature of the measure doesn't reduce its basic value as a metric that people can use to develop effective action plans.

Regarding PUE less than 1.0, there's always a choice of where to draw the boundary of the problem and whether we are observing the 2nd Law of Thermodynamics in our calculations. Drawing the energy boundary around auxiliary facilities such as crop fields, or domestic water pre-heaters, or office areas using waste data center heat for comfort heating would require us to include all of the energy inputs to those areas into the numerator of the PUE calculation, complicating the measurement and adding questionable value to the metric. Does the data center manager get credit for crop production when asked to improve data center efficiency? My preference is K.I.S.S.

Same for your proposal on tPUE: simplicity is best for the PUE-level of metric. For the supply side, the question is where to draw the boundary, and what I can affect in my data center. Counting the losses from high-to-low voltage conversions only makes sense if I can affect those losses, by building a more efficient sub-station or using a better service entrance and step down transformer. Otherwise, those losses are accounted for in someone else's energy balance and charged to me in the price I pay for my low voltage feed. To take the sublime to the ridiculous, I could extend your proposal and take efficiency all the way back to the raw fuel used in the power plant and compare efficiency of a coal-powered data center to a hydro-powered one (which we will probably get to do as carbon legislation comes on line around the world! :-)

For the productive IT power, most people don't have the ability or budget to measure power at the sub-components of their systems, and I'd argue that the inefficiencies inside the box are measurable at the plug. For example, two systems, that both use an Intel 5500 3.2GHz processor, each require 80W of power to be delivered to the CPU in order to run at full load. If one system draws 140W at the plug and the other draws 210W, what more do I need to know about the efficiency of the VRMs, fans, and power supplies? Splitting the power used inside the box into productive and non-productive energy seems overkill.

The real key to improving PUE is, as you suggest, to specify and report how, when, where, how frequently, and under what conditions the data was collected and how that data was processed after collection. The Green Grid published reporting guidelines in February (available here) that are the beginning of this specification process, and could eventually lead to a standardized metric with enough transparency that one could reliably understand the meaning of the value published. Miles per gallon existed long before the EPA published the measurement protocol that led to efficiency window stickers on cars, and I think The Green Grid will get to the point where you and I could compare our values with reasonable confidence.

Finally (I know, long-winded response...) regarding an "HPC facility can compare how they are doing against commercial data processing facilities," I don't think you would ever want to do that. An HPC facility is the Formula 1 racing machine of data centers; commercial data processing data centers are semi-trucks. One wouldn't compare the miles per gallon between these types of vehicles because of the difference in their intended use. However, comparing Formula 1 to Formula 1, or semi to semi is a valid comparison, and would have much more meaning. I don't think we need a metric that can span all types of data centers, just one that can be used within the same types. And it's OK if one metric doesn't cover every type, as long as it is useful for a large enough subset to be valuable.

Good column, sorry for the long response, and again, this is not an official Green Grid response by any means. Keep up the good work!

mark.
mark monroe
Thursday, June 25, 2009 9:30:14 AM (Pacific Standard Time, UTC-08:00)
Great discussion. Thanks for starting it.

I must agree with Mark Monroe's argument that simplicity wins over purity when it comes to making effective changes in the industry.

I think we all agree that the goals are 1) build and buy more efficient IT equipment, and 2) build and buy more efficient power and cooling facilities to run it. But since these components tend to come from different vendors and are often installed and owned by different groups within a company, drawing the line at this boundary, and optimizing the two sides independently, is quite appealing.

It should be relatively straightforward to define performance-to-power metrics for individual pieces of IT equipment. SPECpower_ssj2008 and SPECweb2009 are great examples of such tools for servers. Realizing that different benchmarks are appropriate for different applications, such tools allow the user to compare the power-efficiency of one piece of IT equipment to another, before he makes his purchasing decisions. In most cases, this would be all “the light” we need to shine on the problem to incent IT vendors to make it better.

Things get a little more complicated when the IT equipment shares some common elements, e.g., pooled power supplies or fans, as in blade systems today. But again the question is, where is the ownership boundary? Since the blade enclosure comes from a single vendor, drawing the boundary around his equipment gives him the incentive to produce the best performance-to-power ratio for that enclosure. The argument extends to the rack or larger, as long as the IT equipment comes from the same vendor. The recently released SPECpower_ssj2008 v1.1 includes multinode measurements specifically to address such issues.

PUE would then be the metric of choice for comparing one power and cooling system to another. True, apples will be compared to oranges, if we don't all meter at the same voltage, but again as Mark pointed out, we maintain simplicity, by simply documenting the measurement point.

Ironically, there are changes that can be made to the facility that help one side of this boundary but hurt the other. For example, raising the temperature of the cold aisle can lower cooling energy consumption, but raises fan speeds inside the IT equipment. But even with this flaw, the simplicity of measuring at the IT power connection and optimizing each side (mostly) independently is compelling.
Alan Goodrum, HP Fellow
Saturday, June 27, 2009 6:40:11 AM (Pacific Standard Time, UTC-08:00)
Thanks for the comments Mark. In your comment you argued that PUE is a better choice in that it’s simple. Generally, I agree that simple is good and simplicity is one of the reasons I really like PUE. What I don’t like is published numbers that can’t be accurately compared and changes that are good for the environment looking worse from a PUE perspective. I’ll give an example of each.

In your comment you said “Counting the losses from high-to-low voltage conversions only makes sense if I can affect those losses, by building a more efficient sub-station or using a better service entrance and step down transformer. Otherwise, those losses are accounted for in someone else's energy balance and charged to me in the price I pay for my low voltage feed.” I understand the perspective and, if PUE is only used to measure local improvements, then this approach would be fine. However, the power of PUE is understanding where you compare with the industry and whether your change was as effective as a different change made by someone else. PUE invariably gets used to report to management and stock holders how efficient a facility is. Not including some conversions ends up being incorrect or dishonest.

If my PUE is better than yours, should you try to understand my design better and try to apply it to your center? Or is it just a PUE game where I’ve not included transformations that I don’t control? My PUE is better but your facility may be more efficient. If I’m at 1.5 is it good or not? Without there being a fixed definition of PUE, the numbers don’t tell us much without considerable detail on how they were measured, and the results can’t be compared.

You asked “I'd argue that the inefficiencies inside the box are measurable at the plug. For example, two systems, that both use an Intel 5500 3.2GHz processor, each require 80W of power to be delivered to the CPU in order to run at full load. If one system draws 140W at the plug and the other draws 210W, what more do I need to know about the efficiency of the VRMs, fans, and power supplies?” This is an example of PUE leading us in the wrong direction. If one server contains 50W of fans and the other uses rack level fans, then we see an unfortunate outcome. The fan-in-server design is typically less efficient. But the fan-in-server design will report a better (lower) PUE. Moving from small 1RU fans to rack or module level air movers is a big step forward in efficiency but will cause the PUE to go up as the fans are taken from IT load to infrastructure load. You do need to know what is in the server.

This last paragraph above is potentially the biggest concern for me. When a design change that increases efficiency and lowers overall costs makes the PUE metric get worse, we have a problem. I like approximations and I like simplicity but, when the metric is swinging in the wrong direction, it’s not helping us.

James Hamilton
jrh@mvdirona.com
Saturday, June 27, 2009 6:54:00 AM (Pacific Standard Time, UTC-08:00)
Alan, like Mark, you argue for the simplicity of PUE over tPUE. I hear you. You recognize that the line between server and infrastructure is blurring but argue that since the larger modules are from a single vendor we can treat them as an N-server unit. You are using the “ownership boundary” as the defining line.

Specifically, Alan said “Things get a little more complicated when the IT equipment shares some common elements, e.g., pooled power supplies or fans, as in blade systems today. But again the question is, where is the ownership boundary? Since the blade enclosure comes from a single vendor, drawing the boundary around his equipment gives him the incentive to produce the best performance-to-power ratio for that enclosure. The argument extends to the rack or larger, as long as the IT equipment comes from the same vendor. The recently released SPECpower_ssj2008 v1.1 includes multinode measurements specifically to address such issues.”

This becomes a problem when large modules are treated as a “server”. I agree that treating a multi-blade module as IT equipment is reasonable. Treating a full rack with distributed UPS as IT equipment is pushing it though. At that point we have UPS being treated as IT load. Rackable and Verari both sell racks that take 480V and include all conversions and UPS in rack. Treating these as IT load and comparing to a central UPS design on the basis of PUE will tell you nothing.

Containers are being treated by the industry as IT equipment and I’m seeing sub-1.15 PUEs. What’s going on is the UPS, all voltage conversions below 480VAC, and a big part of the mechanical systems are being treated as IT load. Since some companies “don’t have control” above the 480VAC level and some racks and modules take 480VAC, we end up with some PUE numbers that include no power conversions while others include 4 or more.

I like simplicity but, if we don’t define what PUE means, then it’s really hard to use the PUE numbers to drive change and share efficiency results across the industry. A 1.2 PUE facility might be a lower efficiency design than a 1.35 PUE facility. We need to define numbers to use them effectively. Mark’s argument that the definition of PUE should be “what I control” doesn’t sound like the best approach.

I want numbers where we can compare them and learn how different designs actually work. Without a definition of PUE that stays constant through design changes, we can’t do this.

James Hamilton
jrh@mvdirona.com
Sunday, July 05, 2009 7:40:40 AM (Pacific Standard Time, UTC-08:00)
I think you have it exactly right, James. I want a metric that incentivizes the right behavior without driving the metric in the wrong direction. (Such things become very hard to explain to executive management!)

I think with the EPA's assistance (Energy Star) we can "shine a light" on these additional losses inside IT equipment. I'd like to see this data become part of a standard label for all IT equipment, so the consumer can buy with all the facts in front of them. As corporate architects (in my case) and using our corporate purchasing power, we can all help the cause by asking our vendors to give us this data as part of our sourcing efforts.

(For completeness, I assume you still charge any onboard UPS to the infrastructure side of the equation - e.g. Google's design.)

What interests me also is the variance in tPUE as a function of (1) outside air temperature (enthalpy actually), (2) IT load and (3) cold aisle temperature. tPUE versus (1) and (2) makes a 3D surface chart that shows the efficiency response of the "integrated system" (facility + IT) to these key factors. For example - how badly does a facility fall away from its optimal tPUE when unloaded on the IT side or when it gets hot and sticky outside? Our goal should be to try to expand the optimal range as much as possible (think Tabletop Mountain versus Mont Blanc).

To make these charts comparable between facilities in a corporation (or outside), we need to normalize the data and plot on a percentage scale. For IT load, this is simply 0%-100% of the maximum IT load possible from a facility (or room). For outside enthalpy you have to be a bit more creative, but I'm thinking plotting average enthalpy and say 3 standard deviations each side onto a 0-100% scale. Of course, comparing a DC in say Dallas, TX to one in Keflavik, Iceland is pretty unfair (i.e. Keflavik's climate is pretty much perfect for DC's), but having this 0-100% load, 0-100% enthalpy plot of tPUE would really allow high quality discussions between facilities in a portfolio.

Variance of tPUE by cold aisle temperature is my current focus. In the first level optimization, you want to pick a temperature that optimizes tPUE for the average enthalpy and load. Simply put, 80F allows you to unburden the facility AC without too much corresponding fan HP increase on the servers. Maybe (not sure?) 90F outweighs the facility AC saves with the fan HP. In the second level of optimization, thinking about cold aisle temperature as a function of load and enthalpy is interesting. Should you run your facility warmer in the summer and colder in the winter, for example?

Anyway - keep up the good work. Here's to our first sub-unity tPUE conference.

Andrew (chief architect at a bank)
Andrew Stokes
Comments are closed.

Disclaimer: The opinions expressed here are my own and do not necessarily represent those of current or past employers.

All Content © 2014, James Hamilton