Tuesday, April 28, 2009

Earlier this week I got a thought-provoking comment from Rick Cockrell in response to the posting: 32C (90F) in the Data Center. I found the points raised interesting and worthy of more general discussion, so I pulled the thread out from the comments into a separate blog entry. Rick posted:

 

Guys, to be honest I am in the HVAC industry. Now, what the Intel study told us is that yes this way of cooling could cut energy use, but what it also said is that there was more than a 100% increase in server component failure in 8 months (2.45% to 4.46%) over the control study with cooling... Now with that said if anybody has been watching the news lately or WALL-E, we know that e-waste is overwhelming most third world nations that we ship to and even Arizona. Think?

I see all kinds of competitions for energy efficiency; there should be a challenge to create a sustainable data center. You see data centers use over 61 billion kWh annually (EPA and DOE), more than 120 billion gallons of water at the power plant (NREL), more than 60 billion gallons of water onsite (BAC) while producing more than 200,000 tons of e-waste annually (EPA). So for this to be a fair game we can't just look at the efficiency. It's SUSTAINABILITY!

It would be easy to just remove the mechanical cooling (i.e., Intel) and run the facility hotter, but the e-waste goes up by more than 100% (Intel Report and Fujitsu hard drive testing). It would be easy to not use water cooled equipment, to reduce onsite water use, but the water at the power plant level goes up, as well as the energy use. The total solution has to be a solution of providing the perfect environment, the proper temperatures, while reducing e-waste.

People really need to do more thinking and less talking. There is a solution out there that can do almost everything that needs to be done for the industry. You just have to look! Or maybe call me and I'll show you.

 

Rick, you commented that “it’s time to do more thinking and less talking” and argued that the additional server failures seen in the Intel report created 100% more ewaste, so running hotter simply wouldn’t make sense. I’m willing to do some thinking with you on this one.

 

I see two potential issues with your assumption. The first is the claim that the Intel report showed “100% more ewaste”. What they saw in an 8-rack test was a server mortality rate of 4.46%, whereas their standard data centers were at 3.83%. This is far from double and, with only 8 racks, may not be statistically significant. As further evidence that the difference may not be significant, the control experiment, where they had 8 racks in the other half of the container running on DX cooling, showed a failure rate of 2.45%. It may be noise, given that the control differed from the standard data center by about as much as the test data set did. And it’s a small sample.
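
To make the comparison concrete, here is a minimal sketch that works through the three failure rates quoted above. The function and variable names are illustrative only, and the script says nothing about statistical significance, which would need the per-rack server counts that are not given here.

```python
# Failure rates quoted from the Intel report in this post (percent of servers).
RATES = {
    "economizer_test": 4.46,
    "dx_control": 2.45,
    "standard_data_centers": 3.83,
}


def point_difference(new: float, old: float) -> float:
    """Absolute difference between two rates, in percentage points."""
    return new - old


def relative_increase(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, expressed as a percentage."""
    return (new - old) / old * 100.0


test_rate = RATES["economizer_test"]
for baseline in ("dx_control", "standard_data_centers"):
    base_rate = RATES[baseline]
    print(
        f"economizer test vs {baseline}: "
        f"{point_difference(test_rate, base_rate):+.2f} points, "
        f"{relative_increase(test_rate, base_rate):.0f}% relative increase"
    )
```

Against the DX control the test rate is about 82% higher (roughly 1.8 times the control rate); against the standard data centers it is about 16% higher. Which baseline is the right one is exactly what is in dispute.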

 

Let’s assume for a second that the increase in failure rates actually was significant. Neither the investigators nor I are convinced this is the case, but let’s make the assumption and see where it takes us. The test servers showed a failure rate 0.63 percentage points higher than Intel’s normal data centers and 2.01 percentage points higher than the control. Let’s take the 2% number and think it through, assuming these are annualized numbers. The most important observation I’ll make is that 85% to 90% of servers are replaced BEFORE they fail, which is to say that obsolescence is the leading cause of server replacement. They are no longer power efficient and get replaced after 3 to 5 years. If I could save 10% of the overall data center capital expense and 25%+ of the operating expense at the cost of an additional 2% in server failures each year, would I do it? Absolutely yes. Further driving this answer home, Dell, Rackable, and ZT Systems will replace early failures under warranty if the servers are run under 35C (95F).

 

So, the increased server mortality rate is actually free during the warranty period, but let’s ignore that and focus on what’s better for the environment. If 2% of the servers need repair early and I spend the carbon footprint to buy replacement parts but save 25%+ of my overall data center power consumption, is that a gain for the environment? I don’t have a great way to estimate the true carbon footprint of repair parts, but it sure looks like a clear win to me.

 

On the basis of the small increase in server mortality weighed against the capital and operating expense savings, running hotter looks like a clear win to me. I suspect we’ll see at least a 10F average rise over the next 5 years and I’ll be looking for ways to make that number bigger. I’m arguing it’s a substantial expense reduction and great for the environment.

 

                                                                --jrh

 

James Hamilton, Amazon Web Services

1200, 12th Ave. S., Seattle, WA, 98144
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
james@amazon.com  

H:mvdirona.com | W:mvdirona.com/jrh/work  | blog:http://perspectives.mvdirona.com

 

Tuesday, April 28, 2009 8:01:18 AM (Pacific Standard Time, UTC-08:00)
Hardware
Tuesday, April 28, 2009 9:35:15 AM (Pacific Standard Time, UTC-08:00)
So, as the temp rises what does this do to the efficiency of the power supplies in these servers? I know the loss is minuscule compared to the savings in AC reduction, but I'm wondering if it would then make it more cost effective to move to DC power on servers instead of the traditional AC/DC converting power supplies we now use. Anything I can do to reduce power consumption, and the cost associated with it, would be a good thing!
Tuesday, April 28, 2009 9:43:16 AM (Pacific Standard Time, UTC-08:00)
James, I don't think the point about Dell/Rackable/ZT replacing equipment really addresses the sustainability issue. If servers fail and need to be replaced, that means more total energy/resource/environmental cost to replace them regardless of who foots that bill. It's "free" from the end-user standpoint, but that's not what sustainability is about. It just moves the cost from the end user to the vendor, who will either push it back in the form of increased prices or push it even further down the chain. The real question is not whether increased data-center temperatures will minimize dollar cost to one party in a world where the upgrade cycle is a certain way because equipment is built a certain way because of other distorted economic/environmental tradeoffs. It's whether those increased temperatures will minimize all relevant costs *all the way down the chain* in a world where equipment is built with energy/resource/environmental costs in mind (e.g. more modularly so that components can be upgraded instead of whole systems). What if the upgrade treadmill slowed? What would that do to "we would have upgraded anyway" justifications?

Note that I'm not saying you're wrong about higher-temperature operations. It might still be a win, just maybe not as much or not for the reasons that you give.
Tuesday, April 28, 2009 11:32:15 AM (Pacific Standard Time, UTC-08:00)
The DX cooling load in the economizer compartment was reduced from 111.78 kW to 28.6 kW, representing a 74 percent reduction in energy consumption. As the Intel report pointed out, at $0.08 per kWh the annual savings would be $143K. On the other hand, if we assume the worst case of additional server failure (and I agree we should not argue about who pays for the cost), the cost would be 9 servers at $2K to $3K each for a total cost of $14K to $21K. In large data centers there would be additional water savings.

I agree with James and the Intel report that running hotter data centers is the right thing. It would take more study and validation, but on the surface it looks to be the right direction.

I also think that running a hotter data center with economizer-type technology is more feasible at mini data centers and not yet applicable at Mega Data Centers.

Jawaid
Jawaid Ekram
Tuesday, April 28, 2009 2:35:15 PM (Pacific Standard Time, UTC-08:00)
I'll add my $.02 worth. We've been running two racks with ~20kw load for nine months at temperatures in the 30C-35C range on straight utility power with no filtering and have not experienced a higher failure rate than the same model and config of servers in a traditional colo at 22C on a dual conversion UPS. In fact, the only real issue I've noticed is that Dell 2950s don't seem to like 35C and the chassis fans spin up to an incredibly high and noisy speed. I expect this behavior from enterprise servers but not scale out offerings from Dell DCS, Rackable, or HP SCI.
Tuesday, April 28, 2009 3:05:14 PM (Pacific Standard Time, UTC-08:00)
Wes, you were asking what happens to power supply efficiency at high temperature. Across the full server we see increased power consumption at higher temperatures driven by two factors: 1) semiconductor leakage goes up with temperature, and 2) fan speed and power dissipation go up with temperature. I’ve been warned by so many folks about semiconductor leakage that we measured it. It’s there and measurable but is small enough that I’m confident it will be drowned by the savings. The fan speed issue can become significant and the power dissipation can be quite significant. Some server cooling designs depend heavily on air motion and some less so. To get full value from running at higher temperatures we need servers that have good mechanical designs and don’t just solve the problem with more air flow. My recommendation is to compete servers by measuring work done per joule at your planned operating temperature – there is considerable variation.

Jeff, I hear you and agree that warranty coverage doesn’t eliminate the ewaste issue. Let me rephrase my position on that one to clarify: most servers get discarded well before they fail. In 3 to 5 years, they all need recycling. If we assume that there actually is 2% higher server mortality (not clear but reasonable), then we are talking about the 2% compounded annually over 3 to 5 years being the difference between running at current temps and running at 35C. I’m arguing that not buying the process-based air conditioning system and the general reduction in power spent on cooling will dominate the waste due to increased server failures. This was the part in my original answer where I said that I don’t have a precise carbon cost for the failed parts but with the increased mortality only in the 2% range, it looks pretty plausible that higher temp will not produce a greater carbon footprint due to greater ewaste.

And, if you are worried about cost rather than ewaste, many manufacturers will cover failures under warranty.

Jawaid, good hearing from you. You conclude that the savings opportunities are more applicable to small centers rather than mega-data centers. I agree that higher temp is higher risk and requires more care. This might make it harder to accept across a very large data center – I’ll buy that – but the potential savings appear to apply to both.

Thanks for passing your data point along TinkThank. Your observation on fan speed is bang on. We’ll want to work with server suppliers to produce servers that run with minimum fan speed and cost at operating temperature. If purchasing decisions are made on this basis, we’ll see quick improvement.

--jrh
Wednesday, April 29, 2009 7:55:36 AM (Pacific Standard Time, UTC-08:00)
Ladies and Gentlemen, to start it is apparent that I broke the first law of commenting on someone else’s blog: never insult the host. Sorry, this was purely a mistake on my part and to be honest it wasn’t intentional.

I only wrote this so that everyone would give thought about the sustainability of a data center in today’s environment. It worked, let’s review.

This was a “scientific” study generated by Intel with two (container type) data centers that were exactly the same with airflow and load in an 8 month study. What was different: one data center was cooled with Air Side Economizing (ASE) and the other was cooled by traditional means (contained environment). For this to be a true scientific study, we have two controls comparing the failure rate of the servers in those environments. Intel did this study to evaluate the effect of ASE on the servers to make sure the servers would make it through the warranty period before failing. They did not do this study to evaluate the energy use, as that was evaluated on paper through engineering calculations, before the project even was installed. Also, Intel and every other major manufacturer already perform major case studies on the failure rates of servers in harsh temperature environments with test chambers that heat and cool operating servers to the point of failure. This is how they originally devised a server’s operating temperature range. Yes, I’ve designed and built a few chambers in my career. The two factors a chamber cannot evaluate are dust build-up (similar to the dust in your home computer) in the server over time and the effect of air side contaminants such as SO2, NH3, ammonium nitrates, sulfuric acids, and other common compounds found in both rural and metropolitan areas. Yes you can filter out up to 99% of the contaminants, but every layer of filtration costs money in both consumables and energy (which I will go into later).

Back to the study, we have three figures to evaluate: 3.83%, the failure rate of the existing data center servers; 4.46%, the failure rate of the servers in the ASE cooled control; and 2.45%, the failure rate of the servers with traditional cooling in the control. If we compare the 4.46% to the 2.45% that’s a 182% increase in failure between the two case studies. This increase in failure rate cannot be attributed to any one thing, but in my years of dealing with test chambers I can tell you the fluctuations in temperature and humidity were probably to blame, as the dust and contaminants had hardly had time to contribute to the problem. If the study had had time to progress to allow the dust to accumulate and had Intel taken these containers into differing cities where air side contaminants vary, the failure rate would have increased dramatically. It would also continue to increase over time. As far as the 3.83%, this is typically an average and if you looked into the environments of that specific data center you would see that server failures occur in areas which have had historically bad air flow and hot / cold spots. You really can’t compare the three numbers, as there is no control with existing data centers; the environments are dramatically different. Electronics hate expansion and contraction, as it increases the failure rate.

Energy use, Water Side Economizing (WSE) vs. Air Side Economizing (ASE). To look at another solution widely being adopted in the industry (due to the ability to economize without ASE), WSE (when designed properly) provides more hours of economizing and less total system energy than ASE. WSE reduces the overall waste produced by a data center in the form of consumables (carbon, bag and pre filters) that fill up landfills and e-waste due to many contributing factors. There is no comparison on the sustainability of WSE vs. ASE. WSE wins hands down; LBNL has commented on this also. There are also other technologies (for which I have one) that are penetrating the market that are even more efficient and cost effective for retrofit than WSE. You see we are talking about more than 61 billion kWh of energy being used in data centers for which 50% of that is cooling. That needs to be reduced; the EPA expects this number to double by 2011. We can’t intentionally create a worse situation with e-waste knowing all the facts.

Now for the argument that we are throwing away the servers in 3-5 years anyways. OK, now that’s really not sustainable. Technology dictates that we should be getting caught up and any sales should be dictated by growth, not failure or obsolescence. Our computers are fast enough; in fact there is a major push for utilization and power management that will slow down the current and future crops of server response times. Server technology isn’t to the point where the efficiency of the new servers is dictating a cost effective change to overcome speed vs efficiency and won’t anytime soon. Manufacturers need to concentrate on making a server that lasts 10 years and that is energy efficient for the massive growth in the use of the WWW. Now that’s sustainable.

I’m glad you allowed me to write freely without prejudice; this is a very important topic that we are only scratching the surface on. We truly have and should create solutions that are sustainable in every industry not just data centers.

Please address all hate mail to rcockrell@bellproducts.com
Wednesday, April 29, 2009 8:51:22 AM (Pacific Standard Time, UTC-08:00)
Wow, this did get people thinking. I didn't comment on temperature, but everybody else did. The typical inlet air temp for servers ranges from 57F to 68F. As has been thoroughly explained (by posters other than myself) this is the perfect range for server sustainability. The higher the temperature the harder the equipment works and the energy use in the servers increases. What is also most important is that the temperatures and humidity levels stay very consistent for sustainability: +-2F and 5%. We keep our environments at 57F for open facility (no hot aisle/cold aisle) or 68F for cold aisle containment and achieve PUEs (accounting for typical UPS, PDU and light losses) of 1.29 at 57F and 1.19 at 68F in most retrofits with a very consistent inlet computer temp of +-2F and 5%RH while maintaining a 45%RH on the inlet air... Sustainability is the name of the game.
Wednesday, April 29, 2009 10:37:16 AM (Pacific Standard Time, UTC-08:00)
“Throwing out” is a bit harsh Rick. Old servers are recycled.

You argued that this “has to stop” and said:

Now for the argument that we are throwing away the servers in 3-5 years anyways. OK, now that’s really not sustainable. Technology dictates that we should be getting caught up and any sales should be dictated by growth, not failure or obsolescence. Our computers are fast enough; in fact there is a major push for utilization and power management that will slow down the current and future crops of server response times. Server technology isn’t to the point where the efficiency of the new servers is dictating a cost effective change to overcome speed vs efficiency and won’t anytime soon.

That is not correct. A 7 year old server can be replaced and get more work done for a given amount of energy. I’ve seen 5,600 servers from a single deployment recycled because the new servers were so much more energy efficient. I agree that performance improvements are not the important metrics and what we should be measuring is work done per dollar and work done per watt. I agree we have to care about ewaste. I totally disagree that running 10 year old servers is good for the environment.

You recommend 57F for open environments and 68F in hot aisle/cold aisle deployments. And controlling humidity +/-2%. I don’t understand why you would ever recommend an open environment. Controlling humidity to +/-2% is power intensive. Not worth doing unless there is evidence that it is cost effective. I work with Rackable, Dell, HP, IBM and have seen data from many large deployments and there is no evidence that maintaining humidity at +/-2% will lead to longer equipment longevity. We know for sure that it’ll consume considerable energy to maintain these tight controls on humidity. If the equipment producers don’t recommend it and it costs more energy, I wouldn’t do it. The reason why the very large operators don’t do this is it simply doesn’t make economic sense and there is no evidence that it makes environmental sense.

Operating at 68F is very expensive. To make the argument that this is better for the environment, you need to show the data that supports the claim that equipment lasts longer at this level and the environmental impact is positive. The Google FAST disk drive study produced data that suggested the opposite may be true. None of the large service providers are choosing these set points.

If you have the data to produce a credible argument that this is more sustainable or lower cost, I would love to see it. It’s not the direction we’re currently going.

--jrh
James Hamilton, jrh@mvdirona.com
Wednesday, April 29, 2009 12:16:33 PM (Pacific Standard Time, UTC-08:00)
Leaps in performance per watt and power efficiency of new server designs mean that throwing out servers in as little as 24 months can actually save organizations 10-20% or more in cost per month.

An example of this is the new Intel 5500 Xeon (Nehalem) systems. Due to changes in memory architecture as well as CPU architecture, a new Nehalem system can do 50% more work per watt than a previous generation Harpertown 5400 Xeon system. Right now the acquisition cost of a Nehalem system is about 50-75% more than a Harpertown system, but the performance gain per watt makes the increased cost well worth it. If a facility is power constrained it makes sense for an organization to replace a portion of its older systems immediately. Even if the facility isn't power constrained the maths shows that you can save money replacing the systems now. This only gets better in six months as the price of Nehalem systems falls.

This is where air side economization and other facility optimizations play out. As my cost of powering and cooling a data center falls, acquisition cost of the server becomes a larger portion of my monthly spend, not operation cost. For my Harpertown systems, datacenter operation cost equals acquisition cost in just over 23 months; for a Nehalem system it's 44 months. If I lower my datacenter operation cost by 30% with the use of elevated temperatures and airside economizers then it is no longer cost effective to replace servers after 24 months with more power efficient units. That means less efficient older technologies get to hang around in the datacenter for an extra year or two and not end up in a landfill. I don't see an extra percent or two of hard drive or power supply failure as a reason not to elevate temperatures in the datacenter. Anything to the contrary seems like unsubstantiated FUD to me.
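
The break-even argument in this comment can be sketched in a few lines. This assumes a deliberately simple model that the comment does not spell out: operating cost is a flat monthly amount, break-even is the month when cumulative operating cost equals acquisition cost, and the 30% reduction applies uniformly to that monthly amount. The function name is illustrative.

```python
def breakeven_after_opex_cut(months_now: float, opex_reduction: float) -> float:
    """Months until cumulative operating cost equals acquisition cost
    after the monthly operating cost is cut by `opex_reduction`
    (0.30 means a 30% cut). Assumes a flat monthly operating cost."""
    if not 0.0 <= opex_reduction < 1.0:
        raise ValueError("opex_reduction must be in [0, 1)")
    # Acquisition cost is unchanged, so the break-even time scales
    # inversely with the remaining monthly operating cost.
    return months_now / (1.0 - opex_reduction)


# Break-even figures quoted in the comment: 23 months (Harpertown),
# 44 months (Nehalem), with a 30% cut in operating cost.
for name, months in (("Harpertown", 23), ("Nehalem", 44)):
    longer = breakeven_after_opex_cut(months, 0.30)
    print(f"{name}: {months} months -> about {longer:.1f} months")
```

Under these assumptions the Harpertown break-even moves from 23 months to roughly 33, which is the sense in which cheaper cooling lets older servers stay in service longer.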
Wednesday, April 29, 2009 12:49:31 PM (Pacific Standard Time, UTC-08:00)
I agree Tinkhank. Nehalem is excellent when looking at work done per watt and particularly excels at memory intensive workloads (the common case for most of us).

--jrh
James Hamilton
jrh@mvdirona.com
Wednesday, April 29, 2009 12:58:23 PM (Pacific Standard Time, UTC-08:00)
Ouch, some things people are saying add up to one thing: server sales and no reclamation. 18% gets claimed and of that only a small percent gets reclaimed to a level where it can be reused. EPA and CONSUMER AFFAIRS. But I can see this isn't about sustainability, it's about what's in your back yard, and you can't see the pile from your office window. Good luck, I thought we actually were a progressive community. I really could go on for days about some of the views on this thread but it's just too sad to comment anymore. I know my way is cost effective, sustainable and can save a DC >72% on their cooling, without sacrificing the environment. I'm doing my part for all of you. Let the server gods have mercy on your data! James, I think you and I are closer on this than you might think...
Rick Cockrell
Wednesday, April 29, 2009 1:16:36 PM (Pacific Standard Time, UTC-08:00)
I hear you Rick and 72% savings would clearly assure me a couple more years of future employment. I totally like the sound of it but, like all things in engineering, we need the data. It's very hard to figure out where you are finding the 72% but you have my attention.

Show me the data and, if credible, I'll happily blog it far and wide with attribution.

--jrh
James Hamilton, jrh@mvdirona.com
Saturday, May 02, 2009 9:34:35 AM (Pacific Standard Time, UTC-08:00)
James, here I thought I was done commenting on this issue but someone advised me of something and you will have to educate me on this. A server farm's efficiency is a matter of a few things, right? Utilization vs efficiency. If my favorite search engine has very efficient servers but it only takes .12 seconds to get my answer, would that mean they had a low utilization rate as compared to another that takes let's say 1 minute? 30 seconds? 15 seconds? 2 seconds? Would the utilization rate affect the total efficiency of the server farm rather than 1 server vs another? How long will it be before this issue is solved? Does this affect the whole theory that one server is more efficient than another? Would this slow things down, but by how much before it creates problems? I think that if I was playing an online game I'd be upset that it was slow, but if I was at work and searching my favorite site for let's say "how to improve a data center's efficiency" would it really matter? Is it true that a 20% increase in utilization has a bigger effect on efficiency than the last 12 years worth of energy efficiency gains in servers? What's your opinion on this?
Saturday, May 02, 2009 10:35:44 AM (Pacific Standard Time, UTC-08:00)
Server utilization is a HUGE issue. You asked "Is it true that a 20% increase in utilization has a bigger effect on efficiency than the last 12 years worth of energy efficiency gains in servers?" That's probably a bit aggressive but utilization is hugely important and one of the strongest reasons why cloud services make good economic sense.
See: http://perspectives.mvdirona.com/2009/04/21/McKinseySpeculatesThatCloudComputingMayBeMoreExpensiveThanInternalIT.aspx.

So utilization is a big deal. Is it bigger than all the improvements over the last 12 years? Probably not but it will likely dominate the improvements over the next 12.

All this is both interesting and important but unrelated to the efficiency of running a data center at very low temperatures and with very tight control on humidity. I'm still a bit skeptical of your position in that area. Supporting data for that position would be useful.

--jrh
James Hamilton, jrh@mvdirona.com
Saturday, May 02, 2009 11:56:00 AM (Pacific Standard Time, UTC-08:00)
James,

first you are as dedicated as anyone in this industry and I love that. I have more than 14 months worth of energy efficiency data on the Core4 System, measurement points - total environmental system kW vs UPS server load kW trended every 5 minutes. This was for our rebates M&V, for PG&E's NRR program. We have been issued a check for $159,000 per 100 tons of cooling, to take the facility from 112kW of cooling for 158kW of server (158/112, or a ratio of 1.41 to 1; PUE of 1.87, brand new CRACs) to an average of 39kW of cooling for 258kW of server (258/39, or a ratio of 6.6 to 1) at operating temperatures of 57F SAT, 68-72 RAT and 45% - 47% RH (PUE 1.29), and getting better as they build out. Is this the type of data you want? (I hate mentioning my system as it discredits my attempt to say ASE isn't the best option, WSE is better for the environment)

VS...

ASE - Below is the section of the report regarding energy use from Intel; maybe you can help me figure something out. What was the server load on each section of the container? They state that the total load with both compartments was 500kW before using ASE. They suggested that the cooling energy from half of the box was reduced from 111kW to 28kW on the economizing section. Is this to suggest that the total cooling before they used the ASE was 222kW and the server load was 278kW (500kW total two container use)? Does that also mean a 139kW server load per side? That would be a ratio of 139/29, or 4.7 to 1. State of the art yes, but ass kicking no. Am I correct?

ASE at high temps 4.7 to 1 (I really don't know how they are that bad)

vs. Core4 6.6 to 1 57F SAT / 68 RAT / 45%
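
The ratios being compared here are server load divided by cooling load. A minimal sketch of that arithmetic follows, using the kW figures stated in this comment. The function names are illustrative, and the per-side server load for the Intel container is the commenter's own inference (total load minus cooling, split across two compartments), not a number from the Intel report. This ratio is not PUE, since it leaves out UPS, distribution and lighting losses.

```python
def load_to_cooling_ratio(server_kw: float, cooling_kw: float) -> float:
    """kW of server load carried per kW of cooling power."""
    if cooling_kw <= 0:
        raise ValueError("cooling_kw must be positive")
    return server_kw / cooling_kw


def inferred_server_kw_per_side(total_kw: float, cooling_kw_per_side: float,
                                sides: int = 2) -> float:
    """Server load per compartment, assuming total = servers + cooling only
    and identical compartments (an assumption made in the comment)."""
    return (total_kw - cooling_kw_per_side * sides) / sides


# Sonic.net facility figures from the comment, before and after the retrofit.
print(f"before retrofit: {load_to_cooling_ratio(158, 112):.2f} to 1")
print(f"after retrofit:  {load_to_cooling_ratio(258, 39):.2f} to 1")

# Intel container: 500 kW total, about 111 kW of DX cooling per side
# before economizing, about 29 kW per side with the economizer.
intel_servers = inferred_server_kw_per_side(500, 111)
print(f"Intel inferred server load per side: {intel_servers:.0f} kW")
print(f"Intel with economizer: {load_to_cooling_ratio(intel_servers, 29):.2f} to 1")
```

The comparison is only as good as the inferred 139 kW figure; if the 500 kW total includes loads other than servers and cooling, the Intel ratio would come out differently.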

I challenge anyone to come out and contest Sonic's results: I win, they buy me lunch; they win, I buy them lunch (I should say for a year but I don't have that much money)

Intel Report:
Power Consumption
Total power consumption of the trailer was approximately 500 kilowatts (KW) when using air conditioning in both compartments. When using the economizer, the DX cooling load in the economizer compartment was reduced from 111.78 KW to 28.6 KW, representing a 74 percent reduction in energy consumption.

How much data do you want on Core4 and how many people do you want to validate it? We have kW Engineering, PG&E, Sonic.net and our venture investors. The data is in the form of Excel spreadsheets. Again I'm not wanting anything from this other than to say that ASE Sucks Ass, not just for efficiency reasons but for sustainability reasons. Also, not many existing facilities can utilize ASE, as in general cutting very large holes in walls (structural issues) and ceilings to route air in, and installing new ducting to get air where you want it, isn't even possible.
Sunday, May 03, 2009 7:40:15 AM (Pacific Standard Time, UTC-08:00)
Ok, so some people are wondering how can a cooling system that keeps perfect humidity & temperature have a better performance ratio than a pure fan system using outside air. I might soon be presenting on this exact question at one of your trade shows this fall.

Demystifying Cooling. After thinking about Intel's numbers I came to one conclusion about what they were doing with their economizing. Since they gave up the controlled temperature of 68F, I'm assuming they had to supplement that cooler temperature with more air flow to keep their servers cool enough to run within common temperature ranges to reduce the failures. Two factors, airflow and temperature difference, remove heat. The lower the air flow, the higher the temperature difference and vice versa. If my calculations are correct they were running about 1440 CFM of air (per ton of heat) to keep the servers within their operating range. What that should tell everyone is that to push that much air takes a lot of energy, and again nothing is free in this world. Fan system performance and energy use are based on two main factors: the efficiency of the fan system and the amount of static pressure losses in the system. This is why under all circumstances I can take a 24-36" raised floor and make it far more efficient than any overhead ducted system. That's a whole other story. Any typical cooling system runs about 600 CFM of air per ton. Our system is designed to operate at between 300-500 CFM (depending on customer's layout) with very efficient fans. This system was running about 1440 CFM with inefficient fans.
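
The airflow versus temperature-difference trade-off described above can be illustrated with the standard sensible-heat relation for air, Q [BTU/h] = 1.08 x CFM x dT [F]. This is a sketch under assumptions the comment does not state: the 1.08 factor holds for air at roughly standard sea-level density, only sensible heat is counted, and one ton of cooling is taken as 12,000 BTU/h. It is not a reconstruction of the commenter's own calculation.

```python
BTU_PER_HOUR_PER_TON = 12_000.0  # one ton of refrigeration
SENSIBLE_HEAT_FACTOR = 1.08      # BTU/h per (CFM * deg F); standard-density air (assumption)


def air_temperature_rise_f(cfm_per_ton: float) -> float:
    """Air temperature rise in deg F needed to carry one ton of heat
    at the given airflow, using Q = 1.08 * CFM * dT."""
    if cfm_per_ton <= 0:
        raise ValueError("cfm_per_ton must be positive")
    return BTU_PER_HOUR_PER_TON / (SENSIBLE_HEAT_FACTOR * cfm_per_ton)


# Airflow figures mentioned in the comment, in CFM per ton of heat.
for cfm in (300, 500, 600, 1440):
    print(f"{cfm:>5} CFM/ton -> air temperature rise of about "
          f"{air_temperature_rise_f(cfm):.1f} F")
```

At 600 CFM per ton the air warms by roughly 18.5F across the load; at 1440 CFM per ton the rise is under 8F, which shows how much more air has to move when the allowed temperature difference is small.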

My point is that nothing is what it seems and that's why we compare both the server energy use to the cooling energy use; you can't hide inefficiencies that way. A BTU is a BTU, and a kW is a kW. And a guarantee is a guarantee. There is no guessing how much energy a cooling system should use at any level of buildout, no guessing on water use, and no guessing on e-waste. It's that simple...

Conventional thinking is what got us into this mess we are in, my fans for the same heat load would have drawn about 5.34 kW. Math is math. That's if I had designed the cooling.
Sunday, May 03, 2009 8:00:03 AM (Pacific Standard Time, UTC-08:00)
Rick, you started this discussion arguing that very low set points with the humidity tightly controlled to +/-2% humidity was more "sustainable" than higher data center temperatures with less humidity control. I've been skeptical and have been asking for some data to support that unusual perspective.

Your recent comments have mostly focused around advertising that the product your company sells is "efficient". That may be but this blog is not really intended to be an advertising platform. And, leading the discussion with extreme statements on the sustainability of very low set points and humidity control to +/-2% may not be the best introduction to your company. However, I applaud your focus on the environment and efficiency.

I continue to recommend increasing the industry average set points and much less control on humidity.

James Hamilton
jrh@mvdirona.com
Thursday, May 07, 2009 9:34:25 PM (Pacific Standard Time, UTC-08:00)
So we are seeing 2% higher failures per year. That means that if we don't pull failures off racks for repair, over the four year life of the gear we need 8% more gear. Not 100%, 8%.
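
The 8% figure is a straight sum of 2% per year over four years. As a rough check, here is a sketch that compares that sum with a compounding model in which each year's extra failures apply only to the servers still running. Both models and the function names are illustrative assumptions, not something stated in the comment.

```python
def additive_extra_failures(annual_rate: float, years: int) -> float:
    """Extra failed fraction if the same share of the original fleet
    fails every year (the straight-sum estimate)."""
    return annual_rate * years


def compounded_extra_failures(annual_rate: float, years: int) -> float:
    """Extra failed fraction if each year's rate applies only to the
    servers that have not already failed."""
    return 1.0 - (1.0 - annual_rate) ** years


rate, life_years = 0.02, 4  # 2% extra failures per year, four-year life
print(f"additive:   {additive_extra_failures(rate, life_years):.2%}")
print(f"compounded: {compounded_extra_failures(rate, life_years):.2%}")
```

The two models differ by only about a quarter of a percentage point over four years, so either way the extra gear needed is close to 8%, not 100%.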

Consider the amount of copper and plastic and concrete we are putting into raised floor datacenters. Consider if we lost the chillers and the building and built an air filter and a pair of redundant blowers into the shipping containers that hold the servers. And consider if we dropped them wherever there was a spare vacant lot, a dozen cheap megawatts and fiber. With careful site placement avoiding salt spray, freezing temperatures and very high humidity this should not be a problem. Would we be ahead on a total materials used basis burning 8% of our gear?

I think yes. The military thinks yes, for different reasons they've been doing this for almost half a century. Google thinks yes. They don't say so, but actions speak louder than words. And now, more recently, Microsoft also thinks yes.
Chris Bock
Thursday, May 07, 2009 11:13:54 PM (Pacific Standard Time, UTC-08:00)
"Rick, you started this discussion arguing that very low set points with the humidity tightly controlled to +/-2% humidity was more "sustainable" than higher data center temperatures with less humidity control. I've been skeptical and have been asking for some data to support that unusual perspective. "

Laptops and Desktop computers have no humidity control and most don't have dust filters. That's over a billion data points right there. What's the fail rate of each component? Dell, HP, Lenovo, they all have this data. Next time you place an order with them, ask them for it. I can tell you from my organisation the number of failures per year is in the 2% range for disk and power supply, and 0.25% for other components. Mechanically, desktops are more hostile than server environments because they are regularly thermal cycled.

Servers aren't special; the power supply, network, processor, disk and memory are the same gear. With careful control of the thermodynamics in the enclosure, why can't it be cooled the same?

Now, if I was doing a Microsoft or Google scale buildout, large redundant 230VAC blowers and redundant rectifiers at the rack level would be a good idea and would totally eliminate the server power supply and fans from the failure equation.
Chris Bock
Monday, May 11, 2009 4:36:03 AM (Pacific Standard Time, UTC-08:00)
Chris, what you recommend makes perfect sense to me.

James Hamilton
jrh@mvdirona.com

Disclaimer: The opinions expressed here are my own and do not necessarily represent those of current or past employers.

All Content © 2014, James Hamilton