In a comment to the last blog entry, Cost of Power in Large-Scale Data Centers Doug Hellmann brought up a super interesting point
It looks like you've swapped the "years" values from the Facilities Amortization and Server Amortization lines. The Facilities Amortization line should say 15 years, and Server 3. The month values are correct, just the years are swapped.I wonder if the origin of "power is the biggest cost" is someone dropping a word from "power is the biggest *manageable* cost"? If there is an estimated peak load, the server cost is fixed at the rate necessary to meet the load. But average load should be less than peak, meaning some of those servers could be turned off or running in a lower power consumption mode much (or most) of the time.
This comment has not been screened by an external service.
Doug Hellmann
Yes, you’re right the comment explaining the formula on amortization period in Cost of Power in Large-Scale Data Centers is incorrect. Thanks for you, Mark Verber, and Ken Church for catching this.
You brought up another important point that is worth digging deeper into. You point out that we need to buy enough servers to handle maximum load and argue that you should shut off those you are not using. This is another one of those points that I’ve heard frequently and am not fundamentally against but, as always, it’s more complex than it appears. There are two issues here: 1) you can actually move some workload from peak to the valley through a technique that I call Resource Consumption Shaping and 2) turning off isn’t necessarily the right mechanism to run more efficiently. Let’s look at each:
Resource Consumption Shaping is a technique that Dave Treadwell and I came up with last year. I’ve not blogged it in detail (I will in the near future), but the key concept is prioritizing workload into at least two groups: 1) customer waiting and 2) customer not waiting. For more detail, see page 22 of the talk Internet-Scale Service Efficiency from Large Scale Distributed Systems & Middleware (LADIS 2008). The “customer not waiting” class includes reports, log processing, re-indexing, and other admin tasks. Resource consumption shaping argues you should move “customer not waiting” workload from the peak load to off-peak times where you can process it effectively for free since you already have the servers and power. Resource consumption shaping builds upon Degraded Operations Mode.
The second issue is somewhat counter-intuitive. The industry is pretty much uniform in arguing that you should shut off servers during non-peak periods. I think Luiz Barroso was probably the first to argue NOT to shut off servers and we can use the data from Cost of Power in Large-Scale Data Centers to show that Luiz is correct. The short form of the argument goes like this: you have paid for the servers, the cooling, and the power distribution for the servers. Shutting them off only saves the power they would have consumed. So, it’s a mistake to shut them off unless you don’t have any workload to run with a marginal value above the cost of the power the server consumes since you have already paid for everything else. If you can’t come up with any workload to run worth more than the marginal cost of power, then I agree you should shut them off.
Albert Greenberg, Parveen Patel, Dave Maltz and I make a longer form of this argument against shutting servers off in an article to appear in the next issue of SIGCOM Computer Communications Review. We also looked more closely at networking issues in this paper.
--jrh
James Hamilton, Data Center FuturesBldg 99/2428, One Microsoft Way, Redmond, Washington, 98052 W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 | JamesRH@microsoft.com
H:mvdirona.com | W:research.microsoft.com/~jamesrh | blog:http://perspectives.mvdirona.com
Disclaimer: The opinions expressed here are my own and do not necessarily represent those of current or past employers.