In the Cost of Power in Large-Scale Data Centers, we looked at where the money goes in a large-scale data center. Here I’m taking similar assumptions and computing the annual cost of power, including all the infrastructure as well as the utility charge. I define the fully burdened cost of power to be the sum of 1) the cost of the power from the utility, 2) the cost of the infrastructure that delivers that power, and 3) the cost of the infrastructure that gets the heat produced by dissipating that power back out of the building.
We take the cost of the power and cooling infrastructure, assuming a 15-year amortization cycle and a 5% annual cost of money, and divide the annual payment by the overall data center critical load to get the annual infrastructure cost per watt. The fully burdened cost of power is the cost of consuming 1W for an entire year and includes the power and cooling infrastructure as well as the power consumed. Essentially it’s the cost of all the infrastructure except the data center shell (the building). From Intense Computing or In Tents Computing, we know that 82% of the cost of the entire data center is power delivery and cooling, so taking 82% of the amortized facility cost divided by the facility critical load is a good estimator of the infrastructure cost of power.
The fully burdened cost of power is useful for a variety of reasons, but here are two: 1) current-generation servers get more work done per joule than older servers, so when is it cost effective to replace them? And 2) SSDs consume much less power than HDDs, so how much can I save in power over three years by moving to SSDs, and is it worth doing?
We’ll come back to those two examples after we work through what power costs annually. In this model, like the last one (http://perspectives.mvdirona.com/2008/11/28/CostOfPowerInLargeScaleDataCenters.aspx), we’ll assume a 15MW data center that was built at a cost of $200M and runs at a PUE of 1.7. This is better than most, but not particularly innovative.
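To make the arithmetic that follows concrete, here’s a minimal sketch of the fully burdened cost calculation under these assumptions. The $0.07/kWh utility rate is a placeholder assumption rather than a number from the model (substitute your own rate, or use the spreadsheet below), so the result lands near, rather than exactly on, the $2.12/W/year figure used in the examples that follow:

```python
# Sketch of the fully burdened cost of power under the assumptions above.
# The utility rate is a placeholder assumption; plug in your own rate.

facility_cost = 200_000_000      # total data center build cost ($)
critical_load_w = 15_000_000     # critical load (W)
power_cooling_share = 0.82       # share of facility cost in power and cooling
amortization_years = 15
cost_of_money = 0.05             # annual cost of money
pue = 1.7
utility_rate_kwh = 0.07          # $/kWh, assumed placeholder

# Annual payment on the power and cooling infrastructure (standard annuity).
infra_capital = facility_cost * power_cooling_share
annual_infra_payment = infra_capital * cost_of_money / (
    1 - (1 + cost_of_money) ** -amortization_years)
infra_cost_per_w_year = annual_infra_payment / critical_load_w

# Annual utility cost of 1W of critical load, scaled up by PUE for overhead.
energy_kwh_per_w_year = pue * 8760 / 1000
utility_cost_per_w_year = energy_kwh_per_w_year * utility_rate_kwh

burdened_cost_per_w_year = infra_cost_per_w_year + utility_cost_per_w_year
print(f"Infrastructure: ${infra_cost_per_w_year:.2f}/W/year")
print(f"Utility power:  ${utility_cost_per_w_year:.2f}/W/year")
print(f"Fully burdened: ${burdened_cost_per_w_year:.2f}/W/year")
```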
Should I Replace Old Servers?
Let’s say we have 500 servers, each of which can process 200 application operations/second. These servers are about 4 years old and consume 350W each. A new server has been benchmarked to process 250 operations/second; each of these servers costs $1,300 and consumes 160W at full load. Should we replace the farm?
Using the new server, we only need 400 servers to do the work of the previous 500 (500*200/250). The new server farm also consumes less power: the savings are 111kW ((500*350)-(400*160)). Let’s assume a plan to keep the new servers for three years. We save 111kW for each of those three years, and we know from the above model that we are paying $2.12/W/year, so over three years we’ll save $705,960. The new servers will cost $520,000, so by recycling the old servers and buying new ones we can save $185,960. To be fair, we should accept a charge to recycle the old ones and we need to model the cost of tying up $520k in capital. We’ll ignore the recycling costs and use a 5% cost of money to model the impact of the capital cost of the servers. Using a 5% cost of money over a three-year amortization period, we’ll pay another $52,845 in interest if we were to borrow to buy these servers, or simply in recognition that tying up capital has a cost.
Accepting this $52k charge for tying up capital, it’s still a gain of roughly $133k to recycle the old servers and buy new ones. In this case, we should replace the servers.
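Here’s a small sketch of the replacement arithmetic. The interest charge assumes a standard annuity at the 5% cost of money over the three years, which reproduces the roughly $52,845 figure above:

```python
# Sketch of the server replacement example: power savings at the fully
# burdened rate versus the capital and interest cost of the new servers.

old_servers, old_watts = 500, 350
new_ops_ratio = 250 / 200                        # new server does 1.25x the work
new_servers = round(old_servers / new_ops_ratio) # 400 servers
new_watts, new_server_price = 160, 1300
burdened_cost_per_w_year = 2.12                  # $/W/year from the model above
years = 3
cost_of_money = 0.05

power_saved_w = old_servers * old_watts - new_servers * new_watts  # 111,000W
power_savings = power_saved_w * burdened_cost_per_w_year * years   # $705,960

capital = new_servers * new_server_price                           # $520,000
annual_payment = capital * cost_of_money / (1 - (1 + cost_of_money) ** -years)
interest = annual_payment * years - capital                        # ~$52,845

net_gain = power_savings - capital - interest                      # ~$133,000
print(f"Power savings over {years} years: ${power_savings:,.0f}")
print(f"Server capital: ${capital:,.0f}, interest: ${interest:,.0f}")
print(f"Net gain from replacing the servers: ${net_gain:,.0f}")
```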
What is an SSD Worth?
Let’s look at the second example of the two I brought up above. Let’s say I can replace 10 disk drives with a single SSD. If the workload is not capacity bound and is I/O intensive, this can be the case (see When SSDs Make Sense in Server Applications). Each HDD consumes roughly 10W whereas the SSD consumes only 2.5W. Replacing these 10 HDDs with a single SSD could save 97.5W/year and, over a three-year life, that’s a savings of 292.5W. Using the fully burdened cost of power from the above model, we could save $620 (292.5W*$2.12) on power alone. Let’s say the disk drives are $160 each and will last three years. What’s the break-even point where the SSD is a win, assuming the performance is adequate and ignoring other factors such as lifetime and service? We take the cost of the 10 disks and add in the cost of power saved to see what we could afford to pay for an SSD, the break-even point (10*160+620 => $2,220). If the SSD is under $2,220, then it is a win. The Intel X-25E has a street price of around $700 the last time I checked and, in many application workloads, it will easily replace 10 disks. Our conclusion is that, in this case with these assumptions, the SSD looks like a better investment than 10 disks.
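A minimal sketch of the SSD break-even arithmetic under the same assumptions:

```python
# Sketch of the SSD break-even calculation: the most we could pay for one SSD
# and still come out ahead of 10 HDDs, counting purchase price plus fully
# burdened power over a three-year life.

hdd_count, hdd_watts, hdd_price = 10, 10.0, 160
ssd_watts = 2.5
burdened_cost_per_w_year = 2.12   # $/W/year from the model above
years = 3

power_saved_w = hdd_count * hdd_watts - ssd_watts                 # 97.5W
power_savings = power_saved_w * burdened_cost_per_w_year * years  # ~$620

breakeven_ssd_price = hdd_count * hdd_price + power_savings       # ~$2,220
print(f"Power savings over {years} years: ${power_savings:,.0f}")
print(f"Break-even SSD price: ${breakeven_ssd_price:,.0f}")
```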
When you factor in the fully burdened price of power, savings can add up quickly. Compute your own fully burdened cost of power using the spreadsheet below and figure out when you should be recycling old servers or considering lower-power components.
If you are interested in tuning the assumptions to more closely match your current costs, here it is: PowerCost.xlsx (11.27 KB).
–jrh
James Hamilton, Data Center Futures
Bldg 99/2428, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 | JamesRH@microsoft.com
H:mvdirona.com | W:research.microsoft.com/~jamesrh | blog:http://perspectives.mvdirona.com
Hi Kevin.
You wanted to include a few more factors in the model.
A few other thoughts to consider in this model:
* One-time costs to evaluate new servers
* Per-server costs for software (may or may not be licenses tied to hardware, # of cores, etc.)
* Per-server costs to install new servers, including software load
* Costs to remove old servers (wipe hard disks, remove from racks, could lump in disposal fee here)
I think there’s a very compelling reason to make these changes, but these factors will likely be part of the decision.
I don’t include new server evaluation in this model. In even medium-sized deployments this cost is lost in the noise; for small deployments, I agree it can be relevant. Software costs are interesting. Software R&D for internally written software is explicitly not included, nor is customer support and other relevant costs for running a service. Software licenses are also not included. Many services choose to operate with open source or internally developed software, and some do use licensed software; for those that do, the cost is relevant. Since there are so many application-specific variations in software, the model I provided just covers housing, powering, and cooling a large fleet of commodity servers. Everything from the software stack up is not included but would be easy to add for whatever service you have in mind.
I didn’t include software install, server install, or server recycling. Software install should be 100% automated: new servers should be registered with the automation software, and the entire stack just gets installed and the server brought online. If you are running a big service and using people, fix it. Servers should be purchased in full rack units.
Server vendors are all willing to help with secure, environmentally safe disposal. Just mention to your favorite provider that you would like to replace 3,000 servers but don’t have budget for recycling the old ones. I’ll bet a solution is found fairly quickly :-).
I like your argument on the SSDs and the additional power loss through server power supplies.
–jrh
jrh@mvdirona.com
James,
A few other thoughts to consider in this model:
* One-time costs to evaluate new servers
* Per-server costs for software (may or may not be licenses tied to hardware, # of cores, etc.)
* Per-server costs to install new servers, including software load
* Costs to remove old servers (wipe hard disks, remove from racks, could lump in disposal fee here)
I think there’s a very compelling reason to make these changes, but these factors will likely be part of the decision.
The numbers supporting the SSD argument may even be a bit better than what you describe above. The PUE factor treats the PSU losses as part of the IT load, so you may need to divide the component power consumption by the PSU efficiency. For example, if the power supplies are 90% efficient, the 10 HDDs @ 10W each would consume 100W / 90% efficiency = 111W. The SSD’s 2.5W / 90% efficiency becomes 2.8W. The net savings is 111W – 2.8W = 108.2W, which yields an even more compelling case than the 97.5W savings described earlier.
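Here’s a quick sketch of that adjustment (the 90% supply efficiency is just an example figure):

```python
# Sketch of the PSU-efficiency adjustment: component power is divided by the
# supply efficiency before applying the fully burdened rate.

psu_efficiency = 0.90                          # example figure
hdd_wall_watts = 10 * 10.0 / psu_efficiency    # ~111.1W at the wall
ssd_wall_watts = 2.5 / psu_efficiency          # ~2.8W at the wall
net_saved_w = hdd_wall_watts - ssd_wall_watts  # ~108.3W (108.2W above uses rounded values)

savings = net_saved_w * 2.12 * 3               # roughly $689 over three years
print(f"Net savings: {net_saved_w:.1f}W, worth ${savings:,.0f} over 3 years")
```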
It’s good to see calculators like this with numbers people can relate to,
–kb
OK, you save 50W for whatever time period you choose. I agree.
–jrh
I see your point, but the engineering pedant in me still wishes the units were more correct:
50W * $2.12/W/year * 3 years = $318 — which is all correctly dimensioned
50W/year * $2.12/W/year * 3 years = $318/year — which would be wrong
By the way, the UNIX units program happily confirms this (and checks units).
Say hi to Patrick for me as well.
Yes, I think the time frame is relevant. The savings are for the life of the hardware, or for the length of time you have the hardware in use. We save $2.12 for each watt reallocated for each full year. For example, say you move to new servers that save 50W over the old servers, you plan to have them in use for three years, and we know that fully burdened power is $2.12/W/year. From all that, we know the savings are 50W/year * 3 years * $2.12/W/year, which is $318. It’s the saving over some time period that is interesting to us in this case.
–jrh
Patrick says hi too.
"Essentially saving 97.5W/year says that, by supporting the application load on different equipment, we were able to support the workload and free up the ability to power and cool 97.5W of equipment for that period of time."
But are you sure that denominator ("/year") is meaningful here? I read the original article as saving 97.5W, period.
Always good to hear from you, Frank. It’s been more than 15 years.
When modeling fully burdened power, when we say we save 97.5W/year, what I mean more precisely is that power distribution equipment able to deliver 97.5 watts is freed up for use elsewhere for that period. Also saved during that period is the mechanical equipment required to remove the heat that would have been dissipated by 97.5W of IT load, plus the heat dissipated in power distribution and by the cooling system overhead. Finally, the power itself is not consumed and is available for use elsewhere in the data center.
Essentially saving 97.5W/year says that, by supporting the application load on different equipment, we were able to support the workload and free up the ability to power and cool 97.5W of equipment for that period of time.
–jrh
jamesrh@microsoft.com
Jamie, it may help avoid the appearance of dimensionality errors if you use energy or other fully spelled out units instead of writing something like "saving 97.5 W/year".