I’m not sure how many times I’ve read or been told that power is the number one cost in a modern mega-data center, but it has been a frequent refrain. And, like many stories that get told and retold, there is an element of truth to it. Power is absolutely the fastest-growing operational cost of a high-scale service. And, except for server hardware, power and costs functionally related to power usually do dominate.
However, it turns out that power alone isn’t anywhere close to the most significant cost. Let’s look at this more deeply. If you amortize power distribution and cooling infrastructure over 15 years and amortize server costs over 3 years, you get a fair comparative picture of how server costs compare to infrastructure (power distribution and cooling). But how do you compare the capital costs of servers and of power and cooling infrastructure with that monthly bill for power?
The approach I took is to convert everything into a monthly charge. Amortize the power distribution and cooling infrastructure over 15 years and use a 5% per annum cost of money and compute the monthly payments. Then take the server costs and amortize them over a three year life and compute the monthly payment again using a 5% cost of money. Then compute the overall power consumption of the facility per month and compare the costs.
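The conversion described above is just a standard annuity calculation. Here’s a minimal sketch in Python; the capital figures are made-up placeholders for illustration, not the spreadsheet’s actual numbers:

```python
def monthly_payment(principal, annual_rate=0.05, years=15):
    """Standard annuity payment: amortize `principal` over `years`
    at `annual_rate` cost of money, paid monthly."""
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # number of monthly payments
    return principal * r / (1 - (1 + r) ** -n)

# Hypothetical illustrative figures, not from the article's spreadsheet:
facility = 88_000_000   # power distribution + cooling infrastructure capex
servers = 53_000_000    # server purchase cost

infra_monthly = monthly_payment(facility, years=15)   # 15-year life
server_monthly = monthly_payment(servers, years=3)    # 3-year life
print(f"infrastructure: ${infra_monthly:,.0f}/month")
print(f"servers:        ${server_monthly:,.0f}/month")
```

With these placeholder numbers the 3-year server amortization produces the larger monthly charge, consistent with the observation below that server hardware is the largest cost.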
Update: fixed error in spread sheet comments.
What can we learn from this model? First, we see that power costs not only don’t dominate, they trail both the cost of servers and the aggregated infrastructure costs. Server hardware costs are actually the largest. However, if we look more deeply, we see that the infrastructure is almost completely functionally dependent on power. From Belady and Manos’ article Intense Computing or In Tents Computing, we know that 82% of the overall infrastructure cost is power distribution and cooling. The power distribution costs are functionally related to power, in that you can’t consume power if you can’t get it to the servers. Similarly, the cooling costs are clearly 100% driven by the power dissipated in the data center, so cooling costs are functionally related to power as well.
We define the fully burdened cost of power to be the sum of the cost of the power consumed and the cost of both the cooling and power distribution infrastructure. This number is still somewhat less than the cost of servers in this model but, with cheaper servers or more expensive power assumptions, it actually would dominate. And it’s easy to pay more for power, although very large datacenters are often sited to pay less (e.g. the Microsoft Columbia or Google Dalles facilities).
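As a sketch, the fully burdened definition works out like this. All the monthly figures here are illustrative assumptions; the only number from the article is the 82% power-and-cooling share of infrastructure from Belady and Manos:

```python
# All monthly dollar figures below are assumed for illustration only.
power_bill = 475_000          # monthly utility power cost (assumed)
infra_monthly = 696_000       # amortized total infrastructure (assumed)
server_monthly = 1_589_000    # amortized server hardware (assumed)

# 82% of infrastructure is power distribution + cooling (Belady & Manos).
power_related_share = 0.82

fully_burdened_power = power_bill + power_related_share * infra_monthly
print(f"fully burdened power: ${fully_burdened_power:,.0f}/month")
print(f"servers:              ${server_monthly:,.0f}/month")
```

With these assumptions the fully burdened cost of power lands below the server cost but well above the raw power bill, which is the shape of the argument in the text.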
Since power and infrastructure costs continue to rise while the cost of servers measured in work done per $ continues to fall, it actually is correct to say that the fully burdened cost of power does, or soon will, dominate all other data center costs.
For those of you interested in playing with different assumptions, the spreadsheet is here: OverallDataCenterCostAmortization.xlsx (14.4 KB).
–jrh
James, this is excellent. Could you update the file based on current DC capabilities and power costs?
What I’m involved with at AWS is, unfortunately, getting less and less relevant to the general purpose DC buying community. We now design our own ASICs, servers, routers, data center power distribution, and mechanical systems. The use of all custom equipment gives us the ability to offer unique features to our customers and it substantially reduces costs and increases availability. But it means I no longer have a good perspective on commercial off-the-shelf costs.
Wow, very interesting pie chart.
These big companies know that having a chatbot train on billions of inputs and mostly produce more “noise” on the internet is not going to make them money.
Customer-targeted applications of machine learning are where the action is. It will take years before any tangible results emerge. Bio innovations and improvements in patient care will take good old programmers to implement, debug, and enable the machine learning.
Apple already uses a neural engine to do face recognition for authentication on their phones. Incremental improvements of their product are the norm.
All the hype is just Silicon Valley squeezing the stock market again, just like in the late nineties and the crash in 2000. HAL is not around the corner. I have heard that some think just building a bigger computer will enable “consciousness”. Just keep making the neural layers bigger and faster and you will get human consciousness and understanding, at light speed. How big will that sentient data center need to be?
I agree sentience is not here, nor even near, but there will be big innovations and a great deal of value will be created.
The main reason for the “power dominates cost” claim is the 2 kW power consumption of most enterprise servers. At that draw, of course power consumption dominates.
I agree with you that power is rising quickly in modern servers, but hardware costs still dominate and remain the most important factor for our deployments.
Thanks for the pointer to Christian’s article Andy. It’s excellent. Christian is one of the best.
In the article Belady says power + cooling costs more than the servers, and we’re pretty close to agreement on that. If you adjust the PUE assumption to something in the high 2’s, which unfortunately is not unheard of, infrastructure does cost more than servers. And, given that work done/$ is improving constantly but infrastructure costs are flat to rising, we expect to see infrastructure eventually dominate everywhere. This was an observation made by Ken Church a year or so ago and led to Ken and me writing Diseconomies of Scale (//perspectives.mvdirona.com/2008/04/06/DiseconomiesOfScale.aspx).
–jrh
An excellent analysis…thanks.
I think the origin of the "Power is the biggest cost" statement is a paper by Microsoft’s own Christian Belady, although he may have been working for HP at the time (?). This paper has been widely referenced by analysts, including myself, and by the EPA in its report on datacenters. It would be interesting to compare the assumptions behind the models.
Belady, Christian L. 2007. In the Data Center, Power and Cooling Costs More Than the IT Equipment it Supports. Electronics Cooling, vol. 13, no. 1. http://electronics-cooling.com/articles/2007/feb/a3/
Andy Lawrence, 451 Group
I agree across the board Jacque. Thanks,
–jrh
James,
Excellent information. My inputs:
1. ASHRAE (American Society of Heating, Refrigerating and Air-Conditioning Engineers) standards support your data: the refresh standard for building cooling systems is 10-25 years. Recent updates by ASHRAE TC 9.9 put the IT equipment refresh rate at 2-5 years. Given the rate of change, in particular efficiency gains, it is prudent to use the low value for the IT equipment. The larger building systems are less likely to see improvements for a variety of reasons, so other factors should guide where along the 10-25 year timeline is best.
2. The percentages are interesting, but of limited value other than for strategic guidance on where to focus cost control/asset management/efficiency efforts. My company provides solutions for improving data center efficiency, so our focus is on measuring the savings generated by our programs. We do supply some data from The Green Grid that speaks to how power consumption is distributed across the data center to help clients understand where they can realize savings. In support of your data, virtualization and consolidation top the list; i.e., reducing the number of those expensive servers.
Again, great information. Thank you.
Roger asked for a breakdown by server, storage, and networking gear. Internet-scale services typically employ direct attached disk rather than the SANs common in enterprise deployments. The Google GFS paper gives one example of a low-cost, direct attached storage subsystem that allows them to avoid the use of SANs. The direct attached disks are costed in with the servers in the model.
I will factor out the network gear in the future and show it separately as you suggest, but the quick answer is that networking gear cost and power are swamped by the servers. It’s a relevant cost but small compared to the others we’re looking at. Here’s the problem: if you have 50k servers with 40 servers per rack, you end up with only 1,250 top-of-rack switches at around $3k to $5k each. Adding in ~250 layer 2 switches, a similar number of layer 3 routers, a few aggregation routers, and a couple of border routers looks like a lot of money (it is!) but it is small when compared to the cost of the servers. Generally under 3%, but I will get around to adding network gear to the model. Thanks for the suggestion.
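The back-of-envelope in that reply can be worked through directly. Every price here other than the quoted $3k-$5k ToR range and the 40-servers-per-rack figure is an assumption for illustration:

```python
# Back-of-envelope sketch; only the ToR price range and rack density
# come from the reply above, everything else is an assumed figure.
servers = 50_000
server_cost = 7_000          # assumed per-server price, DAS disks included
racks = servers // 40        # one top-of-rack switch per rack

tor = racks * 4_000          # $3k-$5k each; use the midpoint
l2 = 250 * 8_000             # ~250 layer 2 switches (price assumed)
l3 = 250 * 12_000            # ~250 layer 3 routers (price assumed)
agg_border = 10 * 50_000     # a few aggregation + border routers (assumed)

network = tor + l2 + l3 + agg_border
fraction = network / (servers * server_cost)
print(f"network gear is {fraction:.1%} of server capex")
```

With these particular assumptions the ratio lands near the ~3% quoted; the exact share is obviously sensitive to the per-server and per-switch prices chosen.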
Ramki, you brought up three points:
1. Timing of investment. You start with 1 server and grow to 50,000 over time.
That doesn’t change the argument: regardless of how slowly you add servers, the power consumed scales with the servers, and the cost of power will always trail the cost of servers and infrastructure. Because data centers come with such a huge upfront investment, the best thing you can do fiscally is fill them as fast as possible and utilize all the resources you have had to pay for upfront. That’s hard to do, and it’s why I’ve argued for modular data centers for several years now. See http://mvdirona.com/jrh/talksAndPapers/JamesRH_CIDR.doc or a more recent HotNets paper: http://conferences.sigcomm.org/hotnets/2008/papers/10.pdf.
2. No datacenter architecture can be relevant for more than 3 to 5 years.
If that has been your experience Ramki, it is very unusual. Cooling towers, heat exchangers, primary cooling pumps, high-voltage transformers, medium-voltage transformers, and high-output generators last a LOT longer than 3 to 5 years. 15-year amortization is common and quite reasonable. If you are upgrading infrastructure more frequently than every 10 years, those decisions need more review.
3. Model doesn’t include networking and storage.
The model does include storage. It’s direct attached and part of the server purchases. SANs are not common in internet-scale services, and the data in the original article is why: in the enterprise, where many report that people costs dominate, SANs are a common choice for storage. In the services world, where hardware costs dominate, they are much less common due to the SAN tax being fairly noticeable at scale. We use DAS in the model and I’ve not factored the storage out separately.
It’s a good suggestion to add networking gear to the model but as I’ve argued above, networking costs although high are relatively small when compared to the costs under consideration. But I agree I should add networking to the model. Thanks,
–jrh
James Hamilton, Data Center Futures
Bldg 99/2428, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 | JamesRH@microsoft.com
H:mvdirona.com | W:research.microsoft.com/~jamesrh | blog://perspectives.mvdirona.com
Nice starting point. A few other things to note:
1. The per-month model forgets to account for the TIMING of the capital and expense investments.
The upfront cost of this example datacenter is 200 Million $. That’s for 1 server to 50,000 servers.
Think about why the "power" seems to be such a large investment in this scenario.
2. Risk of obsolescence is not factored. No datacenter architecture can be relevant for more
than 3-5 years nowadays. Not unless you factor in a 5+ year "infrastructure recycle" cost.
3. These servers need some additional infrastructure (e.g. switches and storage, to name two). Those can
consume almost as much power as the servers combined. Is this included in the 200M datacenter cost?
I’m curious to know what the cost breakdown is between servers, storage and networking in a data center. Lumping everything into servers and other infrastructure isn’t as useful to me as at least breaking it down into the 3 major categories of equipment.
Yes, you’re right the comment explaining the formula on amortization period in Cost of Power in Large-Scale Data Centers is incorrect. Thanks for both you and Mark catching this. You may be right in your guess as to the origin of the “power is largest” legend.
You brought up another important point that is worth digging deeper into. You point out that we need to buy enough servers to handle maximum load and argue that you should shut off those you are not using. This is another one of those points that I’ve heard frequently and am not fundamentally against but, as always, it’s more complex than it appears.
I’ll post a more detailed look at this in a blog entry later today or tomorrow. Thanks for the interesting comment.
–jrh
jrh@mvdirona.com
It looks like you’ve swapped the "years" values from the Facilities Amortization and Server Amortization lines. The Facilities Amortization line should say 15 years, and Server 3. The month values are correct, just the years are swapped.
I wonder if the origin of "power is the biggest cost" is someone dropping a word from "power is the biggest *manageable* cost"? If there is an estimated peak load, the server cost is fixed at the rate necessary to meet the load. But average load should be less than peak, meaning some of those servers could be turned off or running in a lower power consumption mode much (or most) of the time.
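The commenter’s peak-versus-average point can be sketched numerically. Every figure here (fleet size, utilization ratio, server wattage, power price) is an assumption for illustration:

```python
# Sketch of the peak vs. average load argument: server capex is sized
# for peak, but power drawn need only track servers actually running.
# All figures below are illustrative assumptions.
peak_servers = 50_000       # fleet sized for estimated peak load
avg_utilization = 0.45      # assumed average/peak ratio
watts_per_server = 200      # assumed draw per server
usd_per_kwh = 0.07          # assumed industrial power rate

hours = 24 * 30             # one month
always_on = peak_servers * watts_per_server / 1000 * hours * usd_per_kwh
scaled = always_on * avg_utilization   # if idle servers were powered off

print(f"all servers on:      ${always_on:,.0f}/month")
print(f"potential savings:   ${always_on - scaled:,.0f}/month")
```

The gap between the two numbers is the "manageable" portion of the power bill the comment is pointing at; whether it is actually capturable is the subject of the reply above.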
Good hearing from you Joshua. You asked about networking ingress/egress charges. They are super application dependent. On YouTube, they lead. But, in the average case and in every service I’ve been involved with, networking is behind 1) server hardware, 2) cooling, 3) power provisioning, and 4) power. I’ve dug through detailed costs from several who were willing to share their cost data with me, and 15% for networking egress/ingress just keeps coming up and seems to be a very good estimate.
Mark, it’s good hearing from you as well. I changed the 3 year and 15 year chart labels in the copy I’ve been working from. You know the deal: never trust the comments in the code!
You’re right that you can use 5-year amortization rather than 3 on hardware, and that does effectively lower the annual cost of the hardware. I’m not against doing this and, in fact, I’ve worked with many services that do this. Recently I’ve been arguing against keeping servers longer than 3 years in some cases because they were huge, very power-inefficient Xeons, and replacing them with Nehalem gave so much more work done/watt that it justified the cost. I was just working an example where I took 350W servers, the annual cost of power, and slightly faster 160W servers and worked the numbers. It actually is more cost effective to replace them than to run the old ones for longer. It’s a fun argument – I’ll post it when I get some time next week.
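The skeleton of that replace-or-keep comparison looks something like the sketch below. Prices, PUE, and power rate are assumptions, and this simple version counts only metered power; the full argument also credits the work-done/watt gain (one new server retiring more than one old one) and fully burdened power, which is what can flip the answer in favor of replacement:

```python
def three_year_cost(server_price, watts, pue=1.7, usd_per_kwh=0.07):
    """Hardware + metered power for one server over a 3-year life.
    PUE and power price are assumed figures for illustration."""
    kwh = watts / 1000 * 24 * 365 * 3 * pue
    return server_price + kwh * usd_per_kwh

keep_old = three_year_cost(0, 350)        # old box is already paid for
replace = three_year_cost(2_000, 160)     # assumed price for the new server

print(f"keep old:  ${keep_old:,.0f} over three years")
print(f"replace:   ${replace:,.0f} over three years")
```

On metered power alone, with these assumptions, keeping the old server wins; fold in consolidation and the fully burdened cost of power and the replacement case can come out ahead, which is the point of the worked example promised in the reply.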
The key two points to get out of the chart are: 1) power is NOT the biggest cost, 2) fully burdened power (power + infrastructure) is either biggest cost or is going to be soon (depends how you price out servers).
Jake, you asked about personnel costs. The surprising thing is that they are remarkably small. This is another one of those super-interesting observations. It’s another one of those often-repeated quotes I’ve come across, like “power dominates”. Like “power dominates”, the “people costs dominate” argument is accurate in some domains but generally not true of high-scale services. “People costs dominate” is often true of enterprise data centers. Server-to-admin ratios typically run in the 100:1 to 140:1 range. See http://www-db.cs.wisc.edu/cidr/cidr2007/slides/p41-lohman.ppt for an example of the people costs dominate argument.
People costs in the enterprise are huge. But when you are running 10^3 to 10^4 servers, you need to automate. Some of the techniques to automate well are in: http://mvdirona.com/jrh/talksAndPapers/JamesRH_Lisa.pdf. After automation, admin costs are under 10% and often well under 5% of the total cost of operation. The admin costs disappear into the noise.
See slide 5 in this deck for a quick service != enterprise argument: http://mvdirona.com/jrh/TalksAndPapers/JamesRH_Ladis2008.pdf.
I’ve seen high-scale data centers where security and hardware admin staffing is under 1 person/MW. Almost free. I’ve led services that were not super-high scale and not close to as automated as we should have been and, even then, the admin costs were only 9%.
If you want to include security, hardware admin, and software operations, a conservative estimate for a well-automated, high-scale service is 10%.
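The enterprise-versus-automated contrast above can be sketched as a quick ratio calculation. The server-to-admin ratios come from the discussion; the salary and per-server amortized cost are assumptions for illustration:

```python
# Sketch of admin cost share vs. server hardware cost at different
# server-to-admin ratios. Salary and server cost are assumed figures.
servers = 50_000
burdened_salary = 120_000      # assumed fully burdened admin cost per year
server_monthly = 125           # assumed amortized cost per server per month

def admin_share(servers_per_admin):
    """Admin cost as a fraction of (admin + server hardware) cost."""
    admins = servers / servers_per_admin
    admin_monthly = admins * burdened_salary / 12
    return admin_monthly / (admin_monthly + servers * server_monthly)

print(f"enterprise ratio (120:1):  {admin_share(120):.0%}")
print(f"automated ratio (1000:1):  {admin_share(1000):.0%}")
```

With these assumptions the enterprise-style ratio makes people a large share of cost while the automated ratio pushes admin well under 10%, which is the shape of the argument in the reply.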
Thanks for your thoughts, observations, and corrections.
–jrh
Any reason why personnel costs are not included? Personnel costs can be a significant part of the ongoing costs of a datacenter as well.
I have repeatedly pushed back when people repeat the "power is the majority cost" claim mindlessly… but words aren’t as compelling as a pretty graph. Thanks for putting this spreadsheet together James. A few observations about the spreadsheet. First, you reversed the labels: (3 years) appears next to the 180 cell, and (15 years) next to the 36 cell. Second, while financial models typically use a 3-year amortization of computing hardware, I have typically seen computing boxes stay in production service for 5 years, which reduces the server costs to less than 1/2 the total cost. There are two costs not captured which might be appropriate, both mentioned by others already. One is the bandwidth required to deliver the service, and the other is the cost of software. I would note that for James, running a Microsoft OS might not seem significant since his employer produces XP/Vista.
–Mark
It looks like your pricing model for servers assumes the servers are running Linux and not Windows, otherwise the pricing would be much higher to cover the cost of your organization’s software and associated licenses.
Very cool analysis! What would be some reasonable ranges for network egress?