Monday, April 20, 2009

I’m always interested in research on cloud service efficiency, and last week, at the Uptime Institute IT Symposium in New York City, management consultancy McKinsey published a report entitled Clearing the air on Cloud Computing. McKinsey is a well respected professional services company that describes itself as “a management consulting firm advising leading companies on organization, technology, and operations”.  Over the first 22 years of my career in server-side computing at Microsoft and IBM, I’ve met McKinsey consultants frequently, although they were typically working on management issues and organizational design rather than technology. This particular report focuses more on technology, where the authors investigate the economics of very high scale data centers and cloud computing. This has been my prime area of interest for the last 5 years, and my first observation is the authors are taking on an incredibly tough challenge.

 

Gaining a complete inventory of the costs of internal IT is very difficult. The costs hide everywhere.  Some are in central IT teams, some are in central procurement groups, some with the legal and contract teams, and some in departmental teams doing IT work although not part of corporate IT. It’s incredibly difficult to get a full, accurate, and unassailable inventory of the costs of internal IT. Further complicating the equation, internal IT is often also responsible for mission-critical tasks that have nothing to do with comparing internal IT with cloud services offerings. Internal IT is often responsible for internal telco and for writing many of the applications that actually run the business.  Basically, it’s very hard to first find all the comparable internal IT costs and, even with a complete inventory of IT costs, it’s then even harder to separate out mission-critical tasks that internal IT teams own that have nothing to do with whether the applications are cloud or internally hosted. I’m arguing that this report’s intent, of comparing costs in a generally applicable way, across all industries, is probably not possible to do accurately and may not be a good idea.

 

In the report, the authors conclude that current cloud computing offerings “are not cost-effective compared to large enterprise data centers.”  They argue that cloud offerings are most attractive for small and medium sized enterprises. The former is a pretty strong statement, and contradicts most of what I’ve learned about high scale service, so it’s definitely worth digging deeper.

 

It’s not clear that a credible detailed accounting of all comparable IT costs that generalizes across all industries can be produced. Each company is different and these costs are both incredibly hard to find and entangled with many other mission-critical tasks the internal IT team owns that have nothing to do with whether they are internally hosted or utilizing the cloud. From all the work I’ve done around high scale services, it’s inarguably true that some internal IT tasks are very leveraged. These tasks form the core competency of the business and are usually at least developed internally if not hosted internally.  In what follows, I’ll argue that non-differentiated services -- services that need to be good but aren’t the company’s competitive advantage -- are much more economically hosted in very high-scale cloud computing environments. The hosting decision should be driven by company strategy and a decision to concentrate investment capital where it has the most impact. The savings available using a shared cloud for non-differentiated services are dramatic, and are available for all companies, from the smallest startup to the largest enterprise. I’ll look at some of these advantages below.

 

In this report the authors conclude that cloud computing makes sense for small and medium enterprises but that cloud offerings “are not cost-effective compared to large enterprise data centers.” The authors argue there are economies of scale that make sense for small and medium sized businesses, but that the cost advantages break down at the very large. Essentially they are arguing that big companies already have all the economies of scale available to internet-scale services. On the face, this appears unlikely. And, upon further digging, we’ll see it’s simply incorrect across many dimensions.

 

Let’s think about economies of scale.  Large power plants produce lower cost power than small regional plants.  Very large retail store chains spend huge amounts on optimizing all aspects of their businesses from supply chain optimization through customer understanding and, as a consequence, can offer lower prices. There are exceptions to be sure but, generally, we see a pretty sharp trend towards economies of scale across a wide range of businesses.  There will always be big, dumb, poorly run players and there will always be nimble but small innovators.  The one constant is that those who understand how to grow large, capture the economies of scale, and still stay nimble often deliver very high quality products at much lower cost to the customer.

 

Perhaps the economies of scale don’t apply to the services world?  Looking at services such as payroll and internal security, we see that almost no companies choose to do their own internally.  These services clearly need to be done well, but they are not differentiated.  It’s hard to be so good at payroll that it yields a competitive advantage, unless your company is actually specializing in payroll. Internal operations such as payroll and security are often outsourced to very large services companies that focus on them. ADP, for example, has been successful at providing a very high scale service that makes sense for even the biggest companies. I actually think it’s a good thing that the companies I’ve worked for over the last twenty years didn’t do their own payroll and instead focused their investment capital on technology opportunities that grow the business and help customers. It’s the right answer.

 

We find another example in enterprise software.  When I started my career, nearly all large companies developed their own internal IT applications. At the time, most industry experts speculated that none of the big companies would ever move to packaged ERP systems. But, the economies of scale of the large ERP development shops are substantial and, today, very few companies develop their own ERP or CRM systems.  The big companies like SAP can afford to invest in the software base at rates even the largest enterprise couldn’t afford. Fifteen years ago SAP had 4,200 engineers working on their ERP system. Even the largest enterprise could never economically justify spending a fraction of that.  Large central investments at scale typically make better economic sense unless the system in question is one of a company’s core strategic assets.

 

I’ve argued that smart, big players willing to invest deeply in innovating at scale can produce huge cost advantages and we’ve gone through examples from power generation, through retail sales, payroll, security, and even internal IT software. The authors of the McKinsey study are essentially arguing that, although all major companies have chosen to enjoy the large economies of scale offered by packaged software products over internal development, this same trend won’t extend to cloud hosted solutions. Let’s look closely at the economics to see if this conclusion is credible.

 

In the enterprise, most studies report that the cost of people dominates the cost of servers and data center infrastructure.  In the cloud services world, we see a very different trend.  Here we find that the costs of servers dominate, followed by mechanical systems, and then power distribution (see the Cost of Power in Large Data Centers). As an example, looking at all aspects of operational costs in a mid-sized service I led years ago, the human administrative costs were under 10% of the overall operational costs.  I’ve seen very large, extremely well run services where the people costs have been driven below 4%. Given that people costs dominate many enterprise deployments, how do high-scale cloud services get these costs so low? There are many factors contributing but the most important two are 1) cloud services run at very high scale and can afford to invest more in automation, amortizing that investment across a much larger server population, and 2) services teams can specialize, focusing on doing one thing and doing it very well. This kind of specialization yields efficiency gains, but it is only affordable at multi-tenant scale. The core argument here is that the number 1 cost in the enterprise is people whereas, in high scale services, these costs have been amortized down to sub-10%. Arguing there are no economies at cloud scale is the complete opposite of my experience and observations.
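The amortization argument can be made concrete with a minimal sketch. Every number below is a hypothetical placeholder chosen only to show the shape of the curve, not data from the McKinsey report or from any real service: a fixed yearly automation investment, a per-server manual administration cost, and the fraction of manual work the automation removes.

```python
def admin_cost_per_server(automation_investment, servers,
                          manual_cost_per_server, automated_fraction):
    """Yearly administrative cost per server once automation is in place.

    All inputs are illustrative assumptions, not measured values.
    """
    # The automation investment is a fixed cost shared by every server.
    amortized_automation = automation_investment / servers
    # Whatever work is not automated is still paid for per server.
    remaining_manual = manual_cost_per_server * (1.0 - automated_fraction)
    return amortized_automation + remaining_manual


if __name__ == "__main__":
    investment = 2_000_000   # hypothetical yearly automation spend
    manual = 1_000           # hypothetical yearly manual cost per server
    automated = 0.9          # hypothetical share of manual work removed
    for servers in (1_000, 10_000, 100_000):
        cost = admin_cost_per_server(investment, servers, manual, automated)
        print(f"{servers:>7} servers: {cost:,.0f} per server per year")
```

Under these assumed numbers the same investment costs more than manual administration at 1,000 servers and far less at 100,000, which is the sense in which automation is only affordable at scale.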

 

Page 25 of the study shows a “disguised client example” where the example company had 1,704 people working in IT before the move to cloud services and still required 1,448 after the move. I’m very skeptical that any company with 1,704 people working in IT – clearly a large company – would move to cloud computing in one, single discrete step.  It’s close to impossible and would be foolhardy.  Consequently, I suspect the data either represents a partial move to the cloud or is only a paper exercise. If the former, the data is incomplete and, if the latter, the data is speculative.  The story is clouded further by including in the headcount inventory desktop support, real estate, telecommunications and many other responsibilities that wouldn’t be impacted by the move to cloud services. Adding extraneous costs in large numbers dilutes the savings realized by this disguised customer. Overall, this slide doesn’t appear informative.

 

We’ve shown that at very high scale the dominant costs are server hardware and data center infrastructure. Very high scale services hire server designers and have an entire team focused on the acquisition of some of the most efficient server designs in the world.  Google goes so far as to design custom servers (see Jeff Dean on Google Infrastructure) something very hard to economically do at less than internet-scale.  I’ve personally done joint design work with Rackable Systems in producing servers optimized for cloud services workloads (Microslice Servers). When servers are the dominant cost and you are running at 10^5 to 10^6 servers scale, considerable effort can and should be spent on obtaining the most cost effective servers possible for the workload. This is hard to do economically at lower scale.

 

We’ve shown that people costs are largely automated out of very high scale services and that the server hardware is either custom, jointly developed, or specifically targeted to the workload.  What about data center infrastructure?  The Uptime Institute reports that the average data center Power Usage Effectiveness is 2.0 (smaller is better). What this number means is that for every 1W of power that goes to a server in an enterprise data center, a matching watt is lost to power distribution and cooling overhead. Microsoft reports that its newer designs are achieving a PUE of 1.22 (Out of the box paradox…). All high scale services are well under 1.7 and most, including Amazon, are under 1.5. High scale services can invest much more in infrastructure innovations by spreading this large investment out over a large number of data centers. As a consequence, these internet-scale services are a factor of 2 more efficient than the average enterprise. This is good for the environment and, with power being such a substantial part of the cost of high-scale computing, it substantially reduces costs as well.
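The PUE figures quoted above can be turned into watts with a short sketch. It assumes only the standard definition of PUE as total facility power divided by IT equipment power; the function names and the 1,000 W example load are illustrative choices.

```python
def overhead_watts_per_it_watt(pue):
    """Watts spent on power distribution and cooling per watt delivered to IT load."""
    return pue - 1.0


def facility_power(it_load_watts, pue):
    """Total facility draw needed to deliver it_load_watts to the servers."""
    return it_load_watts * pue


if __name__ == "__main__":
    it_load = 1_000.0  # illustrative IT load in watts
    # PUE values quoted in the text: enterprise average, the two upper bounds
    # given for high scale services, and Microsoft's newer designs.
    for label, pue in (("enterprise average", 2.0),
                       ("high scale upper bound", 1.7),
                       ("most high scale services", 1.5),
                       ("Microsoft newer designs", 1.22)):
        print(f"{label:>26}: PUE {pue:.2f}, "
              f"overhead {overhead_watts_per_it_watt(pue):.2f} W per IT watt, "
              f"facility draw {facility_power(it_load, pue):,.0f} W")
```

For the same server load, a facility at PUE 1.22 draws 61% of the power of one at PUE 2.0, and its distribution and cooling overhead falls from 1.0 W to 0.22 W per server watt.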

 

Utilization is the factor that many in the industry hate talking about because the industry-wide story is so poor.  The McKinsey report says that enterprise server utilization is actually down around 10% which is approximately consistent with what I’ve seen working with enterprise customers over the years. The implication is the servers and the facilities that house them are only 10% used.  This sounds like the beginning of an incredibly strong argument for cloud services but the authors take a different path and argue it would be easy to increase enterprise utilization far higher than 10%. With an aggressive application of virtualization and related technologies, they feel utilizations as high as 35% are possible.  That conclusion is possibly correct, but it’s worth spending a minute on this point. At 35% efficiency, a full 2/3 is still wasted which seems unfortunate, unnecessary, and hard on the environment.  Improving from 10% to 35% will require time, new software, new training, etc. but it may be possible.  What’s missing in this observation is that 1) cloud services can invest more in these efficiency innovations and they are already substantially down that path, 2) large user populations allow a greater investment in infrastructure efficiency at a higher rate, and 3) not all workloads have correlated peaks, so larger, heterogeneous populations offer substantially larger optimization possibilities than most enterprises can achieve alone (see: resource consumption shaping).
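The third point, that workloads with uncorrelated peaks can share capacity, can be illustrated with a small sketch. The two demand profiles are invented for illustration and are not measured data; the only idea being shown is that capacity provisioned for the peak of the combined load is smaller than the sum of capacities provisioned for each peak separately.

```python
# Hypothetical demand (in servers needed) over six time slots of a day.
day_peaking = [10, 10, 80, 100, 80, 10]
night_peaking = [100, 80, 10, 10, 10, 80]


def utilization(demand, capacity):
    """Average demand divided by provisioned capacity."""
    return sum(demand) / (len(demand) * capacity)


# Each workload hosted alone must be provisioned for its own peak.
separate_capacity = max(day_peaking) + max(night_peaking)   # 100 + 100 = 200

# Hosted together, capacity only has to cover the peak of the combined load.
pooled_demand = [a + b for a, b in zip(day_peaking, night_peaking)]
pooled_capacity = max(pooled_demand)                         # 110

if __name__ == "__main__":
    print(f"separate: {separate_capacity} servers, "
          f"utilization {utilization(pooled_demand, separate_capacity):.0%}")
    print(f"pooled:   {pooled_capacity} servers, "
          f"utilization {utilization(pooled_demand, pooled_capacity):.0%}")
```

The more heterogeneous the population of workloads, the more often one workload's trough lines up with another's peak, which is an opportunity a single enterprise with a handful of correlated workloads rarely has.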

 

In the discussion above, we focused on the costs “below” the software (data center infrastructure and servers) and found a substantial and sustainable competitive advantage in high scale deployments.  Looking at people costs, we see the same advantage again.  On the software side, the cost picture ranges from less in the cloud to equal, but it isn’t higher. There doesn’t seem to be a dimension that supports the claim of this report. I just can’t find the data to support the claim that enterprises shouldn’t consider cloud service deployments. Looking at the slides in the McKinsey presentation that make the cost argument in detail, the graphs on slides 22, 23, and 24 just don’t make sense to me. I’ve spent considerable time on the data but just can’t get it to line up with the AWS price sheet or any other measure of reality.  The limitation might be mine but it seems others are having trouble matching this data to reality as well.

 

My conclusion: any company not fully understanding cloud computing economics and not having cloud computing as a tool to deploy where it makes sense is giving up a very valuable competitive edge. No matter how large the IT group, if I led the team, I would be experimenting with cloud computing and deploying where it makes sense.  I would want my team to know it well and to be deploying to the cloud when the work done is not differentiated or when the capital is better leveraged elsewhere.

 

IT is complex and a single glib answer is almost always wrong.  My recommendation is to start testing and learning about cloud services, to take a closer look at your current IT costs, and to compare the advantages of using a cloud service offering with both internal hosting and mixed hosting models.

 

                                                                --jrh

 

James Hamilton, Amazon Web Services

1200, 12th Ave. S., Seattle, WA, 98144
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
james@amazon.com  

H:mvdirona.com | W:mvdirona.com/jrh/work  | blog:http://perspectives.mvdirona.com

 

Monday, April 20, 2009 4:53:21 PM (Pacific Standard Time, UTC-08:00)
Tuesday, April 21, 2009 12:15:55 PM (Pacific Standard Time, UTC-08:00)
As a startup CEO, I'm excited that the large-company CTOs have a study that affirms their investment in multi-million dollar data centers. This will make it much easier for companies like mine to displace those companies using cloud services.
Tuesday, April 21, 2009 12:55:31 PM (Pacific Standard Time, UTC-08:00)
I certainly agree with the statement, " ... this report’s intent, of comparing costs in a generally applicable way, across all industries, is probably not possible to do accurately and may not be a good idea." Even within the same industry, when I've used commercially available research statistics to compare IT costs with clients, I've found variations in company structure, service mix, and infrastructure that make meaningful cost comparisons highly problematic. Part of the problem is that just collecting cost data in a consistent way across multiple companies is fraught with variation and error.

Another comment on this is that the discussions of cloud computing, especially the recommendation that cloud computing be tried initially with non-core services, sound very much like traditional arguments for how to decide what to outsource. While it is true that some companies have moved 100% of their IT operations to outsourcers, the reality is that many only outsource a portion of their IT services and, for all practical purposes, this results in multiple platforms that need to be supported.

Following this logic, if I push only a portion of my services to the cloud, I now need to support both my own infrastructure and how my processes interface with the cloud's services, which, it would seem to me, is potentially a complicating factor. This gets us back to defining how we calculate the costs and what "support" means.
Wednesday, April 22, 2009 4:58:26 AM (Pacific Standard Time, UTC-08:00)
Great comment from Jeffrey McManus: “As a startup CEO, I'm excited that the large-company CTOs have a study that affirms their investment in multi-million dollar data centers. This will make it much easier for companies like mine to displace those companies using cloud services.” I love it.

I generally agree with Dennis’ point that moving very large enterprises to the cloud in a single step isn’t likely a good idea nor a likely outcome.

From my perspective, it’s an exciting time to work on cloud services. The pace of innovation is incredible and scale opens up many optimizations that just aren’t efficiently achievable without a large number of heterogeneous workloads. Suggesting that large enterprises can’t profit from using cloud services for some of their workloads just seems illogical to me.

--jrh
James Hamilton, jrh@mvdirona.com
Wednesday, April 22, 2009 5:49:22 AM (Pacific Standard Time, UTC-08:00)
Thanks for the analysis James. This is helpful. The McKinsey report struck me as substantially under researched. On that note, it would be useful to hear of more real world case studies from Amazon on how large companies are currently trying out AWS. Many of our readers say the lack of information on how corporate IT can realistically use AWS is a barrier to adoption. I'd love to chat more with you about this if you have time.
Cheers,
Jo
Wednesday, April 22, 2009 1:01:39 PM (Pacific Standard Time, UTC-08:00)
James, great analysis; as always, your review is very detailed. I have a few comments:

1) When I reviewed the report, I read the authors' path of raising utilization of existing systems as step #1 of a decision-making process: if you have already spent capital $$$ on an infrastructure, let's get our act together and use it as much as possible before we sink additional $$ into another endeavour. If our utilization is low, what are the reasons behind this? Without reviewing current conditions, understanding our limitations, and really understanding our needs, how will you procure or plan the right cloud resource, or even scale?

2) On the power plant analogy, you state they produce cheaper power. This is correct, as long as we understand that their low cost is driven by: a) they are connected to a grid and can sell power across that grid 24 hrs/day; b) they can sell power not only to their customer base but also to other merchants across the nation who need blocks of power, and to the US grid for power system stability purposes. When the power grid ISOs were not interconnected, the low power cost statement was not 100% accurate. Also, in the US, regional power plants operating on gas will be cheaper than coal-fired plants for power blocks below 100MW.

3) In regard to large retailers such as Wal Mart, their approach has really improved over time to allow them to offer the lowest cost, and it has matured enough only in the last 20 years, after being in business for much longer.

Like you stated it’s not an easy analysis. Can cloud save you capital $$$:

Yes, as long as you understand what your business needs
Yes, as long as you understand what to/how much to procure from the cloud provider
Not 100%, if you are burdening your financials with large underutilized systems and everything it takes to get rid of these.
Are the cloud market's size, age, and efficiencies there yet to compare with the perceived benefits that we see with large-scale power providers and retailers? I would say not yet.
David Ibarra
Thursday, April 23, 2009 2:25:57 PM (Pacific Standard Time, UTC-08:00)
David, thanks for your thoughts. I agree with your analysis that cloud services efficiency will continue to improve. McKinsey pointed out that enterprise efficiency could improve as well. Both are true in my view -- I see it the same way. The question we should be asking is, at a fixed point along this evolutionary path, which would be more efficient? I'm arguing that having a larger population of heterogeneous users and workloads allows more possibilities to optimize. And large populations allow more to be invested into both R&D and supporting infrastructure. With all else constant, a high scale cloud offering has more optimization possibilities.

I understand that some workloads are easier to move to utility computing but, if the decision were to be based solely on cost with access to like technology, it's hard for me to imagine why the cloud offering wouldn't be more efficient.

--jrh
James Hamilton, jrh@mvdirona.com
Tuesday, April 28, 2009 12:33:05 PM (Pacific Standard Time, UTC-08:00)
Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet. Users need not have knowledge of, expertise in, or control over the technology infrastructure "in the cloud" that supports them.

The concept incorporates infrastructure as a service (IaaS), platform as a service (PaaS) and software as a service (SaaS) as well as Web 2.0 and other recent (ca. 2007–2009) technology trends that have the common theme of reliance on the Internet for satisfying the computing needs of the users. Cloud computing services usually provide common business applications online that are accessed from a web browser, while the software and data are stored on the servers.

The term cloud is used as a metaphor for the Internet, based on how the Internet is depicted in computer network diagrams, and is an abstraction for the complex infrastructure it conceals.
Jawaid Ekram
Tuesday, April 28, 2009 12:37:21 PM (Pacific Standard Time, UTC-08:00)
The basic problem with the McKinsey report is how they define the term “Cloud Computing”. They took a very narrow definition of “Cloud Computing”, wrote the whole analysis based on it, and went on to declare that “Cloud Computing” is not beneficial for enterprises.

A better definition of “Cloud Computing” is on Wiki which says:

“Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet. Users need not have knowledge of, expertise in, or control over the technology infrastructure "in the cloud" that supports them.

The concept incorporates infrastructure as a service (IaaS), platform as a service (PaaS) and software as a service (SaaS) as well as Web 2.0 and other recent (ca. 2007–2009) technology trends that have the common theme of reliance on the Internet for satisfying the computing needs of the users. Cloud computing services usually provide common business applications online that are accessed from a web browser, while the software and data are stored on the servers.

The term cloud is used as a metaphor for the Internet, based on how the Internet is depicted in computer network diagrams, and is an abstraction for the complex infrastructure it conceals”

If you ask a CIO or CFO about “Cloud Computing”, their definition would include applications provided as a service. These enterprises are adopting cloud services en masse. What we are seeing is that the horizontal needs are getting filled first (Web Conferencing, Email, Filtering, Archiving, Sales and Support Systems). Limiting the definition of cloud computing to virtualized resources is a mistake.

CIOs and CFOs are more interested in solving business problems, and their focus is not on analyzing the problem as layers of the ISO stack.
Jawaid Ekram
Thursday, April 30, 2009 6:38:21 AM (Pacific Standard Time, UTC-08:00)
James,

In your post you state: "In the enterprise, most studies report that the cost of people dominates the cost of servers and data center infrastructure. In the cloud services world, we see a very different trend. Here we find that the costs of servers dominate, followed by mechanical systems, and then power distribution (see the Cost of Power in Large Data Centers)."

This sounds correct, but it would be very helpful if you could cite any of those studies that show the cost of people dominates in the enterprise data center.

As always your insights are terrific.

Thanks,
Rod Hodgman
Rod Hodgman
Thursday, April 30, 2009 11:20:15 AM (Pacific Standard Time, UTC-08:00)
Great post James. I have to congratulate you on the comprehensiveness of your posts.
I guess the bottom line is how CIOs should start prioritizing investments in Cloud vs. virtualization-to-get-most-out-of-their-data-centers. I think it is not a binary & mutually exclusive decision. It is more of a prioritization question IMHO.
Any large enterprise move to Cloud in the future would be measured & step-by-step and that will happen for sure but in the meantime I think everyone should start getting more focused on virtualization as well. The longer term bet is definitely cloud, so if I were prioritizing I would look at my company's situation and decide on short term gains (from virtualization) vs. long term strategy/value (of moving to cloud) and decide.
Having said that, I believe it should be a subjective decision.
Sunday, May 03, 2009 6:56:38 AM (Pacific Standard Time, UTC-08:00)
James, this was a great blog. 10% utilization. Wow! Now you can really help me. If a server farm is 10% utilized and we can bring it up to, say, 30-50% with cloud services, would we get to the point that servers don't need to be replaced because they get fractionally more efficient? How much would this slow down their operations? How much energy would it save? I'd like to get us to the point where servers last 10 years. Is that possible while still being efficient? Could we just push up our utilization instead of replacing servers? Yes, I understand that the change to cloud will be tough, but can it be done? Is there any other way to push up the utilization without cloud? Can they just run fewer servers?

Oh, by the way, I'm also a power plant guru (this is where some of my technologies come from). A power plant's overall price per kW is dependent on two main influences: the cost of fuel and the efficiency of the plant. Power plants in general are very inefficient; they waste heat and produce more emissions than they need to. But, since the sell price of utility power is so low, power plants won't change their operations and designs as there's really no ROI. The scale of the operation has little to do with their cost effectiveness. Yes, I could take any typical power plant and increase its efficiency by 26%. But there is no cost-effective way to. Cheap bastards!
Tuesday, May 05, 2009 7:38:22 AM (Pacific Standard Time, UTC-08:00)
The demonstration that economies of scale can be huge is strong enough, but a key question that remains is how much of these savings are cloud operators such as Amazon going to pass on to end users. Since it is a business with relatively little competition and customers may find it difficult to move away, it is not certain that savings will be passed on all that much, and in particular the famous Moore's law that still applies to hardware that you buy might not apply to cloud computing, as a number of bloggers have pointed out.
Patrice
Tuesday, May 05, 2009 8:48:33 AM (Pacific Standard Time, UTC-08:00)
My definition of a good business is one where the provider can give their customers great value while simultaneously being profitable and continuing to invest in new features and increased efficiency. Businesses with these characteristics attract both large numbers of customers and large numbers of competitors. Providers compete on efficiency and both their ability and willingness to pass on savings to customers. The right thing by this measure is happening today in cloud computing and I expect that this will accelerate looking forward.

James Hamilton
jrh@mvdirona.com
Wednesday, May 06, 2009 8:33:51 AM (Pacific Standard Time, UTC-08:00)
Was this latest comment meant to be an answer, James? I can't figure out how it answers the question of whether savings will be passed on to customers. And it's a key question, because obviously knowing that my provider finds great economies is not all that interesting for a company. Indeed, you might say "as long as it sells at the current price, why make it any cheaper, it is proof that there are sufficient economies passed on", and this is very true as well.
Patrice
Wednesday, May 06, 2009 9:23:14 AM (Pacific Standard Time, UTC-08:00)
Patrice, my comment was indeed intended to be an answer. I don't know your provider so I can't speak for them. You need to ensure you are getting better value than running the workload in-house or with a competitive cloud service provider.

James Hamilton
jrh@mvdirona.com

Disclaimer: The opinions expressed here are my own and do not necessarily represent those of current or past employers.

All Content © 2014, James Hamilton