ARM Server Market

Qualcomm Inc. signage is displayed outside the company's offices in La Jolla, California, U.S., on Tuesday, Aug. 23, 2011. Qualcomm is the biggest maker of mobile-phone chips, and also owns some of the technology used in advanced, third-generation wireless services. Photographer: Konrad Fiedler/Bloomberg via Getty Images

Microservers and the motivations for microservers have been around for years. I first blogged about them back in 2008 (Cooperative, Expendable, Microslice, Servers: Low-Cost, Low-Power Servers for Internet-Scale Services), and even Intel has entered the market with Atom, but it’s the ARM instruction set architecture that has attracted most of the server world’s attention.

There have been some large data center deployments on ARM servers, and there have been many entrants in the ARM server market. At one point, I knew of nine semiconductor companies that were either in the market or about to enter it. Currently the biggest players in this nascent server market are AppliedMicro and Cavium. The most recent AppliedMicro part targets higher clock rates with a lower core count, whereas the Cavium Thunder uses a somewhat less powerful core but puts 48 of them on die. AMD is also targeting the ARM server market.

Over the last 2 years, the group of companies targeting ARM servers has thinned somewhat, with Calxeda going out of business and Samsung electing to leave the market. But Cavium, AppliedMicro, and AMD are still committed, and Qualcomm recently held a press event essentially saying “this time we really are serious”. Qualcomm is particularly interesting in that they produce some of the best and most widely used mobile systems. And, as those who know them well will attest, they have hired a good server team: there is no question that the team is capable of delivering, the company can fund a successful market participant, and they have a long history of proven custom CPU designs based upon the ARM architecture. But considerable time has passed and, although Qualcomm continued to talk about ARM servers and to work on ARM server designs through this period, they never really seemed any closer to delivery.

The press event last week is the public signal that Qualcomm is serious about this market, and they made the commitment real by announcing an initial 24-core server CPU that is already sampling. I’m personally not super excited about this initial Qualcomm offering, but I am excited to see them now in market and committed to server-side success. They have a high quality, dedicated team focused on producing compelling server chips, and they plan to do a custom ARM design. They report the next version is well along and, from all I have seen so far, it’s real and will be a performer.

In the past, I’ve been quoted as disappointed in the pace of ARM server innovation (e.g. Amazon Engineering Says ARM Chips Lag Intel in Innovation). To a large extent, that remains true today, but there are several important factors emerging that together stand a good chance of making ARM servers real market competitors:

  • Volume Economics: Qualcomm is the CPU supplier for many of the world’s leading smartphones and, overall, ARM powers just about every mobile phone on the market. All the big server architectural changes we know of were driven by superior volume economics coming from below: Intel beat the UNIX super servers on the strength of the R&D stream funded by the Intel client business. ARM and all of its partners, including Qualcomm, Samsung, and literally 100s of others, gain from these volume economics. Historically, the producers of volume client parts eventually win in server.
  • Process Technology: One of the most challenging aspects of Intel as a competitor is that they are competent in overall semiconductor design. They produce good parts. And they are good at market development, so the ISA is well supported and has all the important applications. All of these factors are important, but where Intel has been positively unstoppable is in process technology. They have been at least one full process generation ahead of the rest of the industry. Being a full process node ahead means Intel has more processor real estate to work with, or can get higher yields with a smaller part, and has a power advantage as well. Intel’s dominance has been driven by massive market share funding the highest semiconductor R&D budget in the world. But the mobile world has changed all that. This year both Samsung and TSMC are outspending Intel in semiconductor R&D, and Intel is not expected to retake the top spot in the near future. TSMC makes Apple mobile and Qualcomm CPUs and is, by far, the largest fab-as-a-service provider in the market. TSMC is huge and Samsung is spending even more. More R&D doesn’t necessarily translate to better results, but all indications are that all three competitors are going to deliver the 7nm process node at about the same time. It looks a lot like the process generation advantage that Intel has enjoyed for years is being eroded by the massive R&D investment of the mobile and embedded parts ecosystem.
  • Vibrant Ecosystem: ARM has literally 100s of licensees, all starting with the basic ARM design and adding proprietary intellectual property, adding functionality, or packaging it differently. Samsung, Apple, Qualcomm, and all the rest have the advantage of sharing the initial IP investment and, perhaps more important, the advantage of a shared tool chain, with rapidly improving compilers, Linux distros, and other tools all supporting ARM. Most of the Internet of Things market above the microcontroller level will be ARM based. The Raspberry Pi is ARM based. The Android devices I use every day are all ARM based (by chance, all my mobile devices use Qualcomm CPUs, but my Raspberry Pi is Broadcom powered).
  • China: China is “only” a single country, but it’s hard to talk about a country with more than 1.3B people and use the word “only” in the same sentence. China has massive influence and, today, several of the major high-scale Chinese infrastructure providers are deploying ARM servers aggressively. The Chinese market alone could fund the R&D stream needed to produce good server parts.
  • Cloud Computing: More and more server processor volume is being purchased by the large cloud computing providers. Even though it’s still very early days for cloud computing, all the big players have 10s of data centers, with 10s of thousands of servers in each. The big players passed a million servers a long time ago and continue to deploy at a staggering pace. AWS, as an example, just announced over 80% usage growth over the last year on what was already a massive base. This changes the server market dynamics: the cloud providers are willing to take on big challenges and buy in enough volume that they can create a tool and app ecosystem nearly on their own. Interestingly, Google, Microsoft, and Amazon are all ARM licensees. The cloud changes what is possible in the server market, and the vast size of the major cloud deployments makes many changes previously thought impossible seem fairly practical.

Clearly Intel still makes the CPU behind more than 90% of the world’s servers (even when taking a very generous interpretation of server). And, just as clearly, Intel is a very competent company that has in the past responded quickly to competitive pressure. Intel has also gotten very good at working closely with its major customers and, unlike in the bad old days, is actually very good to work with. I’m more impressed than ever with what they have been bringing to market. Nonetheless, there are factors that make it very likely we are going to see some very good ARM-based server parts in market in the near future. It’s hard to predict the pace of execution of any of the participants or where this will end up but, generally, change and competition are good for the industry and great for customers.

I’m glad to see Qualcomm serious about the server CPU market, it’s good to see their first market entrant already sampling, I’m excited by what the next version might deliver, and it’s good to see the ARM-based server parts out there and the investment ramping up. Remember when server-side computing was boring? :-)

More on the Qualcomm server project:

45 comments on “ARM Server Market”
  1. Eric G says:

    James, thank you very much. Very gracious of you to take the time to respond. I very much appreciate your input.

  2. Eric G says:

    Thanks James for the thoughtful answer. I appreciate your graciously taking the time to respond. I appreciate that public cloud economics, like for like, remain better than on premise, as do the strong innovation at scale being brought by AWS (quite impressive), reliability, and, nowadays, the security advantage of AWS/public cloud. That said, I speak to many experts who are very concerned about the slowing of Moore’s Law and its impact on cloud economics. Transistor costs are not coming down as dramatically as they used to. I appreciate AWS has good volume economics, but when you don’t get the historical performance improvement curve, isn’t that a cause for concern if this trend does not reverse (maybe it does not impact AWS?). You also talked about “networking is in my way”; part of that was the absence of Moore’s Law and part was the presence of proprietary vendors, which you seem to be circumventing. Do you see that helping? NAND costs could increase as the cost of 2D NAND stalls and the industry slows down the 3D NAND transition. The cost of optics may come down hard at 100Gb but is not there yet. Likewise DRAM costs could increase given Moore’s Law. I know you are extremely knowledgeable about all these topics. Is it right to be concerned about all of the above and their impact on cloud economics? Thank you

    • Your question is essentially: do I agree that Moore’s law is slowing, and what impact will this slowdown have on the cloud economic advantage experienced today? Starting with the latter part of your question, the impact on cloud economics, there really isn’t any. The cloud is less costly than on premise deployments for the reasons I give below. These arguments are based upon volume economics, 10^6-unit buying power, elimination of expensive distribution channels, workload-optimized designs, hardware specialization, and custom engineering without promising shareholders “enterprise” profit margins. Those advantages appear to persist whatever the speed of Moore’s law.

      Moore’s law has driven much of the growth in our industry. As prices decline, the problems that can be economically solved using computers get much bigger, fast, so there is no denying that it’s important.

      Moore’s law was originally stated as the number of transistors in dense integrated circuits doubling every two years. But the popular press and many in the industry have generalized the term to mean that all things in computing get exponentially cheaper. And, as sloppy as that wording is, it has for the most part been directionally correct over the years. Most things are getting cheaper fast: memory, CPU, disk, flash storage, etc. all show Moore’s law trends. On the other hand, many components that make up a data center are definitely not based upon transistors. For example, high voltage switch gear isn’t really on a transistor-based cost model, although the control systems that run it clearly are microprocessor based. Components like these see some price declines from higher volumes and a more competitive marketplace but gain little from Moore’s law. There are lots of price declines in our industry that have nothing to do with Moore’s law; they come from volume economics, the impact of innovation, and the effects of a competitive marketplace.
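      The difference between the original doubling-every-two-years statement and a slower cadence compounds quickly. A minimal Python sketch, assuming a clean exponential; the function name and the 3-year doubling period used for comparison are hypothetical illustrations, not figures from this discussion:

```python
def growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Transistor-count multiplier after `years`, assuming one doubling per period."""
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Original statement of Moore's law: doubling every two years.
    print(f"10 years at a 2-year doubling: {growth_factor(10):.1f}x")
    # Hypothetical slowdown to a 3-year doubling, for comparison only.
    print(f"10 years at a 3-year doubling: {growth_factor(10, 3.0):.1f}x")
```

      Over a decade, stretching the doubling period from two to three years cuts the gain from 32x to roughly 10x, which is why even a modest slowdown matters to industry growth.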

      But there is no question that the dominant costs in information technology are servers, and the dominant costs in servers are all very much on a Moore’s law pace, so Moore’s law predicts much of our industry even though it doesn’t apply to every component we use. And there is no question that rapid price declines drive much of the growth in our industry, so it is worth understanding whether or not these trends will continue.

      You gave some examples of concerns you have about potentially increasing costs. Specifically, you asked if “NAND costs could increase as the cost of 2D NAND stall and the industry slows down during the 3D NAND transition”. 3D NAND is already producing better yields and lower prices, so it appears that this will not be a problem. We expect the decline to accelerate through this transition.

      Another question you asked was “DRAM cost could increase given Moore’s law.” DRAM prices are expected to continue to fall, but you are correct that the rate of density improvement is expected to go down, so we will likely enjoy a slower rate of improvement. Long term, new memory technologies appear to be the solution, and many are showing great promise in labs today. Short term, progress can be made by adding more aggressive error correction to lift yields further, but yields are already fairly high in mature memory technologies. I think we will see continued but incremental price improvements in the near term and, long term, a technology change will yield a step-function change.

      Processors have hit several walls. The power wall meant that scaling frequency was no longer effective as the primary tool. Consequently, core counts were grown instead, and much focus has gone into improving core efficiency. Providing multicore performance isn’t as useful nor as general as growing single-thread performance at the same rate, but the industry continues to adapt and we continue to get better value from processors. A transition has been under way for the last 5 years where more and more key operations end up brought on die as hardware accelerators. A hardware implementation can be 10x more power efficient than software and can require as little as 1/10th the latency. I expect that a big part of future processor gains will be in special purpose accelerators. Just as obvious wins like floating point and graphics were added years ago, and crypto and compression more recently, I think we will continue to see an increasing array of different hardware accelerations. These are less useful in the pure sense than more cores and frequency, in that most apps won’t use many of the accelerators, but we will continue to see gains even without as much help from Moore.

      My summary is that cloud economics aren’t based upon Moore’s law so, in the most basic sense of the question, it doesn’t matter. But industry growth is based upon Moore’s law, and it is the case that it is slowing. Generally, while I believe many of the less extreme predictions on the decline of Moore’s law, I remain very optimistic about continued price reductions. Human innovation and volume economics do wonderful things. Thanks for the interesting question.

  3. Eric G says:

    James, many companies would say a big hurdle to moving legacy bandwidth-, storage-, and compute-intensive workloads like big databases is that the cost of running them at AWS could, over time, exceed the cost of running them internally. While AWS is clearly more cost efficient on compute and perhaps storage, some apps will require near-zero latency, redundancy, and critical backup capabilities that could jack up the bill for them at AWS. First, is this assertion true or not? What is the most important element in the input cost for AWS (compute, networking, storage, power) that, if it followed Moore’s Law, would allow AWS to offer a superior cost equation on these mission critical apps? Thank you

    • Thanks for the interesting question Eric. It’s based upon a few underlying assumptions that I don’t fully agree with so let’s start by talking through those. Then we will look at your final question that I’ll paraphrase as “what is the most important AWS input cost that would allow it to offer superior economics even for mission critical applications?” First the assumptions that I’m arguing aren’t usually true:

      *Companies know what they spend on Information Technology: This is actually incredibly hard to figure out in most companies. The data center is usually on the books of the real estate or facilities team. Power will be paid for by facilities, mixed in with the consumption of manufacturing plants and offices. The procurement team responsible for negotiating with Cisco, EMC, and other enterprise system vendors is often the same procurement team that does everything else for the company in question, so how much is spent on IT procurement is hard to know. The same is true for the legal team. Each company I talk to gets huge value from the applications on which they operate their business. It’s hard to tell what is spent on these high value applications that are mandatory to run the business versus what is spent on the undifferentiated heavy lifting below these applications. Most companies know they need IT to cost effectively deliver their products and services, but they usually don’t know what portion of that cost is infrastructure vs high value application. It’s very hard to figure out what is really spent on infrastructure, and it is easy to get confused and think that the infrastructure costs less than it really does. Most companies underestimate the cost of their infrastructure and overestimate the value they get from a unique investment in infrastructure.

      *Companies make IT investment decisions on the basis of best value: Most companies would certainly prefer to spend less on information technology but, in the end, their business is worth so much more than the cost of infrastructure that IT spend is relevant but really not anywhere close to the most important cost decision they make. For most companies, IT is a required component of success but nowhere close to the largest cost component. As a consequence, many companies are not really all that cost sensitive and instead focus on other metrics first. If the IT systems aren’t working, it’s very expensive and possibly destructive to the business, whereas if IT costs 10% more, it’s only inconvenient. Cost of IT is not usually the most important factor in IT decisions at the most successful companies. And, if you think about this as a shareholder or as a customer, it makes perfect sense. Sometimes companies make poor IT purchasing decisions on the basis of cost not mattering. Clearly this is a mistake, but it is true for most businesses that cost is not the most important measure when it comes to IT decisions.

      *On premise deployments are more reliable for mission critical applications: It’s easy to think you need very expensive IT equipment to deliver a reliable application, but that is far from the most important factor. It is easy to find very high cost hardware where the premium is justified on the basis of reliability or availability, and it is true that some of this equipment is not usually available in the cloud. The seemingly obvious conclusion is that most mission critical applications can’t run in the cloud. However, if we dig deeper, we find the most reliable applications in the world are built using multi-facility redundancy. It’s really the only way to get that last nine. Even if you pay the silly high costs of the very most expensive equipment, you still need to architect the application for multi-facility redundancy to reliably achieve mission critical availability levels. The same levels of availability can be delivered on premise or in the cloud. The key difference is that the cloud model makes very high reliability architectures affordable to companies too small to normally have multiple data centers in a region.

      *On premise deployments are more secure: On the face of it, it seems natural to think that a car company, for example, would be able to secure their assets more thoroughly on premise rather than “somewhere else”. But that often isn’t really the case. The high risk threats are the same on premise or in the cloud. The largest threat is internal attack in both cases, and appropriate steps to protect against this threat need to be taken in both cases. The bad guy threat from viruses and targeted attacks is also the same in both cases. Systems where air gapping the entire application is not practical are equally vulnerable to external attack whether on premise or in the cloud. The additional brick wall in a private data center isn’t really relevant to most external threat vectors. Protection comes from many layers of isolation and application hardening, combined with intrusion detection and prevention systems. An effective application security program includes tracking bad guy activities and evolving the protections in lock step with advances on the threat side. It’s an arms race and few protections are absolute, so protection comes from having deep investments in security, being very current on the threats out there, and having appropriate mitigations in place. At scale, an organization like AWS can hire literally 100s of the world’s best in different areas of security. It’s absolutely a silly large expense in absolute terms but, amortized over one of the world’s largest server deployments, the incremental cost of this deep security investment is trivial. For many small and medium sized businesses, it’s impossible to justify this level of spending. It doesn’t scale down well.

      Large businesses certainly can justify this level of expense, and it’s appropriate that they make it but, if they choose to build on a secure cloud infrastructure with an existing deep investment in security, they could do more with their existing security investment and run at higher effective security levels. The vast majority of companies are underinvested in security and could make a substantial step forward by leveraging the massive security investment in place at most responsible cloud providers.

      *On premise latency, redundancy, and critical application backup are more appropriate for mission critical applications: There are companies that sell gold plated hardware into very high value application domains. When what you are doing is really, really important, it seems crazy not to pay for every protection offered in the market. The easy conclusion to arrive at is that these gold plated systems are what deliver the high availability these companies achieve. If you can’t get the same gold plate on cloud hardware, then it can’t be as reliable. This is essentially the “you get what you pay for” argument. Sadly, in enterprise hardware and software, you pay what you pay and get what you get, and resultant availability is absolutely not a function of cost.

      When I look at the very highest availability deployments at super competent customers I’ve worked with over the years, some interesting parallels emerge. All these customers are doing multi-facility redundancy. They know the only way to get the last nine is to write an application that spans multiple data centers. Each of them has complex application-specific failover semantics. Most exploit knowledge of the application domain to get cross-region redundancy without paying the cost of cross-country latencies. Some of these companies do all of this on the very most expensive hardware available, and this works fine. Others chose to do everything right from an application architecture and operations perspective but chose not to spend on the most expensive hardware. What’s interesting, at least for me, is that in very nearly every case, it’s the application work that gives the availability rather than the expensive hardware. The companies that get this right can deliver very high availability with or without the gold plated hardware. The companies with gold plated hardware that don’t invest deeply in application availability will actually get some value from the gold plated hardware, but they will never get anywhere close to the availability levels of those that invest architecturally in application-level availability. And, once you invest properly in application availability, the gold plated equipment doesn’t contribute additional value. You can’t run on junk and you would never want to depend upon operationally weak providers but, once both of those are good, there is no value in gold plating the hardware or the infrastructure. I’m sure there are exceptions, but 100% of the high availability applications with which I’ve been involved could have been delivered equivalently on premise or in the cloud.

      If you look at Amazon, Google, Facebook, and Netflix, you will see 4 very high value brands. In each case, the company has a sufficiently valuable business that it could choose to deploy expensive enterprise hardware. Each company has earned a reputation for delivering an incredibly reliable service. All 4 are written to the cloud model of multi-data center redundancy, and all achieve reliability levels that most enterprise IT shops would kill for. These 4 are interesting because their customers demand, and get, availability. The shareholders demand IT availability levels that never block the business. And yet all 4 companies are built on cloud hardware (no fancy nameplates and paint), and two of them are actually built on AWS infrastructure.

      Some of the world’s most reliable and available systems are built on cloud technology today. And those enterprises that choose to host their own applications on premise will use the same multi-data center redundancy models to achieve that last 9 of reliability. Cloud deployments are no less reliable and available, and the application techniques needed to deliver the last nine are identical on premise or in the cloud. The hard work needed to achieve 5 9s is the same in both models.
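      The “last nine” argument can be made concrete with the standard availability formula for redundant components. A minimal sketch, assuming facility failures are fully independent and that the application fails over cleanly between facilities (both optimistic: shared software, correlated operator error, and imperfect failover all erode the result in practice). The per-facility availability of 99.9% is a hypothetical input:

```python
import math


def redundant_availability(per_site: float, sites: int) -> float:
    """Availability when the service survives as long as any one site is up.

    Assumes site failures are independent, which real deployments only approximate.
    """
    return 1.0 - (1.0 - per_site) ** sites


def nines(availability: float) -> float:
    """Express an availability as a count of nines, e.g. 0.999 -> 3.0."""
    return -math.log10(1.0 - availability)


if __name__ == "__main__":
    site = 0.999  # hypothetical single-facility availability (three nines)
    for n in (1, 2, 3):
        a = redundant_availability(site, n)
        print(f"{n} site(s): {a:.9f} (~{nines(a):.1f} nines)")
```

      Under the independence assumption, n facilities give n times the nines of one facility, which is why the text treats application-level multi-facility design, rather than more expensive hardware, as the route to the last nine.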

      Having looked at some of the assumptions behind your question, and argued that the cloud is already hosting very high availability applications and many companies are already 100% cloud deployed, let’s look at your specific question: what is the most important AWS input cost that would allow it to offer superior economics even for mission critical applications?

      The economics have been excellent since day one. I remember when Amazon S3 first became available. At the time, I was leading a successful cloud service that depended upon multi-site redundant storage across different data centers in widely separated locations using gold plated gear. The all-in cost of storage, including the multiple data centers and the redundancy, was $26/GB. As I recall, S3 was charging $0.15/GB. Having been intimately involved with both solutions, I can say that S3 has delivered higher availability and better data durability than the two-datacenter solution based upon gold plated hardware that cost 2 orders of magnitude more.
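      The “2 orders of magnitude” figure follows directly from the two prices quoted above. A small Python check using only those two numbers; the months interpretation in the last line is an assumption (that the $26/GB is a one-time all-in cost and the S3 price is per GB per month), which the text does not state:

```python
import math

ON_PREMISE_PER_GB = 26.00  # all-in cost of the two-datacenter solution, as quoted
S3_PER_GB = 0.15           # S3 price, as quoted

ratio = ON_PREMISE_PER_GB / S3_PER_GB
orders_of_magnitude = math.log10(ratio)

print(f"Cost ratio: {ratio:.0f}x (~{orders_of_magnitude:.1f} orders of magnitude)")
# Assumption: if $26/GB is one-time and $0.15/GB is monthly, the ratio is also
# the number of months of S3 charges needed to reach the on-premise cost.
print(f"Months of S3 to match: {ratio:.0f} (~{ratio / 12:.1f} years)")
```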

      All costs have fallen dramatically since that time, but cloud economics remain notably better than on premise. On premise customers buy from vertically integrated networking systems providers with publicly published profit margins. At scale in the cloud, merchant silicon ASICs are combined with custom router designs built by low cost, high volume ODMs. Most networking companies charge at least 15% annual “support” on top of already high pricing. At scale, the combination of merchant silicon switching ASICs, custom designed routers, and internally maintained protocol stacks is much more stable, far less expensive, and evolves more quickly.
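      To see how a recurring support charge adds up on top of the purchase price, here is a minimal sketch. It assumes support is billed each year as a flat percentage of the original purchase price (a common arrangement, but an assumption here), uses the 15% floor from the text, and normalizes the list price to 100; the 5-year service life is hypothetical:

```python
def total_vendor_cost(list_price: float, years: int, support_rate: float = 0.15) -> float:
    """Purchase price plus `years` of annual support at `support_rate` of list price."""
    return list_price * (1.0 + support_rate * years)


if __name__ == "__main__":
    for years in (1, 3, 5):
        print(f"{years} year(s): {total_vendor_cost(100.0, years):.0f} per 100 of list price")
```

      Under these assumptions, five years of support alone adds about 75% to an already high list price.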

      Server hardware is the biggest part of the cost of offering compute and storage. Large OEMs sell server hardware to distributors that sell down through the channel to end customers. This distribution model adds 30% to the cost of hardware, but it’s necessary when there are 10s of thousands of customers. Cloud providers use custom server and storage designs, have them built by low cost, high volume ODMs, and ship them directly to the data center without an expensive hierarchy of resellers. There is no value in a complex and expensive hierarchical distribution system when there are only a handful of top tier buyers. In fact, cloud providers purchase in sufficient volume that components like processors, memory, disks, SSDs, and NICs are all purchased directly from the component manufacturers in high volume package deals. Some cloud providers have their own semiconductor teams for some embedded parts.

      Large cloud providers build many data centers every year and, as a consequence, get to iterate rapidly on designs. The big players build more data centers in a year than most companies will build in their entire operating lifetime. Doing anything at volume allows optimization, allows specialization, and ensures a very short cycle time on new design ideas.

      For most companies, their very best engineers and leaders are hired to work on key parts of the specific business they are in rather than IT. It’s pretty rare that the next leader of the company is going to come from the IT team. That’s just not where most companies focus. At cloud providers, infrastructure is all we do, and 100% of our best people are focused on doing a better job of it. That is where our best leaders and engineers work, and the next leader of the company will almost certainly come from there.

      I covered many of these points in more detail in the 2013 re:Invent talk on AWS infrastructure available on YouTube, but the short version is above. As we walk through each technology investment area and compare the efficiencies, focus, innovations, and economies of scale available to a cloud company running order 10^6 servers, it’s hard for me to imagine it being done less expensively on premise.

  4. Rich says:

    And one more add to the above post..

    Intel was transparent at this year’s Investor Event that they make custom server CPUs for most of the major Cloud Service Providers. While I am certain this enhances value to each Cloud Service Provider it creates classic Market Entry challenges ala Ford Model T (..comes in Black..) vs GM (..multi-brand..)

    The implication is that one part won’t serve all, and possibly that this is far more complicated than simply Flexibility vs Specificity.

    Thanks again!

    • Rich, you were asking about the value of specialization vs flexibility (generalization). When you have 3 of a server, you can’t afford to specialize, and the market ends up served by general purpose servers. When you have 10s to 100s of thousands of a server type, you can’t afford not to specialize. Cloud and hyper-scale providers do a lot of hardware specialization because, at scale, the gains are very material.
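      The “3 servers vs 100s of thousands” point is a fixed-cost amortization argument. A minimal sketch, assuming a specialized design carries a one-time engineering cost and then saves a fixed amount per deployed server; both dollar figures are hypothetical placeholders, not numbers from this thread:

```python
import math


def break_even_units(one_time_cost: float, saving_per_unit: float) -> int:
    """Smallest fleet size at which per-unit savings cover the one-time design cost."""
    return math.ceil(one_time_cost / saving_per_unit)


def specialization_pays(units: int, one_time_cost: float, saving_per_unit: float) -> bool:
    """True when total savings across the fleet at least cover the one-time cost."""
    return units * saving_per_unit >= one_time_cost


if __name__ == "__main__":
    design_cost = 2_000_000.0  # hypothetical one-time engineering cost
    saving = 200.0             # hypothetical saving per server (power, parts, etc.)
    print("Break-even fleet size:", break_even_units(design_cost, saving))
    for fleet in (3, 10_000, 100_000):
        print(f"{fleet:>7} servers -> specialize? {specialization_pays(fleet, design_cost, saving)}")
```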

      Intel is wise to cater to this market. It’s my opinion that hardware workload acceleration is going to become the primary differentiator in servers. As processor feature size continues to decrease, the processor real estate is available, and server processor designers are going to have the choice of adding hardware workload accelerators or working to add a bit more general purpose performance. Even though most accelerators will not be used by most customers, some acceleration will be used by almost all. Few customers will use it all, but almost all customers will use some. These “barely” used hardware accelerators will provide upwards of 10x performance improvement when they are used so, with a broad enough selection of accelerators, they end up providing more bang for the die real estate than adding a bit more cache or cranking the cycle rates slightly higher.
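      One way to weigh a 10x accelerator that only some work uses against a small across-the-board gain is Amdahl’s law, a standard model swapped in here for illustration rather than the analysis behind this comment. In this minimal sketch, the workload fractions and the 5% general-purpose gain are hypothetical, and the general gain is idealized as applying to all work:

```python
def amdahl_speedup(accelerated_fraction: float, accel_factor: float) -> float:
    """Overall speedup when `accelerated_fraction` of runtime runs `accel_factor` times faster."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / accel_factor)


if __name__ == "__main__":
    general_gain = 1.05  # hypothetical: a bit more cache or clock, idealized as helping everything
    for fraction in (0.0, 0.05, 0.2, 0.5):
        s = amdahl_speedup(fraction, 10.0)  # 10x from the text, applied to part of the work
        print(f"{fraction:.0%} of runtime accelerated: {s:.2f}x vs general-purpose {general_gain:.2f}x")
```

      Under these assumptions, an accelerator covering only about 5% of a workload’s runtime roughly matches the idealized 5% general gain, and anything beyond that favors the accelerator. A workload that never touches a given accelerator gains nothing from it, which is why the argument depends on offering a broad enough selection.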

      My take is that carefully selected hardware acceleration of key software kernels will end up being the primary differentiator in server processor design. We see this beginning to happen with crypto engines, erasure encoding, and other accelerations ending up on die. I think this will be a big part of the future.

      • Mike Wilhelm says:

        The 2nd stepping stone for Arm would be servers running custom application stacks.

        You have said that first and foremost Arm needs to create sufficient value measured by price / performance, and discussed the likely role the new interconnects will play to achieve this.

        What else does Arm need to worry about here? What else can prevent them from taking material share in this segment?

        • What’s needed for ARM servers to provide competitive price/performance? Higher-performing cores and more of them is the simple answer. The first needs more frequency, a higher-performing future core design, or, most likely, a combination of both. The latter needs a high-scale, coherent memory interconnect.

  5. Rich says:

    Howdy again James.

    I see/accept all the logic in your conclusions that large Cloud Service Providers are critical to ARM success in Servers.

    This said, it appears that some large Cloud Service Providers have architectures more optimized for Flexibility (rack/cluster runs any/all services, general purpose) while other large Cloud Service Providers have architectures more optimized for Specificity (rack/cluster optimized for specific/limited services, specific purpose). I am sure there are valid reasons for this but haven’t seen any great public articles on the motivators.

    I sense ARM will find more of an entry opportunity in the Specificity space than in the Flexibility space. Traditional Enterprise tends toward the Flexibility model too. Thus ARM entry is limited to a finite few key Cloud Service Providers, with the implication that obtaining Critical Mass is even harder.

    Additionally, since the memory interface is so critical, it appears Intel knows this too, and this was probably one of the motivations for their NextGen Memory positioned between DRAM and NAND.

    The simple point is that both technical and commercial challenges currently stand in the way of building Critical Mass.

    As always great blog post and even better blog comments!


  6. Mike Wilhelm says:

    James, a couple of questions on your comment that the market would really take off if one of the big 3 cloud providers were to offer a really low cost Arm server.

    Why does it have to be a cloud provider rather than an OEM or ODM?

    Why would they want to do this?

    • OEMs and ODMs certainly can offer ARM servers. In fact, HP has offered ARM servers for quite some time. The problem is nobody wants them, partly because this generation is not a great price/performer and partly because end customers don’t really want to buy a server that is incompatible with their entire application stack and still not incredibly good value.

      If a cloud provider offers an ARM server, the cloud provider will manage the entire system software and tools stack, so that complexity is removed from customers. The server app ecosystem is still mostly missing, but everything else is there. Cloud providers have many customers that are using custom application stacks, and these actually can be ported fairly easily to the new systems and, if they are better value, some customers will do it. If enough do it, there is critical mass and app providers will do it as well. Once that starts, the momentum can accelerate.

      My take is that an OEM or ODM can offer ARM solutions but most enterprise customers won’t care and won’t use them. HP has already proven this. Success is only likely to come from one of the major cloud providers or a mega-scale company like Tencent or Facebook.

      Why would a cloud provider want to do this? To offer better value to customers. The reason none are doing it yet is that the current generation of ARM servers is not yet producing better value. The factors I point to in the article suggest that this will change and give some of the reasons.

      • Mike Wilhelm says:

        Can Arm build an initial beachhead in storage with the parts that are in market?

        Both EMC and HP have said that they are doing an Arm storage server.

        • Yes, there will be many storage servers done with ARM and there are many out there right now. Annapurna, an Amazon company, supplies ARM based SOCs for many consumer storage companies. Here’s an example:

          There are also server storage systems built using Annapurna SOCs:

          ARM is a great choice for storage, but the bigger market is server processors. Storage can be used to drive some volume, and that volume can be used to support the R&D to produce very nice server processors. But it works the other way as well. If you are supplying a lot of server processors, you could choose to offer very favorably priced storage servers.

          My read of the market is that targeting storage and not the server market as well isn’t a good long-term strategy: not enough volume and too open to being undercut by the server processor vendor. It’s too easy for the high-volume server processor vendor to offer low-end server parts for storage and other embedded applications like network router control processors and win those markets.

          I agree that storage is a good stepping stone to winning the broader server market but, in my view, if a part isn’t good enough to win in at least some server market segments and can only compete in storage, it’s very open to being displaced in the storage market by the server processor vendor. It doesn’t seem like a stable, long-term strategy, but it is what is happening today.

          • Mike Wilhelm says:

            Should I also read it as the data center operators considering it to be a stepping stone for them towards supporting Arm and then adopting that great server part when it arrives?

          • Mike Wilhelm says:

            Can you also please elaborate on your comments about China’s aggressive investment? What are they doing differently and why?

            I have heard that there is a government mandate to get off x86, and the sense I have is that they may be willing to overlook the current math in favor of national considerations and their expectations of the future math. Are they really this committed, or is it at some lower level?

  7. Mike Wilhelm asked why coherent interconnects are needed to build successful ARM servers. Servers are essentially processors, memory, and the rest. The processor delivers the value while the memory and overhead are needed to allow the processor to deliver that value. Low-performing, low-core-count processors have to pay for almost the same amount of overhead as high-performance, high-core-count processors. Low-performing processors can often use less memory, but there is a certain minimum amount of memory needed to hold the operating system and the problem being worked on. Both fast and slow processors need this same minimum amount.

    The challenge for low-performance and low-core-count servers is that they end up amortizing the memory and system overhead over a small number of cores. So, taking an extreme example, if you take a 4-core, 2 GHz ARM part and try to build a server, it will end up not being a great price/performer even if the processor were free. You need enough performance to pay for the memory and server overhead and, without that, free processors are simply not cheap enough.
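    The amortization argument can be sketched as a simple cost-per-performance model. It is a hypothetical illustration: the dataclass, the $1,000 overhead, the $800 processor price, and the equal per-core performance are all assumptions, and the overhead is held constant even though a many-core server would normally carry more memory.

```python
from dataclasses import dataclass


@dataclass
class ServerConfig:
    """Hypothetical cost model: a fixed platform overhead plus the processor."""
    cores: int
    perf_per_core: float   # relative throughput units per core
    processor_cost: float  # dollars
    overhead_cost: float   # memory, board, power supply, etc. (dollars)

    def cost_per_perf(self) -> float:
        total_cost = self.processor_cost + self.overhead_cost
        total_perf = self.cores * self.perf_per_core
        return total_cost / total_perf


# Illustrative numbers only: the same $1,000 of overhead in both servers.
wimpy_free_cpu = ServerConfig(cores=4, perf_per_core=1.0, processor_cost=0, overhead_cost=1000)
many_core_cpu = ServerConfig(cores=48, perf_per_core=1.0, processor_cost=800, overhead_cost=1000)

print(wimpy_free_cpu.cost_per_perf())  # 250.0
print(many_core_cpu.cost_per_perf())   # 37.5
```

    Even with the 4-core processor free, the fixed overhead spread over four cores costs far more per unit of work than the same overhead spread over 48.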

    One solution is to stick with the wimpy 4-core part but turn the frequency up super high to get enough performance to cover the memory and overhead costs. But it’s very difficult to get enough performance at acceptable power and heat levels to actually produce a winning price/performance server.
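    Why frequency alone is hard can be shown with the usual first-order dynamic power model, P ~ C * V^2 * f. The sketch assumes supply voltage must rise roughly in proportion to frequency (so per-core power grows roughly with f^3), ignores leakage, and assumes a workload that scales perfectly across cores; real silicon deviates from all three assumptions, and the function names are hypothetical.

```python
def relative_dynamic_power(freq_scale: float, core_scale: float) -> float:
    """First-order dynamic power relative to a baseline design.

    Assumes P ~ cores * C * V^2 * f with V rising in proportion to f,
    so per-core power grows ~ f^3. Static/leakage power is ignored.
    """
    return core_scale * freq_scale ** 3


def relative_throughput(freq_scale: float, core_scale: float) -> float:
    """Throughput for a workload that scales perfectly across cores."""
    return core_scale * freq_scale


for name, f, n in [("2x frequency", 2.0, 1.0), ("2x cores", 1.0, 2.0)]:
    perf = relative_throughput(f, n)
    power = relative_dynamic_power(f, n)
    print(f"{name}: perf {perf:.2f}x, power {power:.2f}x, perf/W {perf / power:.2f}")
# 2x frequency: perf 2.00x, power 8.00x, perf/W 0.25
# 2x cores: perf 2.00x, power 2.00x, perf/W 1.00
```

    Under these assumptions, doubling frequency costs roughly 4x the power of doubling cores for the same parallel throughput, which is why frequency increases are best paired with higher core counts.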

    The solution, as with many things in engineering, is to do both: turn up the frequency and, at the same time, add more cores. When the core count gets sufficiently high, a 2.5 to 3.0 GHz part can produce quite respectable price/performance for many workloads. It still won’t win the single-thread performance test but, for many workloads, it’ll produce the best price-performing solution.

    In order to have many cores under the control of a single operating system, there needs to be a high scale, memory coherent interconnect that gives all those cores access to the same memory space and to see all changes made by any of the cores. The interconnect is the key to producing good price/performing servers and it’s especially important when the individual cores aren’t necessarily significantly faster than the alternatives.

  8. Stephan says:

    Very interesting article. Thanks James!
    Do you think there is a lack of tools in the ARM space compared to x86? Products like VTune and the Intel compiler are not available for ARM, and that might be a blocker for companies that want to switch.
    What do you think?

    • You raise a key point, Stephan. Tools make a difference, and the Intel VTune and compiler are excellent tools. But I would argue that only a minority of Intel server users end up using these tools. Most haven’t heard of them even though they are very good. The key is the massive application stack that runs on Intel architecture today. Customers want Oracle, SAP, and the vast sea of vertical applications like the crash simulation software used in automotive design. Getting the 1000s of important apps to support the alternative is the hardest challenge facing the ARM ecosystem.

  9. Todd Warren says:

    I think the question for ARM in the server market is much the same as the question Intel had (and failed at) in the mobile market, with a slight twist. Intel failed in the mobile market because they could not walk away from a PC-centric peripheral power model without effectively losing most of their OS support. Early Atom suffered from this. ARM and its peripherals had a superior standby power model. With ARM in servers, they will need a high-bandwidth storage interface: what is the equivalent of, or support for, things like NVMe that enable high bandwidth to storage? Additionally, they will need the right peripheral chips (or an FPGA on the chip) to enable higher-speed network processing in a rack. ARM vendors are used to differentiating on the auxiliary cores on the die (GPU, MP3 decode, etc.), and this sort of innovation will be possible as well if vendors see a large enough market. Things like ARM replacing MIPS in network equipment could also speed that up if it were to happen.

    • Todd, I generally agree with your key point that it’s hard for a company that specializes in one area to really commit to doing great work in another area. Intel did miss mobile, and the ARM focus on mobile could cause them to not invest fully in server. Certainly, ARM has not moved as fast in server as they could have, so there is no debating that point. A small quibble on a couple of details: NVMe works fine on ARM and, on networking, you can have as many PCIe lanes as you want. ARM parts supporting 25G networking are available today and, if you wanted 100G, it could be done easily enough. I would argue ARM looks fine by these measures.
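      As a back-of-the-envelope check on the networking point, PCIe 3.0 signals at 8 GT/s per lane with 128b/130b encoding, or roughly 7.9 Gb/s of raw payload per lane per direction. The sketch below picks the smallest common link width for a given NIC speed; it ignores packet-level protocol overhead, so real designs need some headroom, and the function name is hypothetical.

```python
import math

# PCIe 3.0 signals at 8 GT/s per lane with 128b/130b encoding.
PCIE3_GT_PER_S = 8.0
PCIE3_ENCODING = 128 / 130
COMMON_WIDTHS = (1, 2, 4, 8, 16)


def pcie3_lanes_for(link_gbps: float) -> int:
    """Smallest common PCIe 3.0 link width whose raw payload rate covers link_gbps.

    Ignores packet (TLP/DLLP) overhead, so real designs need some headroom.
    """
    per_lane_gbps = PCIE3_GT_PER_S * PCIE3_ENCODING  # ~7.88 Gb/s per direction
    needed = math.ceil(link_gbps / per_lane_gbps)
    for width in COMMON_WIDTHS:
        if width >= needed:
            return width
    raise ValueError("needs more than a x16 link")


print(pcie3_lanes_for(25))   # 4
print(pcie3_lanes_for(100))  # 16
```

      A 25G port fits in a x4 link and a 100G port in a x16 link, both standard PCIe link widths.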

      Where more work would really make a difference in a hurry is a memory-coherent, high-speed interconnect supporting O(100) cores and, closely related, an off-processor coherent interconnect to enable multi-socket server designs.

      You mentioned networking equipment. It’s a market that has been served by legacy ISAs like PowerPC and MIPS, and it’s rapidly moving to ARM and Intel. The interest in software-defined networking is making Intel an increasingly interesting control processor in routers, and the rich ARM ecosystem is making ARM a good replacement for PowerPC. I expect most router control processors will move to ARM or Intel, but this router market is fairly small compared to the server market, so I don’t expect it to make a huge difference to how the server market plays out.

      • Mike Wilhelm says:

        Does AppliedMicro’s new X-Tend product measure up to the type of memory-coherent interconnect that can “make a difference in a hurry”?

        • There hasn’t been much released publicly on X-Tend so far, so it’s hard to talk about in detail, but we can talk about what is needed. What the server market needs is a low-latency, memory-coherent interface capable of interconnecting on the order of 100 cores and very large memories, where near memory and far memory differ in latency by much less than a factor of 2.

          Several companies, including ARM, have excellent interconnects either in market or in the lab, so the solutions are definitely coming.
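          The reason the near/far ratio matters can be shown with a weighted average of memory latency. This is a hypothetical sketch: the 100 ns local latency, the two far-memory latencies, and the assumption that only half of accesses land locally are illustrative values, not figures for X-Tend or any shipping interconnect.

```python
def average_latency_ns(near_ns: float, far_ns: float, local_fraction: float) -> float:
    """Expected memory latency when a fraction of accesses hit near (local) memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be in [0, 1]")
    return local_fraction * near_ns + (1.0 - local_fraction) * far_ns


# Hypothetical latencies for two interconnect designs (not vendor figures).
designs = {"far = 1.3x near": (100.0, 130.0), "far = 2.5x near": (100.0, 250.0)}
for name, (near, far) in designs.items():
    # Assume software that places only half of its accesses locally.
    avg = average_latency_ns(near, far, local_fraction=0.5)
    print(f"{name}: average {avg:.0f} ns, {avg / near:.2f}x the local latency")
# far = 1.3x near: average 115 ns, 1.15x the local latency
# far = 2.5x near: average 175 ns, 1.75x the local latency
```

          With a low ratio, software that is not NUMA-aware still performs close to the local-memory case; with a high ratio, placement of threads and memory starts to dominate performance.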

          • Mike Wilhelm says:

            Thanks for the reply James, and for the very interesting blog.

            Can you please elaborate on the changes these solutions may bring?

  10. I think this is the technology that will make the VPS market collapse. It will not make sense anymore to pay for a VPS when you can have a dedicated server for a very cheap price. Even without microservers, there are dedicated servers for 5 euros a month already.

    • You could be right, Marcos, but my money is on the admin and security advantages of running virtual still winning out over bare metal, especially with the virtualization tax continuing to fall.

  11. Jonathan says:

    Hi James,
    You mentioned Google, Microsoft and Amazon as ARM licensees. Are they making their own SoCs? (I know Amazon does through the acquisition of Annapurna.) Do they have an architecture license?

  12. Nick Dengar says:

    Arm should buy AMD.

    • Nick says “Arm should buy AMD”.

      Are you sure? Buying AMD would put ARM in the business of selling chips when they really want to stay in the business of selling designs and intellectual property. Further complicating things, a great deal of AMD’s current revenue comes from selling x86 processors to gaming companies. It just doesn’t feel to me like a quick way for ARM to win in servers.

  13. Anjan Bacchu says:

    hi there,

    nice article. I was excited about Samsung being in the ARM server market. Am a lil disappointed to know that they’re getting out.

    I’m a Java developer. When do you think that Sun will release a high-quality JVM for ARM on Linux? If the JVM on Linux for ARM succeeds, I’m sure Windows on ARM will not be too far away. A lot of big-data workloads run on the JVM (Hadoop, Spark, Storm), and a lot of enterprise workloads run on Tomcat, etc. So, if ARM servers have any chance of going anywhere in the enterprise, a good JVM is important.

    • Robertas Jasmontas says:

      There is already windows for ARM, you can run Windows 10 on Raspberry Pi2 :)

      • It’s true, you can run Windows on a Raspberry Pi but mine still runs Linux and it’ll likely stay that way. The widespread Linux support for ARM mentioned above is absolutely required to win server workloads. But, winning enterprise customers in volume still really needs Windows. It’s slowly changing but Windows remains a prereq for enterprise server adoption these days.

    • Anjan was asking about open source system software on ARM. Linaro has done a pretty credible job so it’s mostly all there:

      My take is the market would really take off if one of the big three cloud providers offered a very low cost ARM server. More uptake would attract more software and, once the cycle is started, success comes fairly quickly.

      Most of what you want on ARM is already there.

  14. A vibrant ecosystem must include software, especially open source software. ARM has a weaker memory model than x86, so more concurrency bugs will surface in our code. Who is going to tune and debug the open source stack? Who is going to make ARM available to open source hackers who don’t have a lot of spare cash? I think the software side of the ecosystem has been ignored.

    • Rich says:

      ARM’s Linaro spin-off, Red Hat and others have long solved this. Linux and the whole open source stack just works, today.

  15. Sambaran says:

    Intel has further advanced into the uServer market beyond Atom. What do you think about Xeon-D? Some details are here:

    • Ed says:

      Exactly my thought as well. Xeon-D completes the Intel lineup.

      As much as I would love Intel to have some competition in the server market, I still cannot see a single company that could compete.

      Before volume economics and a vibrant ecosystem kick in, you need a chip that offers at least some advantage, be it cost, performance, or power.
      So far the ARM camp doesn’t have one.

      Unlike the phone market, where there are 1B chips per year, the server market is at a much lower volume, and that also doesn’t work in ARM’s favor.

      But it’s still good to see somebody pushing Intel to make better products.

      • You are right, Ed, but remember, Intel server R&D is built upon the strong base of the client business. The core chip is done for client and then tailored for server. ARM has the 1B+ chips a year to fund great base processor R&D. This attracts good compilers, tools, and O/S distros. The client world supplies the core technology that is used in servers.

        The combination of ARM volumes and the fab world catching up lays the groundwork for great change. There still needs to be a great server part in market for anything to happen. It’s probably also required that one of the big three cloud providers makes an ARM server available.

        • Russ Taylor says:

          In the server market, energy reduction is key. With the volume of data that servers need to process increasing 10-fold every 2 years, servers are overloaded. This growth in the number and size of data centres is unsustainable, even in the short term (e.g. 5% of the UK’s power is consumed by data centres). The huge operating cost of energy will kill many DC/Cloud operators, and the only ones who survive will be those who conquer the energy-usage issue. For that they need the lowest power consumption for the highest computational performance. So who, in your opinion, is the best bet for producing the most efficient processor chip?

          • Russ, you may be right in your prediction that energy costs will kill many DC/Cloud operators. But, today that is not the case, and it doesn’t appear to be a likely outcome in the near term. Power is absolutely important but nowhere close to the dominant cost of offering a cloud service.

            There is just about no way that data center power consumption is 5% of UK power consumption. These numbers get bandied about and often come with scary predictions for the future but, so far, they just haven’t come true. 10 years ago, the entire US IT industry (telco, client, mobile devices, and data centers) was estimated to consume around 1.5% of total US power, and this number was expected to climb by 2x over the following 10 years. We can now look back and see that, due to increases in efficiency, higher utilization, and cloud computing, the number is just about exactly the same after 10 years of growth:

            Data center power consumption is a tiny fraction of the overall IT energy budget and most of the predictions of 2x to 10x growth haven’t come to be.
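            The arithmetic behind the forecast is worth spelling out. The sketch below uses only the figures quoted above (about 1.5% of US power, expected to double in 10 years) and computes the compound annual growth rate such a doubling would have required; the helper name is hypothetical.

```python
def implied_annual_growth(multiple: float, years: float) -> float:
    """Compound annual growth rate that turns 1.0 into `multiple` over `years`."""
    return multiple ** (1.0 / years) - 1.0


share_then = 1.5          # % of US power for all of IT, per the estimate quoted above
forecast = share_then * 2  # the predicted doubling over 10 years
print(f"forecast share after 10 years: {forecast:.1f}%")                           # 3.0%
print(f"growth needed for 2x in 10 years: {implied_annual_growth(2, 10):.1%}/yr")  # 7.2%/yr
```

            Sustained growth of roughly 7% a year would have doubled the share; the observed share stayed roughly flat instead.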

            Still, I’ve argued that power is not the primary problem and that the stories of doubling every year (or even every decade) are not accurate. But power is a big cost and very important to data center operators, so I agree with you on that point, Russ.

            Who has the most power-efficient processor? ARM, with its mobile heritage, is the more efficient. Intel, with its server and desktop heritage, does consume more power for a given amount of work done. I believe that either ISA can be made very power efficient; I don’t see a fundamental or long-term advantage for ARM or Intel on power consumption. It just comes down to who can deliver the best overall price/performance and power/performance, with price/performance being the more important of the two. Intel and many of the ARM licensees are very competent, and I don’t see anything fundamental blocking any of them from delivering power-efficient parts.

            But, in answer to your specific question, ARM does have better power/performance but somewhat worse price/performance for server workloads and much lower absolute single-thread performance. ARM continues to look good on the former and, at the same time, is improving fast on the latter two measures, so it has a legitimate shot at winning share in the server market.
