In November of last year, AWS announced the first ARM-based AWS instance type (AWS Designed Processor: Graviton). For me this was a very big deal because I’ve been talking about ARM-based servers for more than a decade, believing that massive client volumes fund the R&D stream that feeds most server-side innovation. In our industry, the deep innovations are expensive and “expensive” only makes sense when the volumes are truly massive.
For someone like me who focuses on server-side computing, this is a sad fact. But the old days of server-only innovation started to die with the mainframe, and the process completed during the glory years of the Unix super-servers. Today, when I’m placing a bet on a server-side technology, the first thing I look for is which technology is fueled by the largest volumes and, most of the time, it’s the massive client and especially the consumer market that drives these volumes. For more than a decade, I’ve been watching the client computing and especially the mobile device market for new technologies that can be effectively applied server-side. The most obvious example is the Intel x86 processor family that started its life as a client processor but ended up taking over the server market. Other examples include most power management innovations and new technologies such as die stacking that showed up first in client devices.
Understanding this dynamic, my prediction back in 2008 that ARM processors would end up powering important parts of the server market was an obvious one. If you agree that volume drives innovation in our business, it’s hard to argue with far more than 90B ARM parts shipped.
But server-side success for ARM processors has been far from instant. Some very well-funded startups like Calxeda ended up running out of money. Some very large, competent and well-known companies have looked hard at the market, made significant investments, but ended up backing away for a variety of reasons, often completely unrelated to technical problems with what they were doing. AMD and Qualcomm are amongst the companies that have invested and then backed away, but the list is far longer. I saw the details behind some of this work and much of it was excellent. But new technology is hard. All companies, even very successful ones, need to focus their resources where they see the most value and often where they see short-term value.
I understand this, but it’s been difficult to watch so many projects fail. Some of these projects were massive investments and some of the work was very good. Nonetheless, as fast as projects were shut down, the opportunity remained obvious and, as a consequence, new investments were always being started. After nearly a decade, that’s still true. Many projects have started, almost the same number have been shut down, but the common element is that there are always many ARM Server investments underway.
In some ways it’s good that there continue to be deep investments in ARM server processors, but producing a winning part requires deep investment and patience. Much of the modern corporate world is only just “ok” at deep investments, and most are absolutely horrible at patience. Server processor development takes time, the ecosystem needs time to develop, and customers need time to adopt new technologies. Big changes never happen overnight and, without patience, they simply don’t happen at all.
Back in 2014 I was quoted as saying “the development of ARM-based chips for data center servers wasn’t progressing fast enough … to consider using them over Intel processors.” Like many quotes, it’s not exactly what I said but the gist was generally correct. In my opinion, at that time there were no ARM server parts under development that looked like they could win meaningful market segment share. All these investments were just slightly too incremental, and a part that is only “about as good as what was currently in market” isn’t going to attract much attention, isn’t going to cause the ecosystem to spring to action, and customers won’t go to the effort to port to it. Unless the new part is notable or remarkable in some dimension, it’s going to fail.
This was the backdrop to why I was almost giddy with excitement in the front row when Peter DeSantis announced the AWS Graviton processor during his keynote at the AWS re:Invent conference. Here’s what I posted at the time: AWS Designed Processor: Graviton. I was excited because what Peter announced was a good part with good specs that raised the price/performance bar for many workloads. But I was even more excited knowing that AWS has a roadmap for ARM processors, is patient, and specializes in moving quickly. The first Graviton part was good but, as I enjoyed the first Graviton announcement back in 2018, I knew what many speculated at that time: another part was underway.
The new part is Graviton2, an exceptional server processor that will be a key part of the EC2 compute offering, powering the M6g (general purpose), M6gd (general purpose with SSD block storage), C6g (compute optimized), R6g (memory optimized), and R6gd (memory optimized with SSD block storage) instance families. This 7nm part is based upon customized 64-bit ARM Neoverse N1 cores and it is smoking fast. Rather than being offered as an alternative instance type that will run some workloads with better price/performance, it’s being offered as a better version of an existing, very highly-used EC2 instance type, the M5.
- >40% better integer performance on SPECint2017 Rate (estimate)
- >20% better floating-point performance on SPECfp2017 Rate (estimate)
- >20% better web serving performance on NGINX
- >40% better performance on Memcached with lower latency and higher throughput
- >20% better media encoding performance for uncompressed 1080p to H.264 video
- 25% better BERT ML inference
- >50% better EDA performance on Cadence Xcelium EDA tool
This is a fast part and I believe there is a high probability we are now looking at what will become the first high volume ARM Server. More speeds and feeds:
- >30B transistors in 7nm process
- 64KB icache, 64KB dcache, and 1MB L2 cache
- 2TB/s internal, full-mesh fabric
- Each vCPU is a full non-shared core (not SMT)
- Dual SIMD pipelines/core including ML optimized int8 and fp16
- Fully cache coherent L1 cache
- 100% encrypted DRAM
- 8 DRAM channels at 3200 MHz
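
As a rough back-of-the-envelope sketch, the memory line above can be turned into a theoretical peak bandwidth number. Two assumptions here are not stated in the list: that the "3200" figure is the per-channel transfer rate (3200 mega-transfers per second, as in DDR4-3200), and that each channel has a standard 64-bit data bus. The function name `peak_dram_bandwidth_gbs` is just an illustrative helper, and real sustained bandwidth will be lower than this ceiling.

```python
def peak_dram_bandwidth_gbs(channels: int,
                            mega_transfers_per_s: int,
                            bus_width_bits: int = 64) -> float:
    """Theoretical peak DRAM bandwidth in decimal GB/s.

    Assumes one transfer moves bus_width_bits of data per channel
    (64 bits is an assumption, typical of a DDR4 channel).
    """
    bytes_per_transfer = bus_width_bits // 8
    transfers_per_s = mega_transfers_per_s * 1e6
    return channels * transfers_per_s * bytes_per_transfer / 1e9


# 8 channels at 3200 MT/s, taken from the spec list above
graviton2_peak = peak_dram_bandwidth_gbs(channels=8, mega_transfers_per_s=3200)
print(f"{graviton2_peak:.1f} GB/s theoretical peak")
```

Under those assumptions, each channel contributes 25.6 GB/s, so eight channels give a ceiling of roughly 205 GB/s.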
The Annapurna team at AWS is doing amazing work. I wish I could show you all the work they currently have underway but only some of it is public. Even with multiple, difficult competing projects concurrently underway, they delivered Graviton2 on a schedule far shorter than is usual in the semiconductor world. It’s a great team to work with and Graviton2 is impressive work.
ARM Servers have been inevitable for a long time but it’s great to finally see them here and in customers’ hands in large numbers.