I have been interested in, and writing about, microservers since 2007. Microservers can be built using any instruction set architecture but I’m particularly interested in ARM processors and their application to server-side workloads. Today Advanced Micro Devices announced they are going to build an ARM CPU targeting the server market. This will be 4-core, 64 bit, more than 2Ghz part that is expected to sample in 2013 and ship in volume in early 2014.
AMD is far from new to microserver market. In fact, much of my past work on microservers has been AMD-powered. What’s different today is that AMD is applying their server processor skills while, at the same time, leveraging the massive ARM processor ecosystem. ARM processors power Apple iPhones, Samsung smartphones, tablets, disk drives, and applications you didn’t even know had computers in them.
The defining characteristic of server processor selection is to focus first and most on raw CPU performance and accept the high cost and high-power consumption that follows from that goal. The defining characteristic of Microservers is we leverage the high-volume client and connected device ecosystem and make a CPU selection on the basis of price/performance and power/performance with an emphasis on building balanced servers. The case for microservers is anchored upon these 4 observations:
· Volume economics: Rather than draw on the small-volume economics of the server market, with Microservers we leverage the massive volume economics of the smart device world driven by cell phones, tablets, and clients. To give some scale to this observation, IDC reports that there were 7.6M server units sold in 2010. ARM reports that there were 6.1B Arm processors shipped last year. The connected and embedded device market volumes are 1000x larger than that of the server market and the performance gap is shrinking rapidly. Semiconductor analyst Semicast estimates that by 2015 there will be 2 ARM processors for every person in the world. In 2010, ARM reported that, on average, there were 2.5 ARM-based processors in each Smartphone. The connected and embedded device market is 1000x that of that of the server world.
Having watched and participated in our industry for nearly 3 decades, one reality seems to dominate all others: high-volume economics drives innovation and just about always wins. As an example, IBM mainframes ran just about every important server-side workload in the mid-80s. But, they were largely swept aside by higher-volume RISC servers running UNIX. At the time I loved RISC systems – databases systems would just scream on them and they offered customers excellent price/performance. But, the same trend played out again. The higher-volume X86 processors from the client world swept the superior raw performing RISC systems aside.
Invariably what we see happening about once a decade is a high-volume, lower-priced technology takes over the low end of the market. When this happens many engineers correctly point out that these systems can’t hold a candle to the previous generation server technology and then incorrectly believe they won’t get replaced. The new generation is almost never better in absolute terms but they are better price/performers so they first are adopted for the less performance critical applications. Once this happens, the die is cast and the outcome is just about assured. The high-volume parts move up market and eventually take over even the most performance critical workloads of the previous generation. We see this same scenario play out roughly once a decade.
· Not CPU bound: Most discussion in our industry centers on the more demanding server workloads like databases but, in reality, many workloads are not pushing CPU limits and are instead storage, networking, or memory bound. There are two major classes of workloads that don’t need or can’t fully utilize more CPU:
1. Some workloads simply do not require the highest performing CPUs to achieve their SLAs. You can pay more and buy a higher performing processor but it will achieve little for these applications. Some workloads just don’t require more CPU performance to meet their goals.
2. This second class of workloads is characterized by being blocked on networking, storage, or memory. And by memory bound I don’t mean the memory is too small. In this case it isn’t the size of the memory that is the problem, but the bandwidth. The processor looks to be fully utilized from an operating system perspective but the bulk of its cycles are waiting for memory. Disk and CPU bound systems are easy to detect by looking for which is running close to 100% utilization while the CPU load is way lower. Memory bound is more challenging to detect but its super common so worth talking about it. Most server processors are super-scalar, which is to say they can retire multiple instructions each cycle. On many workloads, less than 1 instruction is retired each cycle (you can see this by monitoring Instructions per cycle) because the processor is waiting for memory transfers.
If a workload is bound on network, storage, or memory, spending more on a faster CPU will not deliver results. The same is true for non-demanding workloads. They too are not bound on CPU so a faster part won’t help in this case either.
· Price/performance: Device price/performance is far better than current generation server CPUs. Because there is less competition in server processors, prices are far higher and price/performance is relatively low compared to the device world. Using server parts, performance is excellent but price is not.
Let’s use an example again: A server CPU is hundreds of dollars sometimes approaching $1,000 whereas the ARM processor in an iPhone comes in at just under $15. My general rule of thumb in comparing ARM processors with server CPUs is they are capable of ¼ the processing rate at roughly 1/10th the cost. And, super important, the massive shipping volume of the ARM ecosystem feeds the innovation and completion and this performance gap shrinks the performance gap with each processor generation. Each generational improvement captures more possible server workloads while further improving price/performance
· Power/performance: Most modern servers run over 200W, and many are well over 500W, while microservers can weigh in at 10 to 20W. Nowhere is power/performance more important than in portable devices, so the pace of power/performance innovation in the ARM world is incredibly strong. In fact, I’ve long used mobile devices as a window into future innovations coming to the server market. The technologies you seen in the current generation of cell phones has a very high probability of being used in a future server CPU generation.
This is not the first ARM based server processor that has been announced. And, even more announcements are coming over the next year. In fact, that is one of the strengths of the ARM ecosystem. The R&D investments can be leveraged over huge shipping volume from many producers to bring more competition, lower costs, more choice, and a faster pace of innovation.
This is a good day for customers, a good day for the server ecosystem, and I’m excited to see AMD help drive the next phase in the evolution of the ARM Server market. The pace of innovation continues to accelerate industry-wide and it’s going to be an exciting rest of the decade.
Past notes on Microservers: