Back in early 2008, I noticed an interesting phenomenon: some workloads run more cost effectively on low-cost, low-power processors. The key observation behind this phenomenon is that CPU bandwidth consumption is going up faster than memory bandwidth. Essentially, it’s a two-part observation: 1) many workloads are memory bandwidth bound and will not run faster with a faster processor unless the faster processor comes with a much improved memory subsystem, and 2) the number of memory-bound workloads is going up over time.
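The first half of that observation can be illustrated with a deliberately simplified model. The sketch below assumes compute and memory traffic overlap perfectly, so the slower of the two sets the run time. The function name and all of the workload and machine numbers are hypothetical, chosen only to make the effect visible.

```python
def run_time_seconds(ops, bytes_moved, cpu_ops_per_sec, mem_bytes_per_sec):
    """Run time under a simplified model where compute and memory
    transfers overlap fully, so the slower resource dominates."""
    compute_time = ops / cpu_ops_per_sec
    memory_time = bytes_moved / mem_bytes_per_sec
    return max(compute_time, memory_time)


# Hypothetical workload: 1e9 operations touching 8e9 bytes of memory.
OPS = 1e9
BYTES_MOVED = 8e9

# Hypothetical machines (all figures are illustrative assumptions).
SLOW_CPU = 2e9    # operations per second
FAST_CPU = 8e9    # a 4x faster processor
BASE_MEM = 10e9   # bytes per second
FAST_MEM = 40e9   # a 4x faster memory subsystem

slow = run_time_seconds(OPS, BYTES_MOVED, SLOW_CPU, BASE_MEM)
fast_cpu_only = run_time_seconds(OPS, BYTES_MOVED, FAST_CPU, BASE_MEM)
fast_cpu_and_mem = run_time_seconds(OPS, BYTES_MOVED, FAST_CPU, FAST_MEM)

print(f"slow CPU, base memory: {slow:.3f} s")
print(f"fast CPU, base memory: {fast_cpu_only:.3f} s")
print(f"fast CPU, fast memory: {fast_cpu_and_mem:.3f} s")
```

In this toy model, the 4x faster processor alone buys nothing because the workload is already waiting on memory. Only the configuration that also improves the memory subsystem runs faster.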
One solution is to improve both the memory bandwidth and the processor speed. This really does work, but it is expensive. Fast processors are expensive and faster memory subsystems bring more cost as well. The cost of the improved performance goes up non-linearly: N times faster costs far more than N times as much. There will always be workloads where these additional investments make sense. The canonical scale-up workload is the relational database (RDBMS). These workloads tend to be tightly coupled and do not scale out well. The non-linear cost of more expensive, faster gear is both affordable and justified because there really isn’t much of an alternative. If you can’t easily scale out and the workload is important, scale it up.
The poor scaling of RDBMS workloads is driving three approaches in our industry: 1) considerable spending on SANs, SSDs, flash memory, and scale-up servers, 2) the NoSQL movement rebelling against the high-function, high-cost, and poor scaling characteristics of RDBMS, and 3) deep investments in massively parallel processing SQL solutions (Teradata, Greenplum, Vertica, ParAccel, Aster Data, etc.).
All three approaches have utility and I’m interested in all three. However, there are workloads that really are highly parallel. For these workloads, the non-linear cost escalation of scale-up servers makes them less cost effective than a larger number of commodity servers. The banner workload in this class is the simple web server, where there are potentially thousands of parallel requests and it’s more cost effective to spread these workloads over a fleet of commodity servers.
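A rough way to see the scale-up versus scale-out trade-off is to put numbers on it. The sketch below assumes, purely for illustration, that the price of a single server grows as a power law in its relative performance, and that a highly parallel workload can be spread over commodity servers at some fixed parallel efficiency. The function names, the base price, the exponent, and the efficiency figure are all hypothetical and are not measured data.

```python
import math


def scale_up_cost(relative_perf, base_cost=1000.0, exponent=1.5):
    """Cost of ONE server delivering `relative_perf` times the baseline.

    Assumes a power-law price curve; an exponent above 1 means that
    N times faster costs more than N times as much.
    """
    return base_cost * relative_perf ** exponent


def scale_out_cost(relative_perf, base_cost=1000.0, efficiency=1.0):
    """Cost of a fleet of baseline servers delivering `relative_perf`.

    `efficiency` is the assumed fraction of each added server's capacity
    that turns into useful work (1.0 means perfectly parallel).
    """
    servers = math.ceil(relative_perf / efficiency)
    return servers * base_cost


for perf in (2, 4, 16):
    up = scale_up_cost(perf)
    out = scale_out_cost(perf, efficiency=0.5)
    print(f"{perf:>2}x performance: scale-up ${up:,.0f}, scale-out ${out:,.0f}")
```

Under these assumptions the fleet is the more expensive option at small multiples, because half of each server is lost to parallel overhead, and becomes the cheaper option as the performance target grows. A tightly coupled workload would correspond to a much lower efficiency, which is where scale-up remains the better buy.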
The entire industry has pretty much moved to deploying these highly parallelizable workloads over fleets of commodity servers. There will always be workloads that are tightly coupled and hard to run effectively over large numbers of commodity servers but, for those that can be, the gains are great: 1) far less expensive, 2) much smaller unit of failure, 3) cheap redundancy, 4) small, inexpensive unit of growth (avoid forklift upgrades), and 5) no hard limit at the scale-up limit.
This tells us two things: 1) where we can run a workload over a large number of commodity servers we should, and 2) the number of workloads where parallel solutions have been found continues to increase. Even for workloads like the social networking hairball problem (see hairball problem in http://perspectives.mvdirona.com/2010/02/15/ScalingAtMySpace.aspx), we now have economic parallel solutions. Even some RDBMS workloads can now be run cost effectively on scale-out clusters. The question I started thinking through back in 2008 is: how far can we take this? What if we used client processors or even embedded device processors? How far down the low-cost, low-power spectrum does it make sense to go for highly parallelizable workloads? Clearly the volume economics of client and device processors make them appealing from a price perspective, and the multi-decade focus on power efficiency in the device world offers impressive power efficiencies for some workloads.
For more on the low-cost, low-power trend see:
Or the full paper on the approach: Cooperative Expendable, Microslice Servers
I’m particularly interested in the possible application of ARM processors to server workloads:
· Linux/Apache on ARM processors
· ARM Cortex-A9 SMP Design Announced
Intel is competing with ARM on some device workloads and offers the Atom, which is not as power efficient as ARM but has the upside of running the Intel Architecture instruction set. ARM licensees also recognize the trend I’ve outlined above (some server workloads will run very cost effectively on low-cost, low-power processors) and they want to compete in the server business. One excellent example on the Intel side is SeaMicro, which is taking Intel Atom processors and competing for the server business: SeaMicro Releases Innovative Intel Atom Server.
Competition is a wonderful thing and drives much of the innovation the fruits of which we enjoy today. We have three interesting points of competition here: 1) Intel is competing for the device market with Atom, 2) ARM licensees are competing with Intel Xeon in the server market, and 3) Intel Atom is being used to compete in the server market but not with solid Intel backing.
Using Atom to compete with server processors is, on one hand, good for Intel in that it gives them a means of competing with ARM at the low end of the server market but, on the other hand, the Atom is much cheaper than Xeon, so Intel will lose revenue on every workload it wins where that workload used to be Xeon hosted. This risk is limited by Intel not giving Atom ECC memory (You Really do Need ECC Memory). Hobbling Atom is a dangerous tactic in that it protects Xeon but, at the same time, may allow ARM to gain ground in the server processor market. Intel could have ECC support on Atom in months; there is nothing technically hard about it. It’s the business implications that are more complex.
The first step in the right direction was made earlier this week when Intel announced an ECC-capable Atom for 2012:
This is potentially very good news for the server market, but it’s only potentially good news. Waiting until 2012 suggests Intel wants to give the low-power Xeons time in the market to see how they do before setting the price on the ECC-equipped Atom. If the ECC-supporting version of Atom is priced like a server part rather than a low-cost, high-volume device part, then nothing changes. If the Atom with ECC comes in priced like a device processor, then it’s truly interesting. This announcement says that Intel is interested in the low-cost, low-power server market but still plans to delay entry into the lowest end of that market for another year. Still, this is progress and I’m glad to see it.
Thanks to Matt Corddry of Amazon for sending the links above my way.