Intel Atom with ECC in 2012

Back in early 2008, I noticed an interesting phenomenon: some workloads run more cost effectively on low-cost, low-power processors. The key observation behind this phenomenon is that CPU bandwidth consumption is going up faster than memory bandwidth. Essentially, it’s a two-part observation: 1) many workloads are memory bandwidth bound and will not run faster on a faster processor unless the faster processor comes with a much improved memory subsystem, and 2) the number of memory-bound workloads is going up over time.
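To make that first point concrete, here’s a minimal sketch (mine, not from the post) of a memory-bandwidth-bound loop in the spirit of the STREAM triad benchmark. The array size, the POSIX timer, and the reported number are all illustrative choices; the point is that once the arrays are far larger than cache, runtime is set by the memory subsystem, and a faster core attached to the same memory system barely moves the result.

```c
/* A minimal sketch (not from the post) of a memory-bandwidth-bound kernel in
 * the spirit of the STREAM triad. With arrays far larger than the last-level
 * cache, the loop time is set by how fast memory can stream three arrays,
 * not by how fast the cores can multiply and add.                          */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (64 * 1024 * 1024)    /* three double arrays, ~1.5 GB total */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c)
        return 1;

    for (size_t i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);        /* POSIX timer */
    for (size_t i = 0; i < N; i++)              /* triad: a = b + 3c */
        a[i] = b[i] + 3.0 * c[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    /* STREAM-style accounting: 24 bytes moved per iteration */
    printf("triad bandwidth: %.2f GB/s\n", 24.0 * N / secs / 1e9);

    free(a); free(b); free(c);
    return 0;
}
```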

One solution is to improve both the memory bandwidth and the processor speed, and this really does work, but it is expensive. Fast processors are expensive and faster memory subsystems bring more cost as well. The cost of the improved performance goes up non-linearly: N times faster costs far more than N times as much. There will always be workloads where these additional investments make sense. The canonical scale-up workload is the relational database (RDBMS). These workloads tend to be tightly coupled and do not scale out well. The non-linear cost of more expensive, faster gear is both affordable and justified because there really isn’t much of an alternative. If you can’t easily scale out and the workload is important, scale it up.

The poor scaling of RDBMS workloads is driving three approaches in our industry: 1) considerable spending on SANs, SSDs, flash memory, and scale-up servers, 2) the NoSQL movement rebelling against the high-function, high-cost, and poor-scaling characteristics of the RDBMS, and 3) deep investments in massively parallel processing SQL solutions (Teradata, Greenplum, Vertica, ParAccel, Aster Data, etc.).

All three approaches have utility and I’m interested in all three. However, there are workloads that really are highly parallel. For these workloads, the non-linear cost escalation of scale-up servers makes them less cost effective than more commodity servers. The banner workload in this class is the simple web server, where there are potentially thousands of parallel requests and it’s more cost effective to spread these workloads over a fleet of commodity servers.
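As a back-of-the-envelope illustration of that cost argument, here’s a small sketch in which every price and throughput number is a hypothetical placeholder (none of them come from the post): when the scale-up box is, say, 4x faster but 10x the price, a highly parallel workload is cheaper to serve on a fleet of commodity servers.

```c
/* A back-of-the-envelope sketch of the scale-out economics above. Every
 * price and throughput number here is a hypothetical placeholder, not a
 * figure from the post; the only point is the shape of the comparison:
 * when the faster server costs far more than it is faster, a highly
 * parallel workload is cheaper to serve on many commodity boxes.        */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double target_rps = 1e6;    /* aggregate requests/sec to serve (illustrative) */

    /* Hypothetical commodity server vs. a scale-up server that is 4x
     * faster but 10x the price (both numbers made up).                 */
    double small_rps = 10000.0, small_price = 3000.0;
    double big_rps   = 40000.0, big_price   = 30000.0;

    double small_fleet = ceil(target_rps / small_rps);
    double big_fleet   = ceil(target_rps / big_rps);

    printf("commodity: %3.0f servers for $%.0f\n", small_fleet, small_fleet * small_price);
    printf("scale-up:  %3.0f servers for $%.0f\n", big_fleet, big_fleet * big_price);
    return 0;
}
```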

The entire industry has pretty much moved to deploying these highly parallelizable workloads over fleets of commodity servers. There will always be workloads that are tightly coupled and hard to run effectively over large numbers of commodity servers but, for those that can be, the gains are great: 1) far less expensive, 2) much smaller unit of failure, 3) cheap redundancy, 4) small, inexpensive unit of growth (avoiding forklift upgrades), and 5) no hard ceiling at the scale-up limit.

This tells us two things: 1) where we can run a workload over a large number of commodity servers, we should, and 2) the number of workloads where parallel solutions have been found continues to increase. Even for workloads like the social networking hairball problem (see the hairball problem in http://perspectives.mvdirona.com/2010/02/15/ScalingAtMySpace.aspx) we now have economical parallel solutions. Even some RDBMS workloads can now be run cost effectively on scale-out clusters. The question I started thinking through back in 2008 is how far can we take this? What if we used client processors or even embedded device processors? How far down the low-cost, low-power spectrum does it make sense to go for highly parallelizable workloads? Clearly the volume economics of client and device processors make them appealing from a price perspective, and the multi-decade focus on power efficiency in the device world offers impressive efficiency for some workloads.

For more on the low-cost, low-power trend see:

· Very Low-Cost, Low-Power Servers

· Microslice Servers

· The Case for Low-Cost, Low-Power Servers

Or the full paper on the approach: Cooperative Expendable Micro-Slice Servers

I’m particularly interested in the possible application of ARM processors to server workloads:

· Linux/Apache on ARM processors

· ARM Cortex-A9 SMP Design Announced

Intel is competing with ARM for some device workloads and offers the Atom, which is not as power efficient as ARM but has the upside of running the Intel Architecture instruction set. ARM licensees also recognize the trend I’ve outlined above – some server workloads will run very cost effectively on low-cost, low-power processors – and they want to compete in the server business. One excellent example is SeaMicro, which is taking Intel Atom processors and competing for the server business: SeaMicro Releases Innovative Intel Atom Server.

Competition is a wonderful thing and drives much of the innovation whose fruits we enjoy today. We have three interesting points of competition here: 1) Intel is competing for the device market with Atom, 2) ARM licensees are competing with Intel Xeon in the server market, and 3) Intel Atom is being used to compete in the server market, but without solid Intel backing.

Using Atom to compete with server processors is, on one hand, good for Intel in that it gives them a means of competing with ARM at the low end of the server market but, on the other hand, the Atom is much cheaper than Xeon, so Intel will lose money on every workload it wins where that workload used to be Xeon hosted. This risk is limited by Intel not giving Atom ECC memory (You Really do Need ECC Memory). Hobbling Atom is a dangerous tactic in that it protects Xeon but, at the same time, may allow ARM to gain ground in the server processor market. Intel could have ECC support on Atom in months – there is nothing technically hard about it. It’s the business implications that are more complex.
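For readers unfamiliar with what ECC actually buys you, here’s a toy sketch of single-error-correcting Hamming protection over 4 data bits. Real server DIMMs implement SECDED (single-error correct, double-error detect) over 64-bit words in the memory controller hardware; this software version is only an illustration of the idea, not how Atom or Xeon would do it.

```c
/* A toy single-error-correcting Hamming(7,4) code: 4 data bits protected by
 * 3 parity bits. Server DIMMs do the same thing in hardware, but as SECDED
 * over 64-bit words; this is only an illustration of why ECC matters.      */
#include <stdint.h>
#include <stdio.h>

/* Value at 1-indexed codeword position pos (position 1 lives in bit 0). */
static uint8_t bit_at(uint8_t cw, int pos) { return (cw >> (pos - 1)) & 1; }

/* Encode 4 data bits into a 7-bit codeword.
 * Positions: 1=p1 2=p2 3=d0 4=p3 5=d1 6=d2 7=d3                            */
static uint8_t encode(uint8_t d)
{
    uint8_t d0 = d & 1, d1 = (d >> 1) & 1, d2 = (d >> 2) & 1, d3 = (d >> 3) & 1;
    uint8_t p1 = d0 ^ d1 ^ d3;  /* even parity over positions 1,3,5,7 */
    uint8_t p2 = d0 ^ d2 ^ d3;  /* even parity over positions 2,3,6,7 */
    uint8_t p3 = d1 ^ d2 ^ d3;  /* even parity over positions 4,5,6,7 */
    return p1 | (p2 << 1) | (d0 << 2) | (p3 << 3) | (d1 << 4) | (d2 << 5) | (d3 << 6);
}

/* Extract the data bits from codeword positions 3, 5, 6, 7. */
static uint8_t data_bits(uint8_t cw)
{
    return bit_at(cw, 3) | (bit_at(cw, 5) << 1) | (bit_at(cw, 6) << 2) | (bit_at(cw, 7) << 3);
}

/* Decode, correcting any single flipped bit: the syndrome is the position
 * of the bad bit (0 means the word is clean).                              */
static uint8_t decode(uint8_t cw)
{
    uint8_t s1 = bit_at(cw, 1) ^ bit_at(cw, 3) ^ bit_at(cw, 5) ^ bit_at(cw, 7);
    uint8_t s2 = bit_at(cw, 2) ^ bit_at(cw, 3) ^ bit_at(cw, 6) ^ bit_at(cw, 7);
    uint8_t s3 = bit_at(cw, 4) ^ bit_at(cw, 5) ^ bit_at(cw, 6) ^ bit_at(cw, 7);
    uint8_t syndrome = s1 | (s2 << 1) | (s3 << 2);
    if (syndrome)
        cw ^= 1 << (syndrome - 1);
    return data_bits(cw);
}

int main(void)
{
    uint8_t data = 0xB;                     /* 4 bits of "memory" contents */
    uint8_t stored = encode(data);
    uint8_t corrupted = stored ^ (1 << 4);  /* a bit flip at codeword position 5 */

    printf("without ECC: read back 0x%X (silently wrong)\n", (unsigned)data_bits(corrupted));
    printf("with ECC:    read back 0x%X (corrected)\n", (unsigned)decode(corrupted));
    return 0;
}
```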

The first step in the right direction was made earlier this week when Intel announced an ECC-capable Atom for 2012:

· Intel Plans on Bringing Atom to Servers in 2012, 20W SNB for Xeons in 2011

· Intel Flexes Muscles with New Processor in the Face of ARM Threat

This is potentially very good news for the server market, but it’s only potentially good news. Waiting until 2012 suggests Intel wants to give the low-power Xeons time in the market to see how they do before setting the price on the ECC-equipped Atom. If the ECC-supporting version of Atom is priced like a server part rather than a low-cost, high-volume device part, then nothing changes. If the Atom with ECC comes in priced like a device processor, then it’s truly interesting. This announcement says that Intel is interested in the low-cost, low-power server market but still plans to delay entry into the lowest end of that market for another year. Still, this is progress and I’m glad to see it.

Thanks to Matt Corddry of Amazon for sending the links above my way.

–jrh

James Hamilton

e: jrh@mvdirona.com

w: http://www.mvdirona.com

b: http://blog.mvdirona.com / http://perspectives.mvdirona.com

4 comments on “Intel Atom with ECC in 2012”
  1. Thanks for the comment, DeepGeek. You know my biases, but your usage model really needs cloud computing even more than low-cost, low-power servers. Why not use AWS or a competing cloud service?

    –jrh

  2. Deep Geek says:

    I do wonder what the adoption will be. I happen to have a real need for a webserver daemon that is geared more for static content than anything else. Because of this I would really welcome a low-powered server (since I typically don’t do any database for my podcasting).

    However, I am absolutely amazed that I can’t find any shared hosting geared specifically for static file serving on the net, let alone be able to convince fellow podcasters that it may be a better solution for at least their audio files.

    The web hosting industry looks to me like it is in a real "one size fits all" rut with the LAMP stack, with the exception of "if you want something different, we offer VPS so you can have it your way."

    Just my thoughts,

    DeepGeek

  3. Hi Greg. The distinction I’m drawing is between workloads that need single-thread performance and those that can run well with large numbers of relatively slow cores not sharing memory. Embarrassingly parallel, without lots of cross-thread state.

    –jrh

  4. Greg Linden says:

    I’m not sure the distinction between highly parallel workloads and not-parallel workloads really captures the important distinction here. I’d think that the most important distinction would be real-time versus non-real-time (since real-time basically requires that the entire data set stay in memory across the cluster), perhaps followed by random access versus sequential access (which is the distinction between wanting to use memory and SSD versus hard disk), finally followed by highly parallel versus sequential (which says that one job can be run in parallel as well as running many separate jobs in parallel). But, yes, in all cases, the constraint almost always seems to be how fast we can move data around, not how fast we can process the data.
