ARM Cortex-A9 SMP Design Announced

ARM just announced a pair of 2-core SMP designs based upon the Cortex-A9 application processor, one optimized for performance and the other for power consumption (http://www.arm.com/news/25922.html). Although the optimization points are different, both are incredibly low power consumers by server standards, with the performance-optimized part dissipating only 1.9W at 2GHz on the TSMC 40G (40nm) process. This design is aimed at server applications and should be able to run many server workloads comfortably.

In Linux/Apache on ARM Processors I described an 8 server cluster of web servers running the Marvell MV78100. These are single core ARM design servers produced by Marvell. It’s a great demonstration system showing that web server workloads can be run cost effectively on ARM based servers. Toward the end of the blog entry, I observed:

The ARM is a clear win on work done per dollar and work done per joule for some workloads. If a 4-core, cache coherent version was available with a reasonable memory controller, we would have a very nice server processor with record breaking power consumption numbers.

I got a call from ARM soon after posting saying that I may get my wish sooner than I was guessing. Very cool. The design that was announced earlier today includes a 2-core, performance-optimized design that could form the building block of a very nice server. In the following block diagram, ARM shows a pair of 2-core macros implementing a 4-way SMP:

Some earlier multi-core ARM designs such as the Marvell MV78200 are not cache coherent, which makes it difficult to support a single application utilizing both cores. As long as this design is coherent (and I believe it is), I love it.

Technically it’s long been possible to build N-way SMP servers based upon the single core Cortex-A9 macros but it’s quite a bit of design work. The 2-way single macro makes it easy to deliver at least 2-core servers and this announcement shows that ARM is interested in and is investing in developing the ARM-based server market.

The ARM reported performance results:

In the ARM business model, the release of a design is the first and most important step towards parts becoming available from partners. However, it’s typically at least 12 months from design availability to first shipping silicon from partners so we won’t likely see components based upon this design until late 2010 at the earliest. I’m looking forward to it.

Our industry just keeps getting more interesting.

–jrh

James Hamilton

e: jrh@mvdirona.com

w: http://www.mvdirona.com

b: http://blog.mvdirona.com / http://perspectives.mvdirona.com

6 comments on “ARM Cortex-A9 SMP Design Announced”
  1. Kiss asked "And more important, is the return(potential system performance improvement, reduced power/cooling cost in data center and smaller electricity bill) attractive for system vendors? Or, is this market big enough to support a new silicon design (45nm or others)? make this ultra-low-power chip be also low-cost?"

    Yes, absolutely. The server market is small relative to the client market and tiny compared to the mobile phone market, so I expect the design will piggyback on innovation and volume in client and mobile devices, but I do expect we’ll see an ARM-based server design aimed at the high-scale data center market over the next 12 to 16 months.

    James Hamilton
    jrh@mvdirona.com

  2. KISS says:

    Hi, James
    As a chip developer, I am thinking what if developing an ultra-low-power server chip using ARM core. Just like styles of these machines – using "embedded cores" do big things:
    BlueGene (powerpc core)
    SiCortex (mips core)
    SGI’s Molecule (atom) (http://www.theregister.co.uk/2008/11/20/sgi_molecule_concept/)
    CMU’s FAWN machine(http://www.cs.cmu.edu/~fawnproj/)
    And Microsoft research’s Atom machine(http://www.theregister.co.uk/2009/02/25/microsoft_sleepy_servers/)

    But what’s the obstacle to develop such machine around ARM:
    1, lack of 64-bit is a problem?
    2, at software-side – does ARM ISA/Toolchain have good support for LAMP staffs? what about the effort to porting software?

    And more important, is the return(potential system performance improvement, reduced power/cooling cost in data center and smaller electricity bill) attractive for system vendors? Or, is this market big enough to support a new silicon design (45nm or others)? make this ultra-low-power chip be also low-cost?

    This question is also posted at RWT:
    http://www.realworldtech.com/forums/index.cfm?action=detail&id=102712&threadid=102672&roomid=2

  3. Keean, you asked the size of an ARM core. The performance version of the core macro just announced is 6.7mm2.

    Big Kate, you were speculating that these parts were destined to compete in the netbook market. Yes, certainly that target will be more important than the server market. You can’t argue with client volumes. And that market is ready and willing to accept a non-X86 ISA today. The server market is as yet unproven but ARM has enough vision to go after it and they are.

    Maht, good catch on the spelling of silicon; I wish that was the first time I had done that one :-). I’ll fix it. Thanks,

    James Hamilton
    jrh@mvdirona.com

  4. maht says:

    great blog btw, v interesting

    But I doubt they will be using silicone

    p.s. your email input box is too short, I wanted to use my address scheme which would have been

    maht-perspectives.mvdirona.com@mail.maht0x0r.net

  5. big kate says:

    other people think this arm going after netbooks
    http://www.pcpro.co.uk/news/351619/arm-launches-attack-on-intels-netbook-stranglehold
    "the processor’s physical size has helped reduce power consumption. "If you just look at an [Intel] Atom by itself, our processor is a third of the size, so the amount of silicon it consumes is significantly less and that reduces cost,"

    according to wikipedia the atom is 25 mm2, whilst a xeon according to intel has a Die Size of 263 mm2

    so if we assume the arm core is about 8 mm2 that puts about 30 in the same space as xeon die for die

    of course that rubbish in reality but it gives us some idea of the scale

  6. Keean says:

    What is the area of one of these ARM cores? How many would fit on a die at the current 40nm process technology (considering a die with practical yield, perhaps the size of a modern GPU or Xeon processor). What would be the best compromise between number of cores and cache size on such a chip?
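The back-of-the-envelope area comparison in the comments above can be sketched in a few lines of Python. This is only an illustration: the die areas are the figures quoted by the commenters (not independently verified), the 8 mm2 core size is big kate's assumption, and the function name units_per_die is made up for this example. The result is an upper bound by raw area that ignores cache, memory controllers, I/O, interconnect, and yield, which is why the real number would be much lower.

```python
# Areas in mm^2, as quoted in the comment thread (not independently verified).
XEON_DIE_MM2 = 263.0        # Xeon die size cited by big kate
ATOM_DIE_MM2 = 25.0         # Atom die size cited by big kate (from Wikipedia)
ARM_CORE_GUESS_MM2 = 8.0    # big kate's assumed ARM core size
ARM_MACRO_MM2 = 6.7         # size James Hamilton quotes for the performance macro


def units_per_die(die_mm2, unit_mm2):
    """Whole units that fit by raw area alone (hypothetical helper).

    Ignores cache, I/O, interconnect, and yield, so this is an
    optimistic upper bound rather than a design estimate.
    """
    return int(die_mm2 // unit_mm2)


if __name__ == "__main__":
    print("8 mm2 cores per Xeon-sized die:", units_per_die(XEON_DIE_MM2, ARM_CORE_GUESS_MM2))
    print("6.7 mm2 macros per Xeon-sized die:", units_per_die(XEON_DIE_MM2, ARM_MACRO_MM2))
    print("8 mm2 cores per Atom-sized die:", units_per_die(ATOM_DIE_MM2, ARM_CORE_GUESS_MM2))
```

With the 8 mm2 guess this gives 32 units per Xeon-sized die, in line with the "about 30" estimate in the thread.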
