Sunday, June 13, 2010

I’ve been talking about the application of low-power, low-cost processors to server workloads for years, starting with The Case for Low-Cost, Low-Power Servers. Subsequent articles get into more detail: Microslice Servers, Low-Power Amdahl Blades for Data Intensive Computing, and Successfully Challenging the Server Tax.

 

Single-dimensional measures of servers like “performance” without regard to server cost or power dissipation are seriously flawed. The right way to measure server performance is work done per dollar and work done per joule. If we adopt these measures of workload performance, we find that cold storage workloads and highly partitionable workloads run very well on low-cost, low-power servers. And we find the converse as well: database workloads run poorly on these servers (see When Very Low-Power, Low-Cost Servers Don’t Make Sense).
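
As a rough illustration of the two measures, here is a minimal Python sketch. The ServerProfile class, both server profiles, and every number in them are hypothetical and chosen only to make the arithmetic easy to follow. Work per dollar is computed against purchase cost alone, which is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class ServerProfile:
    """Hypothetical description of one server on one workload."""
    name: str
    work_units_per_sec: float  # sustained throughput on the workload of interest
    cost_dollars: float        # purchase cost (assumed; ignores amortization)
    power_watts: float         # average draw while delivering that throughput

    def work_per_dollar(self) -> float:
        # Throughput bought per dollar of server cost.
        return self.work_units_per_sec / self.cost_dollars

    def work_per_joule(self) -> float:
        # A watt is one joule per second, so (work/s) / (J/s) = work/J.
        return self.work_units_per_sec / self.power_watts


# Hypothetical numbers, not measurements of any real server.
big = ServerProfile("big-core", work_units_per_sec=1000.0,
                    cost_dollars=4000.0, power_watts=250.0)
small = ServerProfile("low-power", work_units_per_sec=100.0,
                      cost_dollars=250.0, power_watts=10.0)

for s in (big, small):
    print(f"{s.name}: {s.work_per_dollar():.2f} work/s per dollar, "
          f"{s.work_per_joule():.1f} work per joule")
```

With these made-up inputs the low-power server delivers a tenth of the raw throughput but comes out ahead on both measures, which is exactly the kind of result a single "performance" number hides.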

 

The reasons why scale-up workloads in general and database workloads specifically run poorly on low-cost, low-powered servers are fairly obvious. Workloads that don’t scale out need bigger single servers to scale (duh). And workloads that are CPU bound tend to run more cost effectively on higher-powered nodes. The latter isn’t strictly true. Even with scale-out losses, many CPU-bound workloads still run efficiently on low-cost, low-powered servers because what is lost in scaling is sometimes more than made up for by lower cost and lower power consumption.

 

I find the bounds where a technology ceases to work efficiently to be the most interesting area to study for two reasons: 1) these boundaries teach us why current solutions don’t cross the boundary and often give us clues on how to make the technology apply more broadly, and, most important, 2) you really need to know where not to apply a new technology. It is rare that a new technology is a uniform, across-the-board win. For example, many of the current applications of flash memory make very little economic sense. It’s a wonderful solution for hot I/O-bound workloads where it is far superior to spinning media. But flash is a poor fit for many of the applications where it ends up being applied. You need to know where not to use a technology.

 

Focusing on why low-cost, low-power servers don’t run a far broader class of workloads also teaches us what needs to change to achieve broader applicability. For example, if we ask what would happen if processor cost and power dissipation were zero, we quickly see that, as processor cost and power scale down, it is what surrounds the processor that begins to dominate. We need to get enough work done on each node to pay for the cost and power of all the surrounding components, from northbridge through memory, networking, power supply, etc. Each node needs to get enough done to pay for the overhead components.

 

This shows us an interesting future direction: what if servers shared the infrastructure and the “all except the processor” tax was spread over more servers? It turns out this really is a great approach, and applying this principle broadens the range of workloads that can be hosted on low-cost, low-power servers. Two examples of this direction are the Dell Fortuna and Rackable CloudRack C2. Both of these shared infrastructure servers take a big step in this direction.
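
To make the “all except the processor” tax concrete, here is a small Python sketch of how the per-server overhead shrinks as more servers share it. The function names and the 2 W processor and 40 W overhead figures are hypothetical, and the sketch assumes the shared overhead stays constant and divides evenly, which real designs will only approximate.

```python
def per_server_power(cpu_watts: float, overhead_watts: float,
                     servers_sharing: int) -> float:
    """Power attributable to one server when the non-processor overhead
    (chipset, networking, power supply, ...) is split evenly."""
    return cpu_watts + overhead_watts / servers_sharing


def overhead_fraction(cpu_watts: float, overhead_watts: float,
                      servers_sharing: int) -> float:
    """Share of each server's power that goes to the overhead components."""
    total = per_server_power(cpu_watts, overhead_watts, servers_sharing)
    return (overhead_watts / servers_sharing) / total


# Hypothetical figures: a 2 W processor and 40 W of surrounding components.
CPU_W = 2.0
OVERHEAD_W = 40.0

for n in (1, 8, 64):
    watts = per_server_power(CPU_W, OVERHEAD_W, n)
    frac = overhead_fraction(CPU_W, OVERHEAD_W, n)
    print(f"{n:>2} servers sharing: {watts:.2f} W per server, "
          f"{frac:.0%} of it overhead")
```

Under these assumptions a dedicated server spends almost all of its power on components other than the processor, while sharing across eight servers cuts per-server power from 42 W to 7 W.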

 

SeaMicro Releases Innovative Intel Atom Server

One of the two server startups I’m currently most excited about is SeaMicro. Up until today, they have been in stealth mode and I haven’t been able to discuss what they are building. It’s been killing me. They are targeting the low-cost, low-power server market, and they have carefully studied the lessons above and applied them deeply. SeaMicro has built a deeply integrated, shared infrastructure, low-cost, low-power server solution with a broader potential market than any I’ve seen so far. They are able to run the Intel x86 instruction set, avoiding the adoption friction of using different ISAs, and they have integrated a massive number of servers very deeply into an incredibly dense package. I continue to point out that rack density for density’s sake is a bug not a feature (see Why Blades aren’t the Answer to All Questions), but the SeaMicro server module density is “good density” that reduces cost and increases efficiency. At under 2kW for a 10RU module, it is neither inefficient nor challenging from a cooling perspective.

 

Potential downsides of the SeaMicro approach are that the Intel Atom CPU is not quite as power efficient as some of the ARM-based solutions and that it doesn’t currently support ECC memory. See You Really do Need ECC Memory in Servers for more detail on why ECC can be important. However, the SeaMicro design is available now and it is a considerable advancement over what is currently in the market. What SeaMicro has built is actually CPU independent and can integrate other CPUs as other choices become available, and the current Intel Atom-based solution will work well for many server workloads. I really like what they have done.

 

SeaMicro has taken shared infrastructure to an entirely new level in building a 512-server module that takes just 10 RU and dissipates just under 2kW. Four of these modules will fit in an industry standard rack, consume a reasonable 8kW, and deliver more work done per joule, more work done per dollar, and more work done per rack than the more standard approaches currently on the market.

The SeaMicro server module is comprised of:

·         512 1.6GHz Intel Atoms (2048 CPUs/rack)

·         1 TB DRAM

·         1.28 Tbps networking fabric

·         Up to 16x 10Gbps ingress/egress network or up to 64 1Gbps if running 1GigE

·         0 to 64 SATA SSD or HDDs

·         Standard x86 instruction set architecture (no recompilation)

·         Integrated software load-balancer

·         Integrated layer 2 networking switch

·         Under 2kW power
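
The per-server and per-rack figures implied by this list can be worked out directly. The Python sketch below uses only the numbers quoted above, treats "under 2kW" as an upper bound of 2000 W, reads 1 TB as 1024 GB, and assumes DRAM and fabric bandwidth divide evenly across the 512 servers. All three are assumptions made for the arithmetic, not claims about the actual hardware layout.

```python
SERVERS_PER_MODULE = 512
MODULES_PER_RACK = 4       # 10 RU modules, four to a standard rack
MODULE_POWER_W = 2000.0    # "under 2kW", so this is an upper bound
MODULE_DRAM_GB = 1024.0    # 1 TB, assuming binary units
FABRIC_GBPS = 1280.0       # 1.28 Tbps networking fabric

servers_per_rack = SERVERS_PER_MODULE * MODULES_PER_RACK
rack_power_w = MODULE_POWER_W * MODULES_PER_RACK

# Amortized figures per server, assuming an even split (an assumption).
watts_per_server = MODULE_POWER_W / SERVERS_PER_MODULE
dram_gb_per_server = MODULE_DRAM_GB / SERVERS_PER_MODULE
fabric_gbps_per_server = FABRIC_GBPS / SERVERS_PER_MODULE

print(f"servers per rack:       {servers_per_rack}")
print(f"rack power (max):       {rack_power_w / 1000:.0f} kW")
print(f"power per server (max): {watts_per_server:.2f} W")
print(f"DRAM per server:        {dram_gb_per_server:.0f} GB")
print(f"fabric per server:      {fabric_gbps_per_server:.1f} Gbps")
```

The result is just under 4 W per server including its share of everything in the chassis, which is the number that matters when comparing against a conventional server's wall power.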

 

The server nodes are built 8 to a board in one of the nicest designs I’ve seen for years:

 

 

The SeaMicro 10 RU chassis can be hosted 4 to an industry standard rack:

 

This is an important hardware advancement and it is great to see the faster pace of innovation sweeping the server world driven by innovative startups like SeaMicro.

 

                                                                                --jrh

 

James Hamilton

e: jrh@mvdirona.com

w: http://www.mvdirona.com

b: http://blog.mvdirona.com / http://perspectives.mvdirona.com

 

Sunday, June 13, 2010 8:01:11 PM (Pacific Standard Time, UTC-08:00)
Hardware
Monday, June 14, 2010 10:51:45 AM (Pacific Standard Time, UTC-08:00)
James,
Very interesting.
BTW, I think they mistakenly interchanged the 100% CPU and 25% CPU column headers in the Power Savings table. Power use shown is higher for the 25% case than the 100% case.

--Naren
Naren Nayak
Monday, June 14, 2010 3:07:03 PM (Pacific Standard Time, UTC-08:00)
Naren,

The numbers are not switched. For the _same_ amount of work, lower CPU utilization means longer time to finish, hence more power to consume.

Thanks,
owen
owen
Monday, June 14, 2010 8:31:03 PM (Pacific Standard Time, UTC-08:00)
James - some questions about this analysis from SeaMicro.

First:
If you do the math on SeaMicro data @ 100%, 72KW/(10 racks * 2048 CPUs/Rack) = 3.5W an ATOM node, now ATOM is good but not 3.5W good for CPU+Memory+Chipset+ASIC. Memory alone is a lot more than 6W per 4GB. If you do the math on the PE610 it is 254KW/1000 systems = 254W which is about right but latest data on spec.org http://www.spec.org/power_ssj2008/results/res2009q2/power_ssj2008-20090506-00156.html has the PE610 with Westmere at 236W. In testing with ATOM with SSJ_OPS it is more like 22W a node.

Additionally: per http://www.spec.org/cpu2006/results/res2009q1/cpu2006-20090316-06788.html
PE610 SPECINT_RATE = 252, so to achieve 100k SPECINT_rate per SeaMicro math would only be 396 nodes vs 1000. Not sure what they are really comparing considering I can find official public results. They need to be a bit more forthcoming on what they are comparing.

Second:
PE610 @ 100% delivers 550,081 SSJ_OPs @ 236W per spec power report listed above.
1.6GHz DC ATOM @ 100% delivers ~15,000 SSJ_OPs @ 22W.
So if you fill a rack with 42 PE610 it delivers 23M SSJ_OPs vs 30M SSJ_OPs for 2048 DC ATOMs. Power for the 42 PE610's is 9,912W. Power for 2048 ATOMs is 45,056W
Perf/Watt for the Rack of PE610's is 2330 vs 674 for Rack of 2048 ATOMs.

Granted, this assumes 100% load is achievable. Point is, analysis continues to show if you're willing to virtualize and load up big cores it is far better than ATOMs. Now, for some applications in some instances ATOM absolutely makes sense - replacing a 5 yr old single application machine with a single ATOM server makes sense from a power perspective and will deliver roughly the same performance, but if you replace 50 old machines with something like one PE610 the equation is different. Given the virtualization model of moving VMs around to drive utilization up (and turn IDLE machines OFF) I fail to see an analysis that makes sense for ATOM. Power needs to come down and performance needs to go up. I'm sure Intel has done this analysis and is ensuring the ATOM equation does not outweigh a virtualized big core.

I am a fan of lightweight servers for the right job, ex: IO bound or data movement problems can be served by lightweight servers far better than an underutilized big core.

Net-net: I find the SeaMicro data questionable. But I like the idea in some uses.

thanks

Tim
Tim
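
The arithmetic in the comment above can be rechecked with a short Python sketch. The inputs are the figures Tim quotes (42 PE610 nodes at 550,081 SSJ_OPs and 236 W each, 2048 Atom nodes at roughly 15,000 SSJ_OPs and 22 W each, and 72 kW across 10 racks of 2048 CPUs). The rack_totals helper is a hypothetical name, and the sketch makes no claim about which set of per-node figures is correct.

```python
def rack_totals(nodes: int, ops_per_node: float, watts_per_node: float):
    """Return (total ops, total watts, ops per watt) for a rack of nodes."""
    total_ops = nodes * ops_per_node
    total_watts = nodes * watts_per_node
    return total_ops, total_watts, total_ops / total_watts


# Figures as quoted in the comment; the Atom throughput is an approximation.
pe610 = rack_totals(42, 550_081, 236)
atom = rack_totals(2048, 15_000, 22)

# Per-node power implied by the 72 kW / 10 rack SeaMicro figure.
watts_per_atom_node = 72_000 / (10 * 2048)

print(f"PE610 rack: {pe610[0] / 1e6:.1f}M ops, {pe610[1]} W, "
      f"{pe610[2]:.0f} ops/W")
print(f"Atom rack:  {atom[0] / 1e6:.1f}M ops, {atom[1]} W, "
      f"{atom[2]:.0f} ops/W")
print(f"Implied SeaMicro power per node: {watts_per_atom_node:.2f} W")
```

With the rounded 15,000 SSJ_OPs input the Atom rack works out to about 682 ops per watt, slightly above the 674 quoted, which may simply reflect rounding in the approximate per-node figure. The conclusion of the comparison depends almost entirely on whether 22 W or roughly 3.5 W is the right power per Atom node.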
Tuesday, June 15, 2010 1:40:44 PM (Pacific Standard Time, UTC-08:00)
Tim,

1. One of the innovative aspects of SeaMicro's technology is that 90% of the components from a motherboard are eliminated with the ASIC. This enables us to focus on powering up only the key components needed for compute. Intel Atom is a processor family and there are multiple processors with varying levels of power consumption. The processor that is used in the SeaMicro system has a TDP of 2W and consumes 1.2W running 100% SpecIntRate workload. With regard to memory power consumption, other than a small amount of leakage power, power consumed by memory is directly proportional to how hard it is driven and so in an Atom server, memory consumes much less than 6W - on the order of 1W. So, indeed the amortized power consumption of an Atom server under average workload in a SeaMicro infrastructure is about 3.5W.

2. Spec benchmarks should not be compared without looking at the test scenario and conditions. SeaMicro measured all the performance numbers shown in the above slide using gcc compiler and not ICC compiler. If you do a search on Spec.org web site for PE610 performance numbers with gcc compiler you will see the results to be in-line with what is shown in the slides.

3. It is well known in the industry that for scale-out workloads, low power CPUs provide much better performance per watt. James has written many articles about that and so have other visionaries in the industry. This is an area where a system built out of a parallel array of Atoms makes a lot of sense and is the market that is perfectly applicable to SeaMicro technology.

thanks, anil
Thursday, June 17, 2010 10:04:19 AM (Pacific Standard Time, UTC-08:00)
Owen,
I understand it's comparing a 100K SPECint_rate target for both platforms at different loads, but if it takes time into account, shouldn't it be KWh?

Naren
Friday, June 18, 2010 5:12:57 AM (Pacific Standard Time, UTC-08:00)
Comparing a workload rate (work units per unit time) with an instantaneous power consumption number is pretty common. If you assume that x units of work per second (UPS) is a constant rate and not going up and down, and the power draw is relatively constant during the test, which is the normal case, then looking at work rate at X W is fine.

When looking at the work completion RATE at a constant power consumption, watt is the right measure. When looking at the cost to get X units of work done (not a rate), then I agree, you should use joule (watt second) or KWh as the denominator.

--jrh
jrh@mvdirona.com
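
The distinction between the two denominators can be shown with a few lines of Python. The function names and the 500 units/s, 200 W, and 9,000,000-unit job figures are hypothetical, and the sketch assumes both throughput and power draw are constant for the whole run.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 1000 W * 3600 s


def work_rate_per_watt(units_per_sec: float, watts: float) -> float:
    """Rate view: sustained throughput per watt of constant power draw."""
    return units_per_sec / watts


def energy_for_job_joules(total_units: float, units_per_sec: float,
                          watts: float) -> float:
    """Fixed-work view: energy needed to finish a job of total_units,
    assuming constant throughput and constant power."""
    seconds = total_units / units_per_sec
    return watts * seconds


# Hypothetical server: 500 units/s at a steady 200 W.
rate = work_rate_per_watt(500.0, 200.0)

# Hypothetical job: 9,000,000 units, which takes 18,000 s (5 hours).
joules = energy_for_job_joules(9_000_000.0, 500.0, 200.0)
kwh = joules / JOULES_PER_KWH

print(f"rate view:       {rate} units/s per watt")
print(f"fixed-work view: {joules:.0f} J = {kwh} kWh for the whole job")
```

Note that units per second per watt is numerically the same as units per joule, so under constant load the two views agree. They only diverge when comparing runs that take different amounts of time to finish the same job.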
Monday, June 21, 2010 2:15:11 PM (Pacific Standard Time, UTC-08:00)
1T of ECC memory in a low power server. Sounds quite interesting for in-memory distributed databases. Even better would be if they could increase the memory per core up from 2G.

As we talked about before (at http://perspectives.mvdirona.com/2010/05/18/WhenVeryLowPowerLowCostServersDontMakeSense.aspx), it might be a sweet spot for some big applications to have as much memory as possible in a low power server. Some big web apps want to keep most or all of their data rapidly available in memory but are less sensitive to CPU performance.

Disclaimer: The opinions expressed here are my own and do not necessarily represent those of current or past employers.

All Content © 2014, James Hamilton