Monday, November 30, 2009

Very low-power scale-out servers -- it’s an idea whose time has come. A few weeks ago Intel announced it was doing Microslice servers: Intel Seeks new ‘microserver’ standard. Rackable Systems (I may never manage to start calling them ‘SGI’ – remember the old MIPS-based workstation company?) was on this idea even earlier: Microslice Servers. The Dell Data Center Solutions team has been on a similar path: Server under 30W.

 

Rackable has been talking about very low power servers as physicalization: When less is more: the basics of physicalization. Essentially they are arguing that rather than buying more-expensive scale-up servers and then virtualizing the workload onto those fewer servers, you should buy many smaller servers. This saves the virtualization tax, which can run 15% to 50% in I/O intensive applications, and smaller, low-scale servers can produce more work done per joule and better work done per dollar. I’ve been a believer in this approach for years and wrote it up for the Conference on Innovative Data Systems Research last year in The Case for Low-Cost, Low-Power Servers.
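
To put rough numbers on the virtualization tax, here is a minimal sketch. The 15% and 50% figures are the range quoted above; the raw capacity of 1000 work units per second and the function name `effective_work` are assumptions made up for illustration, not measurements.

```python
def effective_work(raw_work: float, vm_tax: float) -> float:
    """Work left over after virtualization overhead (vm_tax is a fraction in [0, 1))."""
    if not 0.0 <= vm_tax < 1.0:
        raise ValueError("vm_tax must be in [0, 1)")
    return raw_work * (1.0 - vm_tax)


# Hypothetical raw capacity of one server, in arbitrary work units per second.
RAW = 1000.0

low_tax = effective_work(RAW, 0.15)   # low end of the quoted range
high_tax = effective_work(RAW, 0.50)  # high end, I/O-intensive case

print(f"15% tax leaves {low_tax:.0f} units/s, 50% tax leaves {high_tax:.0f} units/s")
```

At the high end, half of the hardware you paid for and powered is doing hypervisor work rather than application work.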

 

I’ve recently been very interested in the application of ARM processors to web-server workloads:

·         Linux/Apache on ARM Processors

·         ARM Cortex-A9 SMP Design Announced

 

ARMs are an even more radical application of the Microslice approach.

 

Scale-down servers easily win on many workloads when looking at work done per dollar and work done per joule, and I claim that if you are looking at single-dimensional metrics like performance, you aren’t looking hard enough. However, there are workloads where scale-up wins. Scale-up servers are absolutely required when the workload won’t partition and scale near linearly. Database workloads are classic examples of partition-resistant workloads that really do often run better on more-expensive, scale-up servers.
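
As a sketch of how the two metrics are computed, consider the comparison below. Every number is made up for illustration (they are not benchmark results), and the `Server` class and its field names are my own assumptions. The point is only that a server can lose on raw performance and still win on both ratios.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    cost_usd: float      # purchase price
    power_watts: float   # average draw under load
    work_per_sec: float  # e.g. requests served per second

    def work_per_dollar(self) -> float:
        # Throughput bought per dollar of server cost.
        return self.work_per_sec / self.cost_usd

    def work_per_joule(self) -> float:
        # 1 watt = 1 joule/second, so (work/s) / (J/s) = work/J.
        return self.work_per_sec / self.power_watts


# Hypothetical servers; the figures are illustrative only.
scale_up = Server("scale-up", cost_usd=8000.0, power_watts=400.0, work_per_sec=4000.0)
scale_down = Server("scale-down", cost_usd=500.0, power_watts=30.0, work_per_sec=600.0)

for s in (scale_up, scale_down):
    print(f"{s.name}: {s.work_per_dollar():.2f} per dollar, {s.work_per_joule():.1f} per joule")
```

With these assumed figures the scale-up server is faster in absolute terms, but the scale-down server does more work for each dollar and each joule.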

 

The other limit is administration. Non-automated IT shops believe they are better off with fewer, more-expensive servers although they often achieve this goal by running many operating system images on a single server.  Given that the bulk of administration is spent on the software stack, it’s not clear that this approach of running the same number of O/S images and software stacks on a single server is a substantial savings. However, I do agree that administration costs are important at low-scale. If, at high-scale, admin costs are over 10% of overall operational costs, go fix it rather than buying bigger, more expensive servers.

 

When do scale-up servers win economically? 1) very low-scale workloads where administration costs dominate, and 2) workloads that partition poorly and suffer highly sub-linear scale-out. Simple web workloads and other partition-tolerant applications should look to scale-down servers. Make sure your admin costs are sub-10% and don’t scale with server count. Then use work done per dollar and work done per joule and you’ll be amazed to see scale-down gets more done at lower cost and lower power consumption.
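
The two conditions can be sketched as a small check. The 10% admin threshold comes from the text above; the single `efficiency` factor used to model sub-linear scale-out is a simplifying assumption of mine, and all function names and numbers are hypothetical.

```python
def admin_share(admin_cost: float, total_operational_cost: float) -> float:
    """Fraction of operational cost spent on administration."""
    return admin_cost / total_operational_cost


def cluster_work(per_server_work: float, n_servers: int, efficiency: float) -> float:
    """Aggregate work of n servers; efficiency=1.0 is linear scale-out,
    lower values model a workload that partitions poorly."""
    return per_server_work * n_servers * efficiency


def scale_down_wins(up_work: float, up_cost: float,
                    down_work: float, down_cost: float,
                    n_servers: int, efficiency: float) -> bool:
    """Compare work per dollar: one scale-up server vs. n small servers."""
    up = up_work / up_cost
    down = cluster_work(down_work, n_servers, efficiency) / (down_cost * n_servers)
    return down > up


# Hypothetical figures, for illustration only.
print(admin_share(5.0, 100.0) < 0.10)                         # admin costs under 10%
print(scale_down_wins(4000.0, 8000.0, 600.0, 500.0, 8, 0.9))  # partitions well
print(scale_down_wins(4000.0, 8000.0, 600.0, 500.0, 8, 0.3))  # partitions poorly
```

With these assumed numbers, the small servers win at 90% scaling efficiency and lose at 30%, which is the partition-resistant case described above.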

 

2010 is the year of the low-cost, scale-down server.

 

                                                --jrh

 

James Hamilton

e: jrh@mvdirona.com

w: http://www.mvdirona.com

b: http://blog.mvdirona.com / http://perspectives.mvdirona.com

 

Monday, November 30, 2009 7:04:17 AM (Pacific Standard Time, UTC-08:00)
Hardware
Monday, November 30, 2009 10:12:29 AM (Pacific Standard Time, UTC-08:00)
Hey, James, first link is broken on the application of ARM processors to web-server workloads. Should be

http://perspectives.mvdirona.com/2009/09/07/LinuxApacheOnARMProcessors.aspx
Monday, November 30, 2009 2:46:50 PM (Pacific Standard Time, UTC-08:00)
Clustered, non-relational data stores like HBase or Cassandra could be another good fit for scale-down. One of their traits is easy and automatic partitioning.

One thing that has slowed adoption of these techniques with virtual servers has been slow or inconsistent IO performance. IO isolation could give you more predictable performance.

On-demand provisioning seems like a great complement to these types of databases since they make growth or shrinkage more manageable.

This is all on top of the improved energy/cost efficiencies.
Mike Adler
Monday, November 30, 2009 2:59:54 PM (Pacific Standard Time, UTC-08:00)
Thanks for the correction Greg and it was good catching up with you at SJC last week. You should drop by for an after-work beer sometime in the near future.

-jrh
jrh@mvdirona.com
Monday, November 30, 2009 3:49:57 PM (Pacific Standard Time, UTC-08:00)
Thanks for the comment Mike. I agree that some database management systems are easy to scale while others are very tough. Scaling full relational systems is an example of the "hard" side of that spectrum. For a deeper dive on this topic see: http://perspectives.mvdirona.com/2009/11/03/OneSizeDoesNotFitAll.aspx

The VM world will improve rapidly but, right now, I/O intensive workloads are a problem for two reasons: 1) predictability (your point above) and 2) overhead (VMs add a layer of overhead to all I/O calls). Both of these issues will improve with time and the VM tax will go down greatly. As the VM cost fades away, there are more and more reasons to just virtualize all workloads. The question then is which model is the most cost effective for running these workloads: 1) many on a scale-up server or 2) a few on a low-cost server? Unless the workload is inherently difficult to scale (and many are), cheap scale-out servers will often win on work done per dollar and work done per joule.

Essentially I agree with your observations. The challenge for us is to 1) find more workloads that can be partitioned and more tricks to partition workloads (depending upon scalable data stores is one good technique) and 2) where we have a partitionable workload, to run it on the most cost effective servers. I'm arguing these will often be low-cost, low-power servers.

Another point that I didn't make in the original blog entry but did in the paper that it points to is the following progression: 1) many workloads are memory bound today, and 2) memory bandwidth is increasing more slowly than CPU bandwidth, making more and more workloads memory bound over time. There are two approaches to this problem: 1) scaling up the memory subsystem and 2) using more servers, since the aggregate memory bandwidth of a large number of low-scale servers is very high. We'll do some of both and Intel Nehalem is a big step forward in memory bandwidth. But, in the end, more is better and scale-out will always win on any workload that can be distributed.
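
A rough sketch of that aggregate bandwidth arithmetic follows. None of these figures are measurements of any real server; the bandwidth and core counts are made up, and the function names are hypothetical.

```python
def aggregate_bandwidth(per_server_gb_s: float, n_servers: int) -> float:
    """Total memory bandwidth across servers; only usable if the workload distributes."""
    return per_server_gb_s * n_servers


def bandwidth_per_core(per_server_gb_s: float, cores: int) -> float:
    """Memory bandwidth available to each core on one server."""
    return per_server_gb_s / cores


# Hypothetical servers, not measurements.
big_bw, big_cores = 32.0, 16    # one scale-up server: GB/s and core count
small_bw, small_cores = 6.0, 2  # one low-scale server: GB/s and core count

big_per_core = bandwidth_per_core(big_bw, big_cores)
small_per_core = bandwidth_per_core(small_bw, small_cores)
fleet_bw = aggregate_bandwidth(small_bw, 10)

print(f"per core: big {big_per_core} GB/s, small {small_per_core} GB/s")
print(f"ten small servers in aggregate: {fleet_bw} GB/s")
```

Under these assumptions each core on the small server sees more memory bandwidth, and ten small servers together exceed the single large one.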

--jrh
jrh@mvdirona.com
Monday, November 30, 2009 5:43:34 PM (Pacific Standard Time, UTC-08:00)
I've experienced firsthand the joy of data center power density limits (very easy to hit with a rack packed full of high end servers, or blades), and the cost escalation that comes along with it... my contention is that the pain of architecting systems to scale well across many small servers is worthwhile for most any system at a sufficiently large scale.

There is also a software license cost component to keep in mind. OS / Platform vendors might consider finding a way to price their wares in a way that cooperates with, rather than penalizes, the many-small-server approach.

I wrote a bit more about this on my blog also.
Monday, November 30, 2009 9:21:52 PM (Pacific Standard Time, UTC-08:00)
For years I've been looking for cheap, low-power, ARM-based servers to run Tahoe-LAFS as the backend for http://allmydata.com . Still haven't found some for sale. Here are some of my notes:

http://testgrid.allmydata.org:3567/uri/URI:DIR2-RO:j74uhg25nwdpjpacl6rkat2yhm:kav7ijeft5h7r7rxdp5bgtlt3viv32yabqajkrdykozia5544jqa/wiki.html#2009-05-10%202009-05-09%202009-08-30%202009-09-01
Tuesday, December 01, 2009 5:55:53 PM (Pacific Standard Time, UTC-08:00)
Thanks for the link Zooko but the link above timed out. Perhaps the link is incorrect?
Tuesday, December 01, 2009 6:03:07 PM (Pacific Standard Time, UTC-08:00)
Thanks for the comment Kyle. As you may know I've not been a big fan of the current generation blade servers. I'm fine with blade configurations but the current price points of the major blade producers don't work at scale: http://perspectives.mvdirona.com/2008/09/11/WhyBladeServersArentTheAnswerToAllQuestions.aspx.

ARM based servers are being considered out there and I'm pretty sure we'll see at least one delivered in late 2010 or very early 2011.

You're right that software costs can really screw up the equation. Most of the work I do is with custom, internally written software and open source. But, if you need a fixed for-fee software package, licensing can drive silly behavior. I've seen customers have to turn off cores due to per-core software licensing. It breaks my heart to see hardware purchased, powered and then not used but, I agree, it really does happen.

--jrh
jrh@mvdirona.com
Wednesday, December 02, 2009 8:41:31 PM (Pacific Standard Time, UTC-08:00)
James: that link works for me. It is hosted on my open source secure cloud storage system, so there could be a bug or operational problem that makes it time out, but I haven't seen this myself. Could you please try again? My notes (on that page that you'll get if that link works) include an observation that there is a factor-of-2 error in the FAWN paper.

Regards,

Zooko
Thursday, December 03, 2009 6:11:48 AM (Pacific Standard Time, UTC-08:00)
You are right Zooko, the URL is fine. The problem is you are using port 3567 and many corporate networks don't allow traffic on arbitrary network ports.

Fundamentally, what FAWN is doing is changing the ratio of CPU to memory bandwidth. To get a balanced system we either need more memory bandwidth or less CPU. Less CPU is a cheaper approach.

In your write-up, you compare the cost of a Buffalo Linkstation and FAWN. My observation on storage costs is we again have a ratio that is interesting (assuming the storage server only serves storage and doesn't host application code). In this case, the interesting ratio is server cost to the amount of storage it hosts. We need to amortize the cost of the server over the storage that it is hosting. We can get the cost of the combination of server and disk asymptotically close to the cost of disk by either: 1) attaching more disk to the storage server or 2) using a less expensive storage server. Both FAWN and the Buffalo Linkstation choose the second approach: use a very low-cost server to host the storage. In the case of Buffalo, they are using a simple embedded processor whereas FAWN uses a very low-powered general purpose server. But the general approach in both is the same: less cost in the server that is hosting the disk.
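
The amortization can be sketched in a few lines. The prices and capacities are invented for illustration and say nothing about what a Linkstation or a FAWN node actually costs; `cost_per_tb` is a hypothetical helper.

```python
def cost_per_tb(server_cost: float, disk_cost: float,
                disk_tb: float, n_disks: int) -> float:
    """Combined server-plus-disk cost per terabyte hosted."""
    return (server_cost + n_disks * disk_cost) / (n_disks * disk_tb)


# Hypothetical prices: a 1 TB disk at $100, so bare disk is $100 per TB.
DISK_COST, DISK_TB = 100.0, 1.0

baseline = cost_per_tb(2000.0, DISK_COST, DISK_TB, n_disks=4)     # pricey server, few disks
more_disks = cost_per_tb(2000.0, DISK_COST, DISK_TB, n_disks=40)  # approach 1: more disk
cheap_server = cost_per_tb(200.0, DISK_COST, DISK_TB, n_disks=4)  # approach 2: cheaper server

print(baseline, more_disks, cheap_server)
```

Both approaches pull the combined figure toward the bare disk cost of $100 per TB; with these made-up numbers they happen to land at the same point.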

Thanks for reposting the link for me Zooko.

--jrh
jrh@mvdirona.com

Disclaimer: The opinions expressed here are my own and do not necessarily represent those of current or past employers.

All Content © 2014, James Hamilton