Thursday, May 28, 2009

I’ve brought together links to selected past postings and posted them to: It’s linked off the “about” link on the blog front page. I’ll add to this list over time. If there is a Perspectives article not included that you think should be, add a comment or send me email.


Talks and Presentations

Data Center Architecture and Efficiency

Service Architectures


Server Hardware

High-Scale Service Optimizations, Techniques, & Random Observations

James Hamilton, Amazon Web Services

1200, 12th Ave. S., Seattle, WA, 98144
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859


Thursday, May 28, 2009 4:45:07 AM (Pacific Standard Time, UTC-08:00)  #    Comments [5] - Trackback
 Friday, May 22, 2009

Two years ago I met with the leaders of the newly formed Dell Data Center Solutions team, and they explained they were going to invest deeply in R&D to meet the needs of very high-scale data centers. Essentially, Dell was going to invest in R&D for a fairly narrow market segment. “Yeah, right” was my first thought, but I’ve been increasingly impressed since then. Dell is doing very good work, and the announcement of Fortuna this week is worthy of mention.


Fortuna, the Dell XS11-VX8, is an innovative server design. I actually like the name as proof that the DCS team is an engineering group rather than a marketing team. What marketing team would choose XS11-VX8 as a name unless they just didn’t like the product?


The name aside, this server is excellent work. It is based on the Via Nano, and the entire server draws just over 15W at idle and just under 30W at full load. It’s a real server with 1GigE ports and full remote management via IPMI 2.0 (stick with the DCMI subset). A fully configured rack houses 252 servers while requiring only 7.3kW. Nice work, DCS!
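A quick sanity check of the rack figures quoted above:

```python
# 252 servers per rack at 7.3 kW full load, 15 W per server at idle.

def watts_per_server(rack_watts, servers):
    return rack_watts / servers

print(f"{watts_per_server(7300, 252):.1f} W/server at full load")   # ~29.0
print(f"{252 * 15 / 1000:.2f} kW rack draw if every server idles")  # 3.78
```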



6 min video with more data:





Friday, May 22, 2009 10:15:01 AM (Pacific Standard Time, UTC-08:00)  #    Comments [7] - Trackback
 Thursday, May 21, 2009

Cloud services provide excellent value, but it’s easy to underestimate the challenge of getting large quantities of data to the cloud. When moving very large quantities of data, even the fastest networks are surprisingly slow. And many companies have incredibly slow internet connections. Back in 1996, Minix author and networking expert Andrew Tanenbaum said, “Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.” For large data transfers, it’s faster (and often cheaper) to write to local media and ship the media via courier.
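Tanenbaum’s point is easy to make concrete. The sketch below compares moving a hypothetical 10 TB data set over two illustrative link speeds against overnight shipping; all the numbers are assumptions for illustration, not measurements.

```python
# Effective bandwidth of shipped media vs. a network link, for a
# hypothetical 10 TB data set (illustrative numbers only).

def transfer_hours(num_bytes, bits_per_sec):
    """Hours to move num_bytes over a link of the given speed."""
    return num_bytes * 8 / bits_per_sec / 3600

TB = 10**12
data = 10 * TB

t1 = transfer_hours(data, 1.5e6)     # 1.5 Mbps T1: ~617 days
fast = transfer_hours(data, 100e6)   # 100 Mbps link: ~9.3 days
courier_hours = 24                   # overnight courier

print(f"T1: {t1 / 24:.0f} days")
print(f"100 Mbps: {fast / 24:.1f} days")
print(f"courier: {courier_hours / 24:.0f} day")
```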


This morning the beta release of AWS Import/Export was announced. This service essentially implements sneakernet, allowing the efficient transfer of very large quantities of data into or out of the AWS Simple Storage Service. This initial beta release only supports import, but the announcement reports that “the service will be expanded to include export in the coming months”.


To use the service, the data is copied to a portable storage device formatted with the NTFS, FAT, ext2, or ext3 file system. The manifest that describes the data load job is digitally signed using the sending user’s AWS secret access key, and the device is shipped to Amazon for loading. Load charges are:

Device Handling

·         $80.00 per storage device handled.

Data Loading Time

·         $2.49 per data-loading-hour. Partial data-loading-hours are billed as full hours.

Amazon S3 Charges

·         Standard Amazon S3 Request and Storage pricing applies.

·         Data transferred between AWS Import/Export and Amazon S3 is free of charge (i.e. $0.00 per GB).

In addition to allowing much faster data ingestion, AWS Import/Export reduces networking costs since there is no charge for data transfer between the Import/Export service and S3. A calculator is provided to compare estimated electronic transfer costs with Import/Export costs. It’s a clear win for larger data sets.
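The break-even the calculator captures can be sketched with the Import/Export prices listed above. The network transfer price ($0.10/GB) and the load rate (100 GB per device-hour) below are illustrative assumptions, not published figures.

```python
import math

def import_export_cost(gb, devices=1, load_gb_per_hour=100):
    """Import/Export charges from the price list above; the load
    rate is an assumed figure for illustration."""
    hours = math.ceil(gb / (devices * load_gb_per_hour))  # partial hours billed as full
    return devices * 80.00 + hours * 2.49

def network_cost(gb, per_gb=0.10):
    """Hypothetical electronic transfer-in price."""
    return gb * per_gb

for gb in (100, 1000, 10000):
    cheaper = "ship" if import_export_cost(gb) < network_cost(gb) else "network"
    print(f"{gb:>6} GB: ship ${import_export_cost(gb):7.2f} "
          f"vs network ${network_cost(gb):8.2f} -> {cheaper}")
```

With these assumptions the crossover lands near a terabyte; above that, shipping the device wins by a widening margin.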






Thursday, May 21, 2009 5:49:27 AM (Pacific Standard Time, UTC-08:00)  #    Comments [4] - Trackback
 Wednesday, May 20, 2009

From an interesting article in Data Center Knowledge, Who Has the Most Web Servers?:

The article goes on to speculate on server counts at the companies that don’t publicly disclose them but are likely over 50k: Google is likely around a million, Microsoft is over 200k, and "Amazon says very little about its data center operations".






Wednesday, May 20, 2009 4:45:41 AM (Pacific Standard Time, UTC-08:00)  #    Comments [2] - Trackback
 Tuesday, May 19, 2009


Our 1999 Mitsubishi 3000 VR4 For Sale. Black-on-black with 80,000 miles. $12,500 OBO. Fewer than 300 1999 VR-4s were produced for North America, and only 101 in black-on-black.


We love this car and hate to sell it, but we live in downtown Seattle and no longer need a car. It's a beautiful machine with 320 HP, and it handles incredibly well. People often stop us on the street to ask if we would sell it, and now we are.


Details and pictures at:


Our house in Bellevue is for sale as well: 4509 Somerset Pl SE, Bellevue, WA. Virtual tour:






Tuesday, May 19, 2009 5:25:29 AM (Pacific Standard Time, UTC-08:00)  #    Comments [6] - Trackback
 Monday, May 18, 2009

Earlier this morning Amazon Web Services announced the public beta of Amazon CloudWatch, Auto Scaling, and Elastic Load Balancing. Amazon CloudWatch is a web service for monitoring AWS resources. Auto Scaling automatically grows and shrinks Elastic Compute Cloud resources based upon demand. Elastic Load Balancing distributes workload over a fleet of EC2 servers.

  • Amazon CloudWatch – Amazon CloudWatch is a web service that provides monitoring for AWS cloud resources, starting with Amazon EC2. It provides you with visibility into resource utilization, operational performance, and overall demand patterns—including metrics such as CPU utilization, disk reads and writes, and network traffic. To use Amazon CloudWatch, simply select the Amazon EC2 instances that you’d like to monitor; within minutes, Amazon CloudWatch will begin aggregating and storing monitoring data that can be accessed using web service APIs or Command Line Tools. See Amazon CloudWatch for more details.
  • Auto Scaling – Auto Scaling allows you to automatically scale your Amazon EC2 capacity up or down according to conditions you define. With Auto Scaling, you can ensure that the number of Amazon EC2 instances you’re using scales up seamlessly during demand spikes to maintain performance, and scales down automatically during demand lulls to minimize costs. Auto Scaling is particularly well suited for applications that experience hourly, daily, or weekly variability in usage. Auto Scaling is enabled by Amazon CloudWatch and available at no additional charge beyond Amazon CloudWatch fees. See Auto Scaling for more details.
  • Elastic Load Balancing – Elastic Load Balancing automatically distributes incoming application traffic across multiple Amazon EC2 instances. It enables you to achieve even greater fault tolerance in your applications, seamlessly providing the amount of load balancing capacity needed in response to incoming application traffic. Elastic Load Balancing detects unhealthy instances within a pool and automatically reroutes traffic to healthy instances until the unhealthy instances have been restored. You can enable Elastic Load Balancing within a single Availability Zone or across multiple zones for even more consistent application performance. Amazon CloudWatch can be used to capture a specific Elastic Load Balancer’s operational metrics, such as request count and request latency, at no additional cost beyond Elastic Load Balancing fees. See Elastic Load Balancing for more details.
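The scaling behavior described above can be sketched as a simple threshold rule. This is not the AWS API, just an illustrative model of what Auto Scaling does with a CloudWatch-style metric; the thresholds and fleet bounds are arbitrary assumptions.

```python
def desired_capacity(current, avg_cpu, lo=30.0, hi=70.0,
                     min_size=2, max_size=20):
    """One evaluation step of a threshold scaling policy: add an
    instance above the high-CPU threshold, remove one below the low
    threshold, and clamp to the fleet bounds."""
    if avg_cpu > hi:
        current += 1       # scale out on a demand spike
    elif avg_cpu < lo:
        current -= 1       # scale in during a lull
    return max(min_size, min(max_size, current))

print(desired_capacity(4, 85.0))   # spike -> grow to 5
print(desired_capacity(4, 10.0))   # lull  -> shrink to 3
print(desired_capacity(2, 10.0))   # lull, but clamped at min_size
```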



Monday, May 18, 2009 5:16:10 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback
 Saturday, May 16, 2009

A couple of weeks back, a mini-book by Luiz André Barroso and Urs Hölzle of the Google infrastructure team was released. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines is just over 100 pages long and an excellent introduction to very high-scale computing and the issues that matter at scale.


From the Abstract:

As computation continues to move into the cloud, the computing platform of interest no longer resembles a pizza box or a refrigerator, but a warehouse full of computers. These new large datacenters are quite different from traditional hosting facilities of earlier times and cannot be viewed simply as a collection of co-located servers. Large portions of the hardware and software resources in these facilities must work in concert to efficiently deliver good levels of Internet service performance, something that can only be achieved by a holistic approach to their design and deployment. In other words, we must treat the datacenter itself as one massive warehouse-scale computer (WSC). We describe the architecture of WSCs, the main factors influencing their design, operation, and cost structure, and the characteristics of their software base. We hope it will be useful to architects and programmers of today’s WSCs, as well as those of future many-core platforms which may one day implement the equivalent of today’s WSCs on a single board.


Some of the points I found particularly interesting:

·         Networking:

o   Commodity switches in each rack provide a fraction of their bi-section bandwidth for interrack communication through a handful of uplinks to the more costly cluster-level switches. For example, a rack with 40 servers, each with a 1-Gbps port, might have between four and eight 1-Gbps uplinks to the cluster-level switch, corresponding to an oversubscription factor between 5 and 10 for communication across racks. In such a network, programmers must be aware of the relatively scarce cluster-level bandwidth resources and try to exploit rack-level networking locality, complicating software development and possibly impacting resource utilization. Alternatively, one can remove some of the cluster-level networking bottlenecks by spending more money on the interconnect fabric.

·         Server Power Usage:

·         Buy vs Build:

Traditional IT infrastructure makes heavy use of third-party software components such as databases and system management software, and concentrates on creating software that is specific to the particular business where it adds direct value to the product offering, for example, as business logic on top of application servers and database engines. Large-scale Internet services providers such as Google usually take a different approach in which both application-specific logic and much of the cluster-level infrastructure software is written in-house. Platform-level software does make use of third-party components, but these tend to be open-source code that can be modified inhouse as needed. As a result, more of the entire software stack is under the control of the service developer.


This approach adds significant software development and maintenance work but can provide important benefits in flexibility and cost efficiency. Flexibility is important when critical functionality or performance bugs must be addressed, allowing a quick turn-around time for bug fixes at all levels. It is also extremely advantageous when facing complex system problems because it provides several options for addressing them. For example, an unwanted networking behavior might be very difficult to address at the application level but relatively simple to solve at the RPC library level, or the other way around.
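The oversubscription arithmetic in the networking excerpt above is easy to check:

```python
# Rack-level oversubscription: total server bandwidth divided by
# total uplink bandwidth (figures from the excerpt above).

def oversubscription(servers, server_gbps, uplinks, uplink_gbps):
    return (servers * server_gbps) / (uplinks * uplink_gbps)

print(oversubscription(40, 1, 8, 1))   # 8 uplinks -> factor 5.0
print(oversubscription(40, 1, 4, 1))   # 4 uplinks -> factor 10.0
```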


The full paper:




Saturday, May 16, 2009 9:30:04 AM (Pacific Standard Time, UTC-08:00)  #    Comments [4] - Trackback
 Tuesday, May 05, 2009

High data center temperatures are the next frontier for server competition (see pages 16 through 22 of my Data Center Efficiency Best Practices talk: and 32C (90F) in the Data Center). At higher temperatures, the differences between good and sloppy mechanical designs are much more pronounced and need to be a purchasing criterion.


The infrastructure efficiency gains of running at higher temperatures are obvious. In a typical data center, 1/3 of the power arriving at the property line is consumed by cooling systems. Large operational expenses can be avoided by raising the temperature set point. In most climates, raising data center set points to the 95F range allows a facility to move to a pure air-side economizer configuration, eliminating 10% to 15% of the overall capital expense, with the latter number being the more typical.


These savings are substantial and exciting. But there are potential downsides: 1) increased server mortality, 2) higher semiconductor leakage current at higher temperatures, and 3) increased air movement costs driven by higher fan speeds. The first, increased server mortality, has very little data behind it. I’ve seen some studies that confirm higher failure rates at higher temperatures, and I’ve seen some that actually show the opposite. For every server there clearly is some maximum temperature beyond which failure rates increase rapidly. What’s unclear is where that temperature point actually is.


We also know that the knee of the curve, where failures start to become more common, is heavily influenced by the server components chosen and the mechanical design. Designs that cool more effectively will operate without negative impact at higher temperatures. We could try to understand every detail of each server and build a failure prediction model for different temperatures, but this task is complicated by the diversity of servers and components and the near complete lack of data at higher temperatures.


So, not being able to build a model, I chose to lean on a different technique that I’ve come to prefer: incent the server OEMs to produce the models themselves. If we ask the server OEMs to warrant the equipment at the planned operating temperature, we’re giving the modeling problem to the folks who have both the knowledge and the skills to model it faithfully and, much more importantly, the ability to change designs that aren’t faring well in the field. Transferring the problem to the party most capable of solving it, and financially incenting them to solve it, will bring success.


My belief is that this approach of transferring the risk, failure modeling, and field result tracking to the server vendor will control point 1 above (increased server mortality rate). We also know that the telecom world has been operating at 40C (104F) for years (see NEBS), so clearly equipment can be designed to operate correctly at these temperatures and last longer than current servers are kept in service. This issue looks manageable.


The second issue raised above was increased semiconductor leakage current at higher temperatures. This principle is well understood and certainly measurable. However, in the crude measurements I’ve seen, the increased leakage is lost in the noise of higher fan power losses. And semiconductor leakage costs depend on semiconductor temperature rather than air inlet temperature; better cooling designs or higher air volumes can help prevent substantial increases in actual semiconductor temperatures. Early measurements with current servers suggest this issue is minor, so I’ll set it aside as well.


The final issue is hugely important and certainly not lost in the noise. As server inlet temperatures go up, the required cooling air flow increases. Moving more air consumes more power and, as it turns out, air is an incredibly inefficient fluid to move. Higher fan speed is a substantial and very noticeable cost. What this tells us is that the savings of higher temperatures will get eaten up, slowly at first and more quickly as the temperature increases, until some crossover point where fan power increases dominate the operational savings over conventional cooling.
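The reason fan power dominates so quickly is the fan affinity laws: delivered airflow rises roughly linearly with impeller speed, but fan power rises roughly with its cube. A minimal sketch, with an assumed 10 W baseline fan subsystem:

```python
def fan_power(base_watts, speed_fraction):
    # Fan affinity law: power scales roughly with the cube of speed.
    return base_watts * speed_fraction ** 3

# An assumed 10 W fan subsystem at baseline speed: modest speed
# increases are cheap, but doubling the speed costs ~8x the power.
for speed in (1.0, 1.25, 1.5, 2.0):
    print(f"{speed:.2f}x speed -> {fan_power(10.0, speed):.1f} W")
```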


Where is the knee of the curve, where increased fan power crosses over and dominates the operational savings of running at higher temperatures? Well, like many things in engineering, the answer is “it depends.” But it depends in very interesting ways. Even poor mechanical designs, built by server manufacturers who think mechanical engineers are a waste of money, will run perfectly well at 95F. Even I’m a good enough mechanical engineer to pass this bar: the trick is to put a LARGE fan in the chassis and move lots of air. This approach is very inefficient and wastes much power, but it will cool the server perfectly well. The obvious conclusion is that points 1 and 2 above really don’t matter. We clearly CAN use 95F approach air to cool servers and keep them at the same temperature they run at today, which eliminates the server mortality and potential semiconductor leakage issues. But eliminating these two issues with a sloppy mechanical design will be expensive and waste much power.


A well designed server with careful part placement, good mechanical design, and careful impeller selection and control will perform incredibly differently from a poor design. The combination of good mechanical engineering and intelligent component selection can allow a server to run at 95F at a nominal increase in power due to higher air movement requirements. A poorly designed system will be expensive to run at elevated temperatures. This is a good thing for the server industry because it’s a chance for them to differentiate and compete on engineering talent rather than all building the same thing and chasing the gray box server cost floor.


In past postings, I’ve said that server purchases should be made on the basis of work done per dollar and work done per joule (see slides at ). Measure work done using your workload, a kernel of your workload, or a benchmark you feel is representative of your workload. When measuring work done per dollar and work done per joule (one watt for one second), do it at your planned data center air supply temperature. Higher temperatures will save you big operational costs and, at the same time, measuring and comparing servers at high temperatures will show much larger differentiation between server designs. Good servers will be very visibly better than poor designs. And if we all measure work done per joule (or just power consumption under load) at high inlet temperatures, we’ll quickly get efficient servers that run reliably at high temperature.
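As a sketch of the comparison, with hypothetical throughput and power numbers for two candidate servers, both measured at the planned inlet temperature:

```python
def work_per_joule(requests_per_sec, watts):
    """Work done per joule: since a watt is a joule per second,
    throughput divided by power is requests per joule."""
    return requests_per_sec / watts

server_a = work_per_joule(5000, 250)   # 20.0 requests/joule
server_b = work_per_joule(5500, 350)   # ~15.7 requests/joule

# B has higher raw throughput, but A does more work per unit of energy.
print(server_a, server_b)
```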


Make the server suppliers compete on work done per joule at 95F approach temperatures and the server world will evolve quickly. It’s good for the environment and perhaps the largest and easiest-to-obtain cost reduction on the horizon.






Tuesday, May 05, 2009 6:53:55 AM (Pacific Standard Time, UTC-08:00)  #    Comments [13] - Trackback
 Sunday, May 03, 2009

Chris Dagdigian of BioTeam presented the keynote at this year’s Bio-IT World Conference. I found the presentation interesting for at least two reasons: 1) it’s a very broad and well-reasoned look at many of the issues in computational science, and 2) it presents an innovative example of cloud computing, in which BioTeam and Pfizer implement protein docking using Amazon AWS.


The presentation is posted at: and I summarize some of what caught my interest below:

·         Argues that virtualization is “still the lowest hanging fruit in most shops” yielding big gains for operators, users, the environment, and budgets

·         Storage:

o   Storage still cheap and getting cheaper but operational costs largely unchanged

o   Data Triage needed: volume of data production is outpacing declining fully burdened cost of storage (including operational costs)

o   Lessons learned from a data loss event (10+TB lost)

§  Double disk failure on RAID5 volume holding SAN FS metadata with significant operational errors

§  Need more redundancy than RAID5

§  Need SNMP and email error reporting

§  Need storage subsystems to actively scrub, verify, and correct errors

o   Concludes the storage discussion by pointing out that cloud services offer excellent fully burdened storage costs

·         Utility Computing

o   It is expensive to design for peak demand in-house

o   Pay-as-you-go can be compelling for some workloads

o   Explained why he “drank the Amazon EC2 Kool-Aid”: saw it, used it, solved actual customer problems with it. As an example, Chris looked at a protein docking project done by Pfizer & BioTeam.

·         Protein Docking project architecture:

o   Borrows heavily from Rightscale Grid Edition

o   Inbound and outbound in Amazon SQS

o   Job specification in JSON

o   Data stored in Amazon S3

o   Job provenance and metadata stored in SimpleDB

o   Worker instances dynamically spawned in EC2, where structures are scored

o   All results stored in S3 (EC2 <-> S3 bandwidth free)

o   Download the top ranked docked complexes

o   Launch post-processing EC2 instances to score, rank, filter, and cluster results into S3 (bring the computation to the data)

·         Doesn’t want to belittle the security concerns, but there’s a whiff of hypocrisy in the air

o   Is your staff really concerned, or just protecting their turf?

o   It is funny to see people demanding security measures they don’t practice internally across their own infrastructure

·         Next-Gen & utility storage

o   Primary analysis onsite; data moved to remote utility storage service after passing QC tests

o   Data would rarely (if ever) move back

o   Need to reprocess or rerun?

§  Spin up cloud servers to re-analyze in situ

§  Terabyte data transit not required




Sunday, May 03, 2009 8:58:20 AM (Pacific Standard Time, UTC-08:00)  #    Comments [1] - Trackback
 Wednesday, April 29, 2009

In the Randy Katz on High Scale Data Centers posting, the article brought up Google Dalles. The article reported that Dalles used air-side economization, but I hadn’t seen the large intakes or louvers I would expect from a facility of that scale.


Cary Roberts, ex-TellMe Networks and all-around smart guy, produced a picture of Google Dalles that clearly shows air-side economization (thanks, Cary).




Wednesday, April 29, 2009 1:46:27 PM (Pacific Standard Time, UTC-08:00)  #    Comments [4] - Trackback
 Tuesday, April 28, 2009

Earlier this week I got a thought-provoking comment from Rick Cockrell in response to the posting 32C (90F) in the Data Center. I found the points raised interesting and worthy of broader discussion, so I pulled the thread out of the comments into a separate blog entry. Rick posted:


Guys, to be honest I am in the HVAC industry. Now, what the Intel study told us is that yes this way of cooling could cut energy use, but what is also said is that there was more than a 100% increase in server component failure in 8 months (2.45% to 4.46%) over the control study with cooling... Now with that said if anybody has been watching the news lateley or Wall-e, we know that e-waste is overwhlming most third world nations that we ship to and even Arizona. Think?

I see all kinds of competitions for energy efficiency, there should be a challenge to create sustainable data center. You see data centers use over 61 billion kWh annually (EPA and DOE), more than 120 billion gallons of water at the power plant (NREL), more than 60 billion gallons of water onsite (BAC) while producing more than 200,000 tons of e-waste annually (EPA). So for this to be a fair game we can't just look at the efficiency. It's SUSTAINABILITY!

It would be easy to just remove the mechanical cooling (I.E. Intel) and run the facility hotter, but the e-waste goes up by more than 100% (Intel Report and Fujitsu hard drive testing), It would be easy to not use water cooled equipment, to reduce water onsite use but the water at the power plant level goes up, as well as the energy use. The total solution has to be a solution of providing the perfect environment, the proper temperatures, while reducing e-waste.

People really need to do more thinking and less talking. There is a solution out there that can do almost everything that needs to be done for the industry. You just have to look! Or maybe call me I'll show you.


Rick, you commented that “it’s time to do more thinking and less talking” and argued that the additional server failures seen in the Intel report created 100% more e-waste and so running hotter simply wouldn’t make sense. I’m willing to do some thinking with you on this one.


I see two potential issues with your assumption. The first is that the Intel report showed “100% more e-waste”. What they saw in an 8-rack test was a server mortality rate of 4.46%, whereas their standard data centers were at 3.83%. That is far from double and, with only 8 racks, may not be statistically significant. As further evidence that the difference may not be significant, the control experiment, 8 racks in the other half of the container running on DX cooling, showed a failure rate of 2.45%. It may be noise, given that the control differed from the standard data centers by about as much as the test set did. And it’s a small sample.
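To make the “may not be statistically significant” point concrete, here is a two-proportion z-test sketch. The failure rates are from the Intel numbers above, but the sample size (8 racks at an assumed 40 servers each, 320 servers per arm) is my assumption, not a figure from the report.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """z statistic for the difference between two proportions."""
    p = (p1 * n1 + p2 * n2) / (n1 + n2)              # pooled rate
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))  # standard error
    return (p1 - p2) / se

# Economizer side (4.46%) vs DX control (2.45%), assumed 320 servers each:
z = two_proportion_z(0.0446, 320, 0.0245, 320)
print(f"z = {z:.2f}")   # below 1.96, so not significant at the 5% level
```

Under these assumptions the z statistic comes out below the usual 1.96 cutoff, consistent with the “small sample, may be noise” reading above.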


Let’s assume for a second that the increase in failure rates actually was significant. Neither the investigators nor I are convinced this is the case, but let’s make the assumption and see where it takes us. They saw 0.63% more failures than their normal data centers and 2.01% more than the control. Let’s take the 2% number and think it through, assuming these are annualized figures. The most important observation I’ll make is that 85% to 90% of servers are replaced BEFORE they fail, which is to say that obsolescence is the leading cause of server replacement: servers stop being power efficient and get replaced after 3 to 5 years. Would I accept an additional 2% in server failures each year if I could save 10% of the overall data center capital expense and 25%+ of the operating expense? Absolutely yes. Further driving this answer home, Dell, Rackable, and ZT Systems will replace early failures under warranty for servers run at up to 35C (95F).


So, the increased server mortality rate is actually free during the warranty period, but let’s ignore that and focus on what’s better for the environment. If 2% of the servers need repair early and I spend the carbon footprint to buy replacement parts but save 25%+ of my overall data center power consumption, is that a gain for the environment? I haven’t got a great way to estimate the true carbon footprint of repair parts, but it sure looks like a clear win to me.


On the basis of the small increase in server mortality weighed against the capital and operating expense savings, running hotter looks like a clear win. I suspect we’ll see at least a 10F rise in average set points over the next 5 years, and I’ll be looking for ways to make that number bigger. It’s a substantial expense reduction and great for the environment.






Tuesday, April 28, 2009 8:01:18 AM (Pacific Standard Time, UTC-08:00)  #    Comments [20] - Trackback
 Saturday, April 25, 2009

This IEEE Spectrum article was published in February, but I’ve been busy and haven’t had a chance to blog it. The author, Randy Katz, is a UC Berkeley researcher and a member of the Reliable Adaptive Distributed Systems (RAD) Lab. Katz was a coauthor on the recently published RAD Lab article on cloud computing: Berkeley Above the Clouds.


The IEEE Spectrum article, Tech Titans Building Boom, focuses on data center infrastructure. In it, Katz looks at the Google, Microsoft, Amazon, and Yahoo data center building boom. Some highlights from my read:

·         Microsoft Quincy is 48MW total load with 48,600 sq m of space.  4.8 km of chiller pipe, 965 km of electrical wire, 92,900 m2 of drywall, and 1.5 metric tons of backup batteries.

·         Yahoo Quincy is somewhat smaller at 13,000 m2. This not-yet-complete facility will include free air cooling.

·         Google Dalles is a two-building facility on the Columbia River, each building at 6,500 m2. I’ve been told that this facility does make use of air-side economization but, carefully studying all the pictures I’ve come across, I can’t find air intakes or louvers, so I’m skeptical. From the outside, the facility looks fairly conventional.

·         Google is also building in Pryor, Okla.; Council Bluffs, Iowa; Lenoir, N.C.; and Goose Creek, S.C.

·         Aerial picture of Google Dalles:

·         McKinsey estimates that the world has 44M servers and that they consume 0.5% of all electricity and produce 0.2% of all carbon dioxide. However, in a separate article McKinsey also speculates that Cloud Computing may be more expensive for enterprise customers, a claim that most of the community had trouble understanding or finding data to support.

·         Google uses conventional multicore processors. To reduce the machines’ energy appetite, Google fitted them with high-efficiency power supplies and voltage regulators, variable-speed fans, and system boards stripped of all unnecessary components like graphics chips. Google has also experimented with a CPU power-management feature called dynamic voltage/frequency scaling. It reduces a processor’s voltage or frequency during certain periods (for example, when you don’t need the results of a computing task right away). The server executes its work more slowly, thus reducing power consumption. Google engineers have reported energy savings of around 20 percent on some of their tests. For more recently released data on Google’s servers, see Data Center Efficiency Summit (Posting #4).

·         Katz reports that the average data center runs at 14C and that newer centers are pushing to 27C. I’m interested in going to 35C and eliminating process-based cooling: Data Center Efficiency Best Practices.

·         Containers: The most radical change taking place in some of today’s mega data centers is the adoption of containers to house servers. Instead of building raised-floor rooms, installing air-conditioning systems, and mounting rack after rack, wouldn’t it be great if you could expand your facility by simply adding identical building blocks that integrate computing, power, and cooling systems all in one module? That’s exactly what vendors like IBM, HP, Sun Microsystems, Rackable Systems, and Verari Systems have come up with. These modules consist of standard shipping containers, which can house some 3000 servers, or more than 10 times as many as a conventional data center could pack in the same space. Their main advantage is that they’re fast to deploy. You just roll these modules into the building, lower them to the floor, and power them up. And they also let you refresh your technology more easily—just truck them back to the vendor and wait for the upgraded version to arrive.

·         Microsoft Chicago will house containers on its lower floor (it’s a two-floor facility); it’s expected to draw well over 45MW and will reach 75MW if built out to the full 200 containers planned (First Containerized Data Center Announcement). The Chicago, Dublin, and Des Moines facilities have all been delayed by Microsoft, presumably due to economic conditions: Microsoft Delays Chicago, Dublin, and Des Moines Data Centers.
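The dynamic voltage/frequency scaling mentioned in the first bullet above can be sketched with the textbook CMOS dynamic-power approximation P ≈ C·V²·f. This is a back-of-envelope model of mine, not Google data; their reported ~20 percent savings come from system-level tests, where leakage, fans, and other fixed loads also matter:

```python
def dynamic_power_ratio(v_scale: float, f_scale: float) -> float:
    """Dynamic CPU power relative to nominal after scaling voltage and
    frequency, per the P ~ C * V^2 * f approximation (leakage ignored)."""
    return (v_scale ** 2) * f_scale

# Dropping voltage 10% and frequency 20% cuts dynamic power to ~65% of nominal.
print(dynamic_power_ratio(0.9, 0.8))
```

The quadratic voltage term is why DVFS lowers voltage along with frequency whenever the workload can tolerate the slower execution.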


Check out Tech Titans Building Boom:




James Hamilton, Amazon Web Services

1200, 12th Ave. S., Seattle, WA, 98144
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 | |  | blog:


Saturday, April 25, 2009 6:40:10 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback
 Monday, April 20, 2009

I’m always interested in research on cloud service efficiency, and last week, at the Uptime Institute IT Symposium in New York City, management consultancy McKinsey published a report entitled Clearing the air on Cloud Computing. McKinsey is a well-respected professional services company that describes itself as “a management consulting firm advising leading companies on organization, technology, and operations”.  Over the first 22 years of my career in server-side computing at Microsoft and IBM, I met McKinsey consultants frequently, although they were typically working on management issues and organizational design rather than technology. This particular report focuses more on technology: the authors investigate the economics of very high scale data centers and cloud computing. This has been my prime area of interest for the last 5 years, and my first observation is that the authors are taking on an incredibly tough challenge.


Gaining a complete inventory of the costs of internal IT is very difficult. The costs hide everywhere.  Some are in central IT teams, some are in central procurement groups, some with the legal and contract teams, and some in departmental teams doing IT work although not part of corporate IT. It’s incredibly difficult to get a full, accurate, and unassailable inventory of the costs of internal IT. Further complicating the equation, internal IT is often also responsible for mission-critical tasks that have nothing to do with comparing internal IT with cloud services offerings. Internal IT is often responsible for internal telco and for writing many of the applications that actually run the business.  Basically, it’s very hard to first find all the comparable internal IT costs and, even with a complete inventory of IT costs, it’s then even harder to separate out mission-critical tasks that internal IT teams own that have nothing to do with whether the applications are cloud or internally hosted. I’m arguing that this report’s intent, of comparing costs in a generally applicable way across all industries, is probably not possible to do accurately and may not be a good idea.


In the report, the authors conclude that current cloud computing offerings “are not cost-effective compared to large enterprise data centers.”  They argue that cloud offerings are most attractive for small and medium sized enterprises. The former is a pretty strong statement, and it contradicts most of what I’ve learned about high scale services, so it’s definitely worth digging deeper.


It’s not clear that a credible detailed accounting of all comparable IT costs that generalizes across all industries can be produced. Each company is different, and these costs are both incredibly hard to find and entangled with many other mission-critical tasks the internal IT team owns that have nothing to do with whether applications are internally hosted or in the cloud. From all the work I’ve done around high scale services, it’s inarguably true that some internal IT tasks are very leveraged. These tasks form the core competency of the business and are usually at least developed internally if not hosted internally.  In what follows, I’ll argue that non-differentiated services -- services that need to be good but aren’t the company’s competitive advantage -- are much more economically hosted in very high-scale cloud computing environments. The hosting decision should be driven by company strategy and a decision to concentrate investment capital where it has the most impact. The savings available using a shared cloud for non-differentiated services are dramatic, and are available to all companies, from the smallest startup to the largest enterprise. I’ll look at some of these advantages below.


In this report the authors conclude that cloud computing makes sense for small and medium enterprises but that current offerings “are not cost-effective compared to large enterprise data centers.” The authors argue that economies of scale benefit small and medium sized businesses but that the cost advantages break down at very large scale. Essentially they are arguing that big companies already have all the economies of scale available to internet-scale services. On the face of it, this appears unlikely. And, upon further digging, we’ll see it’s simply incorrect across many dimensions.


Let’s think about economies of scale.  Large power plants produce lower cost power than small regional plants.  Very large retail chains spend huge amounts on optimizing all aspects of their businesses, from supply chain optimization through customer understanding, and, as a consequence, can offer lower prices. There are exceptions to be sure but, generally, we see a pretty sharp trend towards economies of scale across a wide range of businesses.  There will always be big, dumb, poorly run players and there will always be nimble but small innovators.  The one constant is that those who understand how to grow large, capture the economies of scale, and yet still stay nimble often deliver very high quality products at much lower cost to the customer.


Perhaps the economies of scale don’t apply to the services world?  Looking at services such as payroll and internal security, we see that almost no companies choose to do these internally.  These services clearly need to be done well, but they are not differentiated.  It’s hard to be so good at payroll that it yields a competitive advantage, unless your company actually specializes in payroll. Internal operations such as payroll and security are often sublet to very large services companies that focus on them. ADP, for example, has been successful at providing a very high scale service that makes sense for even the biggest companies. I actually think it’s a good thing that the companies I’ve worked for over the last twenty years didn’t do their own payroll and instead focused their investment capital on technology opportunities that grow the business and help customers. It’s the right answer.


We find another example in enterprise software.  When I started my career, nearly all large companies developed their own internal IT applications. At the time, most industry experts speculated that none of the big companies would ever move to packaged ERP systems. But, the economies of scale of the large ERP development shops are substantial and, today, very few companies develop their own ERP or CRM systems.  The big companies like SAP can afford to invest in the software base at rates even the largest enterprise couldn’t afford. Fifteen years ago SAP had 4,200 engineers working on their ERP system. Even the largest enterprise could never economically justify spending a fraction of that.  Large central investments at scale typically make better economic sense unless the system in question is one of a company’s core strategic assets.


I’ve argued that smart, big players willing to invest deeply in innovating at scale can produce huge cost advantages and we’ve gone through examples from power generation, through retail sales, payroll, security, and even internal IT software. The authors of the McKinsey study are essentially arguing that, although all major companies have chosen to enjoy the large economies of scale offered by packaged software products over internal development, this same trend won’t extend to cloud hosted solutions. Let’s look closely at the economics to see if this conclusion is credible.


In the enterprise, most studies report that the cost of people dominates the cost of servers and data center infrastructure.  In the cloud services world, we see a very different trend.  Here we find that the costs of servers dominate, followed by mechanical systems, and then power distribution (see the Cost of Power in Large Data Centers). As an example, looking at all aspects of operational costs in a mid-sized service I led years ago, the human administrative costs were under 10% of the overall operational costs.  I’ve seen very large, extremely well run services where the people costs have been driven below 4%. Given that people costs dominate many enterprise deployments, how do high-scale cloud services get these costs so low? There are many contributing factors but the most important two are 1) cloud services run at very high scale and can afford to invest more in automation, amortizing that investment across a much larger server population, and 2) services teams can specialize, focusing on doing one thing and doing it very well. This kind of specialization yields efficiency gains, but it is only affordable at multi-tenant scale. The core argument here is that the number 1 cost in the enterprise is people whereas, in high scale services, these costs have been amortized down to sub-10%. Arguing there are no economies at cloud scale is the complete opposite of my experience and observations.
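The people-cost arithmetic can be sketched numerically. All inputs below (salary, amortized per-server cost, servers per administrator) are illustrative assumptions of mine, not figures from the report or from any service I’ve worked on; only the relationship matters:

```python
def admin_cost_share(servers_per_admin, admin_cost_yr, server_cost_yr):
    """Fraction of per-server operating cost that goes to administration."""
    admin_per_server = admin_cost_yr / servers_per_admin
    return admin_per_server / (admin_per_server + server_cost_yr)

# Hypothetical inputs: $120k/yr fully loaded admin, $3k/yr amortized server.
enterprise = admin_cost_share(100, 120_000, 3_000)    # ~1 admin per 100 servers
cloud = admin_cost_share(2_500, 120_000, 3_000)       # heavy automation at scale
print(f"enterprise people share: {enterprise:.0%}, cloud: {cloud:.1%}")
```

Automation doesn’t make administrators cheaper; it spreads each one over an order of magnitude more servers, which is what drives the people share toward single digits.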


Page 25 of the study shows a “disguised client example” where the example company had 1,704 people working in IT before the move to cloud services and still required 1,448 after the move. I’m very skeptical that any company with 1,704 people working in IT – clearly a large company – would move to cloud computing in one single discrete step.  It’s close to impossible and would be foolhardy.  Consequently, I suspect the data either represents a partial move to the cloud or is only a paper exercise. If the former, the data is incomplete and, if the latter, the data is speculative.  The story is clouded further by including in the headcount inventory desktop support, real estate, telecommunications, and many other responsibilities that wouldn’t be impacted by the move to cloud services. Adding extraneous costs in large numbers dilutes the savings realized by this disguised customer. Overall, this slide doesn’t appear informative.


We’ve shown that at very high scale the dominant costs are server hardware and data center infrastructure. Very high scale services hire server designers and have entire teams focused on acquiring some of the most efficient server designs in the world.  Google goes so far as to design custom servers (see Jeff Dean on Google Infrastructure), something very hard to do economically at less than internet scale.  I’ve personally done joint design work with Rackable Systems in producing servers optimized for cloud services workloads (Microslice Servers). When servers are the dominant cost and you are running at 10^5 to 10^6 server scale, considerable effort can and should be spent on obtaining the most cost effective servers possible for the workload. This is hard to do economically at lower scale.


We’ve shown that people costs are largely automated out of very high scale services and that the server hardware is either custom, jointly developed, or specifically targeted to the workload.  What about data center infrastructure?  The Uptime Institute reports that the average data center Power Usage Effectiveness is 2.0 (smaller is better). What this number means is that for every 1W of power that goes to a server in an enterprise data center, a matching watt is lost to power distribution and cooling overhead. Microsoft reports that its newer designs are achieving a PUE of 1.22 (Out of the box paradox…). All high scale services are well under 1.7 and most, including Amazon, are under 1.5. High scale services can invest much more in infrastructure innovations by spreading this large investment out over a large number of data centers. As a consequence, these internet-scale services are a factor of 2 more efficient than the average enterprise. This is good for the environment and, with power being such a substantial part of the cost of high-scale computing, it substantially reduces costs as well.
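Since PUE is defined as total facility power divided by IT load, the overhead gap between the average enterprise facility and a modern high-scale design is easy to quantify. The PUE figures below are the ones quoted above; the 1 MW IT load is just an example:

```python
def total_facility_power(it_load_w, pue):
    """PUE = total facility power / IT load, so total draw = IT load * PUE."""
    return it_load_w * pue

it_load = 1_000_000  # an example 1 MW of IT load
average = total_facility_power(it_load, 2.0)   # Uptime Institute average
newer = total_facility_power(it_load, 1.22)    # Microsoft's reported design
print(round(average - newer))  # watts of overhead eliminated per MW of IT load
```

At a PUE of 2.0 the cooling and distribution overhead equals the entire IT load; at 1.22 the same megawatt of servers carries only 220kW of overhead.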


Utilization is the factor that many in the industry hate talking about because the industry-wide story is so poor.  The McKinsey report says that enterprise server utilization is actually down around 10%, which is approximately consistent with what I’ve seen working with enterprise customers over the years. The implication is that the servers and the facilities that house them are only 10% used.  This sounds like the beginning of an incredibly strong argument for cloud services, but the authors take a different path and argue it would be easy to increase enterprise utilization far higher than 10%. With an aggressive application of virtualization and related technologies, they feel utilizations as high as 35% are possible.  That conclusion is possibly correct, but it’s worth spending a minute on this point. At 35% utilization, a full 2/3 of capacity is still wasted, which seems unfortunate, unnecessary, and hard on the environment.  Improving from 10% to 35% will require time, new software, new training, etc., but it may be possible.  What’s missing in this observation is that 1) cloud services can invest more in these efficiency innovations and they are already substantially down that path, 2) large user populations justify greater investment in infrastructure efficiency, and 3) not all workloads have correlated peaks, so larger, heterogeneous populations offer substantially larger optimization possibilities than most enterprises can achieve alone (see: resource consumption shaping).
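The uncorrelated-peaks point can be illustrated with a toy model. All demand numbers below are hypothetical; the effect, not the magnitudes, is what matters:

```python
# Hourly demand curves (hypothetical) for three workloads with offset peaks.
interactive = [100 if 8 <= h < 18 else 10 for h in range(24)]    # daytime peak
batch       = [100 if h < 6 or h >= 22 else 10 for h in range(24)]  # overnight
background  = [30] * 24                                          # steady load

total_demand = [sum(t) for t in zip(interactive, batch, background)]

# Provision each workload separately for its own peak, or pool them all and
# provision once for the combined peak:
separate_capacity = max(interactive) + max(batch) + max(background)
pooled_capacity = max(total_demand)

avg = sum(total_demand) / 24
print(f"separate utilization: {avg / separate_capacity:.0%}")
print(f"pooled utilization:   {avg / pooled_capacity:.0%}")
```

Because the peaks don’t coincide, the pooled farm needs far less capacity than the sum of the individual peaks, so the same average demand runs at much higher utilization.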


In the discussion above, we focused on the costs “below” the software (data center infrastructure and servers) and found a substantial and sustainable competitive advantage in high scale deployments.  Looking at people costs, we see the same advantage again.  On the software side, the cost picture ranges from lower in the cloud to roughly equal, but it isn’t higher. There doesn’t seem to be a dimension that supports the claim of this report. I just can’t find the data to support the claim that enterprises shouldn’t consider cloud service deployments. Looking at the slides in the McKinsey presentation that make the cost argument in detail, the graphs on slides 22, 23, and 24 just don’t make sense to me. I’ve spent considerable time on the data but just can’t get it to line up with the AWS price sheet or any other measure of reality.  The limitation might be mine, but it seems others are having trouble matching this data to reality as well.


My conclusion: any company not fully understanding cloud computing economics and not having cloud computing as a tool to deploy where it makes sense is giving up a very valuable competitive edge. No matter how large the IT group, if I led the team, I would be experimenting with cloud computing and deploying where it makes sense.  I would want my team to know it well and to be deploying to the cloud when the work being done is not differentiated or when the capital is better leveraged elsewhere.


IT is complex and a single glib answer is almost always wrong.  My recommendation is to start testing and learning about cloud services, to take a closer look at your current IT costs, and to compare the advantages of using a cloud service offering with both internal hosting and mixed hosting models.






Monday, April 20, 2009 4:53:21 PM (Pacific Standard Time, UTC-08:00)  #    Comments [15] - Trackback
 Saturday, April 18, 2009

In Where SSDs Don’t Make Sense in Server Applications, we looked at the results of an HDD-to-SSD comparison test done by the Microsoft Cambridge Research team.  Vijay Rao of AMD recently sent me a pointer to an excellent comparison test done by AnandTech. In SSD versus Enterprise SAS and SATA disks, AnandTech compares one of my favorite SSDs, the Intel X25-E SLC 64GB, with a couple of good HDDs. The Intel SSD can deliver 7,000 random IOPS and the 64GB component is priced in the $800 range.


The full AnandTech comparison is worth reading, but I found the pricing and the sequential and random I/O performance data particularly interesting. I’ve brought this data together into the table below:






                  $/Seq Read   $/Seq Write   Seq I/O    $/Rdm Read   $/Rdm Write   Rdm I/O
                  ($/MB/s)     ($/MB/s)      Density    ($/MB/s)     ($/MB/s)      Density

Intel X25-E SLC       –            –            –         17.66           –         1.109

Cheetah 15k           –            –            –           –             –           –

All I/O measurements obtained using SQLIO
Random I/O measurements using 8k pages
Sequential measurements using 64kB I/Os
I/O density is average of read and write performance divided by capacity
Price calculations based upon average of selling price range listed.
Source: AnandTech


Looking at this data in detail, we see the Intel SSD produces extremely good random I/O rates, but we should all know that raw performance is the wrong measure. We should be looking at dollars per unit of performance. By this more useful metric, the Intel SSD continues to look very good at $17.66 per MB/s on 8K read I/Os, whereas the HDDs are $142 and $195 per MB/s respectively.  For hot random workloads, SSDs are a clear win.


What do I mean by “hot random workloads”? By hot, I mean a high number of random IOPS per GB. But, for a given storage technology, what constitutes hot?   I like to look at I/O density, which is the cutoff between a given disk with a given workload being capacity bound or I/O rate bound. For example, looking at the table above we see the random I/O density for a 64GB Intel disk is 1.109 MB/s/GB.  If you are storing data where you need 1.109 MB/s of 8k I/Os per GB of capacity or better, then the Intel device will be I/O bound and you won’t be able to use all the capacity. If the workload requires less than this number, then it is capacity bound and you won’t be able to use all the IOPS on the device. For very low access rate data, HDDs are a win. For very high access rate data, SSDs will be the better price performer.
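The I/O-density cutoff just described reduces to a tiny decision function. The 1.109 MB/s/GB figure is the Intel device’s random I/O density quoted above; the workload densities passed in are examples only:

```python
def binding_resource(workload_mb_s_per_gb, device_mb_s_per_gb):
    """Apply the I/O-density cutoff: which device resource runs out first?"""
    if workload_mb_s_per_gb > device_mb_s_per_gb:
        return "I/O bound: capacity will be stranded"
    return "capacity bound: IOPS will be stranded"

X25E_RANDOM_DENSITY = 1.109  # MB/s per GB of capacity, from the text above
print(binding_resource(2.0, X25E_RANDOM_DENSITY))  # hot random workload
print(binding_resource(0.1, X25E_RANDOM_DENSITY))  # cold, capacity-heavy data
```

Whichever resource is stranded is the one you are paying for but not using, which is why the cutoff matters for the budget comparison that follows.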


As it turns out, when looking at random I/O workloads, SSDs are almost always capacity bound and HDDs are almost always IOPS bound.  Understanding that, we can use a simple computation to compare HDD cost vs SSD cost on your workload. Take the HDD farm cost, which will be driven by the number of disks needed to support the I/O rate times the cost of the disk.  This is the storage budget needed to support your workload on HDDs. Take the size of the database and divide by the SSD capacity to get the number of SSDs required. Multiply the number of SSDs required by the price of the SSD. This is the budget required to support your workload on SSDs.  If the SSD budget is less (and it will be for hot, random workloads), then SSDs are a better choice.  Otherwise, keep using HDDs for that workload.


In the sequential I/O world, we can use the same technique.  Again, we look at the sequential I/O density to understand the cutoff between bandwidth bound and capacity bound for a given workload.  Very hot workloads over small data sizes will be a win on SSD but, as soon as the data sizes get interesting, HDDs are the more economic solution for sequential workloads.  The detailed calculation is the same. Figure out how many HDDs are required to support your workload on the basis of capacity or sequential I/O rates (depending upon which is in shortest supply for your workload on that storage technology). Figure out the HDD budget. Then do the same for SSDs and compare the numbers. What you’ll find is that, for sequential workloads, SSDs are only the best value for very high I/O rates over relatively small data sizes.
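Both the random and the sequential comparisons reduce to the same sketch: buy enough devices to satisfy whichever constraint binds first, then compare budgets. The device capacities, throughput rates, and prices below are illustrative assumptions of mine, not vendor quotes:

```python
import math

def farm_budget(data_gb, demand_mb_s, dev_gb, dev_mb_s, dev_price):
    """Buy enough devices to satisfy whichever constraint binds first
    (capacity or I/O rate); return the resulting storage budget in dollars."""
    devices = max(math.ceil(data_gb / dev_gb),
                  math.ceil(demand_mb_s / dev_mb_s))
    return devices * dev_price

# Example: a hot random workload, 500 GB needing 800 MB/s of 8 KB random I/O.
hdd = farm_budget(500, 800, dev_gb=300, dev_mb_s=2.0, dev_price=200)
ssd = farm_budget(500, 800, dev_gb=64, dev_mb_s=55, dev_price=800)
print(hdd, ssd)  # the HDD farm is I/O-rate bound and far more expensive here
```

For a real decision, substitute your measured capacity and I/O-rate requirements and current device prices; the same function covers the sequential case by passing sequential rates instead of random ones.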


Using these techniques and data, we can see when SSDs are a win for workloads with a given access pattern.  I’ve tested this line of thinking against many workloads and find that hot, random workloads can make sense on SSDs. Pure sequential workloads almost never do unless the access patterns are very hot or the capacity required is relatively small.


For specific workloads that are neither pure random nor pure sequential, we can figure out the storage budget to support the workload on HDDs and on SSDs as described above and do the comparison.  Using these techniques, we can step beyond the hype and let economics drive the decision.




Saturday, April 18, 2009 10:19:30 AM (Pacific Standard Time, UTC-08:00)  #    Comments [5] - Trackback
 Tuesday, April 14, 2009

My notes from an older talk done by Ryan Barrett on the Google App Engine Data store at Google IO last year (5/28/2008). Ryan is a co-founder of the App Engine team.


·         App Engine Data Store is built on BigTable.

o   Scalable structured storage

o   Not a sharded database

o   Not an RDBMS (MySQL, Oracle, etc.)

o   Not a Distributed Hash Table (DHT)

o   It IS a sharded sorted array

·         Supported operations:

o   Read

o   Write

o   Delete

o   Single row transactions (optimistic concurrency control).

o   Scans:

1.       Prefix scan

2.       Range scan

·          Primary object: Entity

o   Stored in entity table

o   Each row has a name and the row name is fully qualified /root/parent/entity/child

o   Each entity has a parent or is a root entity and may have child entities

o   Primary key is the fully qualified name and this can’t change

o   An entity can’t be reparented (it can be deleted and created with a different parent)

·         Queries:

o   Queries can be filtered on kind and Ryan says kind “is like a table” (kind can be parent, child, grandparent, …)

o   Queries can be filtered on ancestor

o   Query language is GQL (presumably Google Query Language) which is a small subset of SQL

o   All queries must be expressible as range or prefix scans (no sort, orderby, or other unbounded size operations supported)

·         Secondary index implementation:

o   Indexes are also implemented as BigTable tables

o   Kind Index:

·         Contents: (kind, key)

o   Single property index:

·         Contents: (kind, name, value)

·         Two copies of this index maintained: 1) ascending, and 2) descending

o   Composite indexes:

·         Contents: (kind, value, value)

·         Supports multi-property indexes

·         Built on programmer request but not on use (a query returns an error if the required index doesn’t exist)

·         Programmer can specify what composite indexes are needed in index.yaml

·         SDK creates composite index specs automatically in index.yaml as queries are run

·         Entity group

o   Supports multi-entity update

·         Defined by root entity (all entities under a root are an entity group)

·         All journaling and transactions done at root

·         Text and Blobs:

o   Not indexed. All other properties are
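As a toy illustration of these notes (my own sketch, not App Engine’s implementation), a single-property index held as a sorted list of (kind, property, value, entity key) tuples answers a filtered query with a binary search plus a contiguous range scan, which is exactly the “all queries must be range or prefix scans” constraint:

```python
import bisect

# Single-property index rows: (kind, property name, value, entity key),
# kept sorted so equality and range filters become contiguous scans.
index = sorted([
    ("Person", "age", 23, "/root/alice"),
    ("Person", "age", 31, "/root/bob"),
    ("Person", "age", 31, "/root/carol"),
    ("Person", "age", 47, "/root/dan"),
])

def range_scan(kind, name, lo, hi):
    """Return entity keys of `kind` whose property `name` is in [lo, hi]."""
    start = bisect.bisect_left(index, (kind, name, lo))
    keys = []
    for row in index[start:]:
        if row[0] != kind or row[1] != name or row[2] > hi:
            break  # walked past the index range for this kind/property
        keys.append(row[3])
    return keys

print(range_scan("Person", "age", 25, 40))  # -> ['/root/bob', '/root/carol']
```

The descending copy of the index exists for the same reason: a descending sort order must also be answerable as a forward range scan over a pre-sorted table.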




Tuesday, April 14, 2009 5:28:35 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback
 Sunday, April 12, 2009

All new technologies go through an early phase when everyone is initially convinced the technology can’t work. Then those that actually do solve interesting problems get adopted in some workloads and move into the next phase. In this next phase, people see the technology working well for some workloads and generalize this outcome to a wider class of workloads. They become convinced the new technology is the solution for all problems. Solid State Disks (SSDs) are now clearly in this second phase.


Well-intentioned people are arguing emphatically that SSDs are great because they are “fast”.   For the most part, SSDs actually are faster than disks in random reads, random writes, and sequential I/O. I say “for the most part” since some SSDs have been incredibly bad at random writes. I’ve seen sequential write rates as low as ¼ that of magnetic HDDs, but Gen2 SSD devices are now far better. Good devices now deliver faster-than-HDD results across random read, random write, and sequential I/O. It’s no longer the case that SSDs are “only good for read-intensive workloads”.


So, the argument that SSDs are fast is now largely true, but “fast” really is a misleading measure. Performance without cost has no value.  What we need to look at is performance per unit cost.  For example, SSD sequential access performance is slightly better than most HDDs, but the cost per MB/s is considerably higher. It’s cheaper to obtain sequential bandwidth from multiple disks than from a single SSD.  We have to look at performance per unit cost rather than just performance.  When you hear a reference to performance as a one-dimensional metric, you’re not getting a useful engineering data point.


When do SSDs win when looking at performance per dollar on the server?  Server workloads requiring very high IOPS rates per GB are more cost effective on SSDs.  Online transaction systems such as reservation systems, many ecommerce systems, and anything with small, random reads and writes can run more cost effectively on SSDs. Some time back I posted When SSDs make sense in server applications and the partner post When SSDs make sense in client applications, looking at where SSDs actually do make economic sense. But, with all the excitement around SSDs, some folks are getting a bit over-exuberant, and I’ve found myself in several arguments where smart people are arguing that SSDs make good economic sense in applications requiring sequential access to sizable databases. They don’t.


It’s time to look at where SSDs don’t make sense in server applications.  I’ve been intending to post this for months and my sloth has been rewarded.  The Microsoft Research Cambridge team recently published Migrating Server Storage to SSDs: Analysis of Tradeoffs and the authors save me some work by taking this question on. In this paper the authors look at three large server-side workloads:

1.       5000 user Exchange email server

2.       MSN Storage backend

3.       Small corporate IT workload


The authors show that these workloads are far more economically hosted on HDDs and I agree with their argument.  They conclude:


…across a range of different server workloads, replacing disks by SSDs is not a cost effective option at today’s price. Depending on the workload, the capacity/dollar of SSDs needs to improve by a factor of 3 – 3000 for SSDs to replace disks. The benefits of SSDs as an intermediate caching tier are also limited, and the cost of provisioning such a tier was justified for fewer than 10% of the examined workloads


They have shown that SSDs don’t make sense across a variety of server-side workloads; essentially, these workloads are more cost-effectively hosted on HDDs. I don’t quite agree with generalizing this argument to say that SSDs don’t make sense for any server-side workload. They remain a win for very high IOPS OLTP databases, but it’s fair to say that these workloads are a tiny minority of server-side workloads. The right way to make the decision is to figure out the storage budget for the workload hosted on HDDs, compare that with the budget to support the workload on SSDs, and decide on that basis.  This paper argues that the VAST majority of workloads are more economically hosted on HDDs.


Thanks to Zach Hill who sent this my way.






Sunday, April 12, 2009 8:31:05 AM (Pacific Standard Time, UTC-08:00)  #    Comments [10] - Trackback
 Thursday, April 09, 2009

Last week I attended the Data Center Efficiency Summit hosted by Google. You’ll find four postings on various aspects of the summit at:


Two of the most interesting videos:

·         Modular Data Center Tour:

·         Data Center Water Treatment Plant:


A Cnet article with links to all the videos:


The presentation I did on Data Center Efficiency Best Practices is up at:






Thursday, April 09, 2009 7:18:35 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback
 Tuesday, April 07, 2009

In the talk I gave at the Efficient Data Center Summit, I noted that the hottest place on earth over recorded history was Al Aziziyah, Libya in 1922, where 136F (58C) was indicated (see Data Center Efficiency Summit (Posting #4)). What’s important about this observation from a data center perspective is that this most extreme temperature event ever is still less than the specified maximum temperatures for processors, disks, and memory.  What that means is that, with sufficient air flow, outside air without chillers could be used to cool all components in the system. Essentially, it’s a mechanical design problem. Admittedly this example is extreme, but it forces us to realize that 100% free air cooling is possible.  Once we understand that it’s a mechanical design problem, we can trade off the huge savings of higher temperatures against the increased power consumption (semiconductor leakage and higher fan rates) and potentially increased server mortality rates.


We’ve known for years that air side economization (use of free air cooling) is possible and can limit the percentage of time that chillers need to be used. If we raise the set point in the data center, chiller usage falls quickly.  For most places on earth, a 95F (35C) set point combined with free air cooling and evaporative cooling are sufficient to eliminate the use of chillers  entirely.


Mitigating the risk of increased server mortality rates, manufacturers are now beginning to warrant their equipment to run in more adverse conditions. Rackable Systems recently announced that CloudRack C2 will carry a full warranty at 104F (40C): 40C (104F) in the Data Center. Ty Schmitt of Dell confirms that all Dell servers are warranted at 95F (35C) inlet temperatures.


I recently came across a wonderful study done by the Intel IT department (thanks to Data Center Knowledge): reducing data center cost with an Air Economizer.


In this study Don Atwood and John Miner of Intel IT take a datacenter module and divide it into two rooms of 8 racks each. One room is run as a control with re-circulated air at their standard temperatures. The other room is run on pure outside air, with the temperature allowed to range between 65F and 90F.  If the outside temperature falls below 65F, server heat is re-circulated to maintain 65F. If it rises over 90F, the air conditioning system is used to bring it back down to 90F.  The servers ran silicon design simulations at an average utilization rate of 90% for 10 months.



The short summary is that the server mortality rates were marginally higher – it’s not clear if the difference is statistical noise or significant – and the savings were phenomenal. It’s only four pages and worth reading:


We all need to remember that higher temperatures mean less engineering headroom and less margin for error, so care needs to be taken when raising temperatures. However, it’s very clear that it’s worth investing in the control systems and processes necessary for high temperature operation. Big savings await and it’s good for the environment.






Tuesday, April 07, 2009 2:27:45 PM (Pacific Standard Time, UTC-08:00)  #    Comments [8] - Trackback

 Sunday, April 05, 2009

Last week, Google hosted the Data Center Efficiency Summit.  While there, I posted a couple of short blog entries with my rough notes:

· Data Center Efficiency Summit

· Rough Notes: Data Center Efficiency Summit

· Rough Notes: Data Center Efficiency Summit (posting #3)


In what follows, I summarize the session I presented and go into more depth on some of what I saw in sessions over the course of the day.


I presented Data Center Efficiency Best Practices at the 1pm session. My basic point was that PUEs in the 1.35 range are attainable without substantial complexity and without innovation. Good, solid design using current techniques, carefully executed, is sufficient to achieve this level of efficiency.
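For readers less familiar with the metric, PUE (Power Usage Effectiveness) is simply total facility power divided by the power delivered to the IT load. A one-line illustration with made-up round numbers:

```python
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT load power."""
    return total_facility_kw / it_load_kw

# A 1.35 PUE facility carrying 1,000 kW of IT load draws 1,350 kW total:
# the extra 350 kW goes to cooling and power distribution losses.
example = pue(1350.0, 1000.0)  # 1.35
```

Lower is better; a hypothetical perfect facility, spending nothing on cooling or distribution, would have a PUE of 1.0.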


In the talk, I went through power distribution from high voltage at the property line to 1.2V at the CPU and showed cooling from the component level to release into the atmosphere. For electrical systems, the talk covered an ordered list of rules to increase power distribution efficiency:

1. Avoid conversions (fewer transformer steps & an efficient UPS, or none at all)

2. Increase the efficiency of each conversion

3. Keep voltage high as close to the load as possible

4. Size voltage regulators (VRM/VRDs) to the load & use efficient parts

5. DC distribution is potentially a small win (regulatory issues)
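The reason "avoid conversions" tops the list is that conversion efficiencies multiply: every stage between the property line and the CPU takes its cut. A quick sketch with illustrative stage efficiencies (these numbers are my own examples, not figures from the talk):

```python
from math import prod

def delivered_fraction(stage_efficiencies: list[float]) -> float:
    """Fraction of utility power that survives a chain of conversion steps.
    Losses compound multiplicatively, which is why removing a step entirely
    beats slightly improving one."""
    return prod(stage_efficiencies)

# Illustrative chain: transformer, double-conversion UPS, PDU, PSU, VRM
conventional = delivered_fraction([0.995, 0.94, 0.98, 0.80, 0.80])  # ~59%
# Same chain with the lossy UPS replaced by a 99.7% efficient alternative
lean = delivered_fraction([0.995, 0.997, 0.98, 0.80, 0.80])         # ~62%
```

With these example numbers, roughly 40% of the power drawn at the property line never reaches silicon, which is why the cheap supplies and regulators at the end of the chain matter as much as the big iron at the front.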

Looking at mechanical systems, the talk pointed out the gains to be had by carefully moving to higher data center temperatures. Many server manufacturers, including Dell and Rackable, will fully stand behind their systems at inlet temperatures as high as 95F. Big gains are possible via elevated data center temperatures. The recommended, ordered list of mechanical systems optimizations:

1. Raise data center temperatures

2. Tight airflow control, short paths, & large impellers

3. Cooling towers rather than chillers

4. Air-side economization & evaporative cooling


The slides from the session I presented are posted at:


Workshop Summary:

The overall workshop was excellent. Google showed the details behind 1) the modular data center they built 4 years ago, covering both the container design and that of the building that houses them, 2) the river water cooling system employed in their Belgium data center, and 3) the custom Google-specific server design.


Modular DC: The modular data center was a 45-container design where each container was 222KW (roughly 780W/sq ft). The containers were housed in a fairly conventional two-floor facility. Overall, it was nicely executed, but all Google data centers built since this one have been non-modular, and each subsequent design has been more efficient than this one. The fact that Google has clearly turned away from modular designs is interesting. My read is that the design we were shown missed many opportunities to remove cost and to optimize for the application of containers. The design chosen essentially built a well-executed but otherwise conventional data center shell using standard power distribution systems and standard mechanical systems. No part of the building itself was optimized for containers. Even though it was a two-level design, a two-floor shell was built rather than simply stacking the containers. A 220-ton gantry crane further drove up costs, but the crane was not fully exploited by packing the containers in tight and stacking them.
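Working through the published numbers gives a sense of the scale of the facility (the arithmetic below is mine, derived from the 45-container, 222KW, 780W/sq ft figures above):

```python
containers = 45
kw_per_container = 222
watts_per_sqft = 780

# Total critical (IT) load across the facility
total_critical_kw = containers * kw_per_container  # 9,990 kW, ~10 MW

# Implied floor area per container at the quoted density
sqft_per_container = kw_per_container * 1000 / watts_per_sqft  # ~285 sq ft
```

Roughly 10MW of critical load at 780W/sq ft is very dense by the standards of the day, which makes the decision to wrap it in a conventional two-floor shell all the more interesting.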


For a containerized model to work economically, the attributes of the container need to be exploited rather than merely installing containers in a standard data center shell. Rather than building an entire facility with multiple floors, we would need to use a much cheaper shell, if any at all. The ideal would be a design where just enough concrete is poured to mount four container mounting bolts so the container can be tied down to avoid wind damage. I believe the combination of not building a full shell, the use of free air cooling, and the elimination of the central mechanical system would allow containerized designs to be very cost effective. What we learn from the Google experiment is that the combination of a conventional data center shell and mechanical systems with containers works well (their efficiency data shows it to be very good) but isn’t notably better than similar design techniques used with non-containerized designs.


River water cooling: The Belgium river-water-cooled data center caught my interest when it was first discussed a year ago. The Google team went through the design in detail. Overall, it’s beautiful work, but it includes a full water treatment plant to treat the water before use. I like the design in that it’s 100% better, both economically and environmentally, to clean and use river water than to take fresh water from the local utility. But the treatment plant itself represents a substantial capital expense and requires energy to operate. It’s clearly an innovative way to reduce fresh water consumption. However, I slightly prefer designs that depend more deeply on free air cooling and avoid the capital and operational expense of the water treatment plant.


Custom Server: The server design Google showed was clearly a previous generation. It’s a 2005 board, and I strongly suspect there are subsequent designs at Google that haven’t yet been shown publicly. I fully support this: showing the previous-generation design publicly is a great way to drive innovation inside a company while contributing to the industry as a whole. It’s a great approach, and the server shown last Wednesday was a very nice design.


The board is a 12-volt-only design. This has become more common of late, with IBM, Rackable, Dell, and others all doing it. However, when the board was first designed, it was considerably less common. 12V-only supplies are simpler, distributing the single voltage on-board is simpler and more efficient, and distribution losses are lower at 12V than at either 3.3V or 5V for a given sized trace. Nice work.
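The distribution-loss claim follows directly from Ohm's law: for a fixed power draw, halving the voltage doubles the current, and conduction losses scale with the square of the current. A small worked example (the 200W load and 2 milliohm path are illustrative numbers of my own, not from the Google talk):

```python
def trace_loss_w(power_w: float, volts: float, trace_ohms: float) -> float:
    """I^2*R conduction loss when delivering power_w at the given voltage
    through a distribution path of trace_ohms resistance."""
    current = power_w / volts
    return current ** 2 * trace_ohms

# Same 200W load, same 2 milliohm path, different distribution voltages:
loss_12v = trace_loss_w(200, 12, 0.002)  # ~0.56 W lost
loss_5v = trace_loss_w(200, 5, 0.002)    # 3.2 W lost
```

The ratio is (12/5)², or about 5.8x, which is why pushing the single higher voltage as close to the load as possible pays off.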


Perhaps the most innovative aspect of the board design is the use of a distributed UPS. Each board has a 12V VRLA battery that can keep the server running for 2 to 3 minutes during power failures. This is plenty of time to ride through the vast majority of power failures and is long enough to allow the generators to start, come on line, and sync. The most important benefit of this design is that it avoids the expensive central UPS system. It also avoids the losses of a central UPS (94% to 96% efficient UPSs are very good, and most are considerably worse). Google reported their distributed UPS is 99.7% efficient. I like the design.
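The efficiency gap sounds small but compounds continuously. A back-of-envelope comparison for a hypothetical 1MW critical load (my numbers, using the 95% and 99.7% figures above):

```python
HOURS_PER_YEAR = 8760

def annual_ups_loss_kwh(it_load_kw: float, ups_efficiency: float) -> float:
    """kWh per year dissipated in the UPS while carrying a constant IT load."""
    input_kw = it_load_kw / ups_efficiency
    return (input_kw - it_load_kw) * HOURS_PER_YEAR

central = annual_ups_loss_kwh(1000, 0.95)       # ~461,000 kWh/year
distributed = annual_ups_loss_kwh(1000, 0.997)  # ~26,400 kWh/year
```

On these assumptions the distributed design dissipates roughly one seventeenth the energy of a very good central UPS, before even counting the capital saved by deleting the central UPS room.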


The motherboard was otherwise fairly conventional, with a small level of depopulation: the second Ethernet port was removed, as were USB and other components. I like the Google approach to server design.


The server was designed to be rapidly serviced, with the power supply, disk drives, and battery all Velcro-attached and quick to change. The board itself looks difficult to change, but I suspect their newer designs will address that shortcoming.


Hats off to Google for organizing this conference to make high-efficiency data center and server design techniques more broadly available across the industry. Neither the board nor the data center designs shown in detail were Google’s very newest, but all were excellent and well worth seeing. I like the approach of showing previous-generation technology to the industry while pushing ahead with newer work. This technique allows a company to reap the potential competitive advantages of its R&D investment while at the same time being more open with the previous generation.


It was a fun event and we saw lots of great work. Well done Google.






Sunday, April 05, 2009 3:37:12 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2] - Trackback

The HotPower ’09 workshop will be held on October 10th at the same venue as, and right before, the Symposium on Operating Systems Principles (SOSP 2009) at Big Sky Resort, Montana. HotPower recognizes that power is becoming a central issue in the design of all systems, from embedded systems to servers for high-scale data centers.


Power is increasingly becoming a central issue in designing systems, from embedded systems to data centers. We do not understand energy and its tradeoff with performance and other metrics very well. This limits our ability to further extend the performance envelope without violating physical constraints related to batteries, power, heat generation, or cooling.

HotPower hopes to provide a forum in which to present the latest research and to debate directions, challenges, and novel ideas about building energy-efficient computing systems. In addition, researchers coming to these issues from fields such as computer architecture, systems and networking, measurement and modeling, language and compiler design, and embedded systems will gain the opportunity to interact with and learn from one another.

If you are interested in submitting a paper to HotPower:






Sunday, April 05, 2009 7:18:20 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback

Disclaimer: The opinions expressed here are my own and do not necessarily represent those of current or past employers.
