Tuesday, November 04, 2008

Tony Hoare spoke yesterday at the Computing in the 21st Century Conference in Beijing. Tony is a Turing Award winner, the inventor of Quicksort, the author of the influential Communicating Sequential Processes (CSP) formal language, and a long-time advocate of program verification and of tools that help produce reliable software systems. In his talk he argues that programming should be, and can be, a science, and that the goal should be correct programs that stay correct through change: zero-defect software. 

 

He explains that engineers accept that there will be defects, but scientists should pursue perfection far beyond anything commercial need requires. Tony has spent a big part of his successful career pursuing techniques and tools to produce reliable complex systems.

 

Tony ended his talk on a practical engineering note, hoping that we can advance our field to the point that “Software will contain no more errors than other engineering disciplines”. We’re not there yet.

 

My rough notes from the talk follow.

 

Title: The Science of Programming

Speaker: Tony Hoare

 

The Vision:

·         Computer software contains no more errors

o   Software is the most reliable component of any device that contains it

·         Programmers make no mistakes

o   Programs work the first time they run

o   They run forever after, even after changing

·         Programming is an engineering discipline

o   Respected for its delivered benefits and its foundation on basic science

·         Semantics is the science of programming

o   Explores the meaning of computer programs

o   Operational: correctness of implementation

o   Algebraic: Correctness of optimization

o   Axiomatic

The Insight:

·         Computer programs are mathematical formulae

o   They don’t suffer from rust, wear, decay, fatigue

o   If a correct program is started in a correct state, it will stay correct

·         Their correctness is a mathematical conjecture

o   To be proved by logic and calculation

o   Checked by the computer itself

History of the idea:

·         Aristotle (350 BC): Syllogistic logic

·         Euclid (300 BC): geometry

·         Leibniz (1700): calculus

·         Boole (1850): laws of thought

·         Frege (1880): predicate logic

·         Russell (1920): Principia

·         Hao Wang (1956): Computer checks

Basic Science:

·         Answers fundamental questions

·         What does it do?

·         How does it work?

·         Why does it work?

·         How do we know?

What does it do?

·         Answered by its behavioral specification

How does it work?

·         Answered by its internal interface contracts

Why does the program work?

·         Answered by programming theory

How do we know?

·         By logical/mathematical proof

Ideals in Basic Science

·         Pursued for the sake of scientific glory far in advance of commercial need

·         Physics: accuracy of measurement

·         Chemistry: purity of materials

·         Computing Science: zero defect programs

Unifying Theory

·         Basic science seeks unifying theories

·         Explains diverse phenomena

·         Supported by evidence

Overall, industry is not yet making heavy use of software verification along the lines Tony wants to see, but some tools are in use. For example, tools in use at Microsoft include:

·         PREfix and PREfast

·         Static Driver Verifier

·         ESP (locates potential buffer overflows)

The Hope:

·         Software will contain no more errors than other engineering disciplines.

 

James Hamilton, Data Center Futures
Bldg 99/2428, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
JamesRH@microsoft.com

H:mvdirona.com | W:research.microsoft.com/~jamesrh  | blog:http://perspectives.mvdirona.com

 

Tuesday, November 04, 2008 2:23:57 PM (Pacific Standard Time, UTC-08:00)
Software
 Saturday, October 25, 2008

Service monitoring at scale is incredibly hard. I’ve long argued that you should never learn anything about a problem your service is experiencing from a customer. How could they possibly know first when there is a service outage or issue? And yet it happens frequently. It happens because most sites have nowhere near an adequate level of instrumentation. Without this instrumentation, you are flying blind.

 

Systems monitoring data can be used to drive alerts, compute SLAs, drive capacity planning, find latencies, and understand customer access patterns, and some sites use it to drive billing, although the latter is probably a mistake.

 

In the rare cases where I’ve come across high-quality monitoring systems that actually do fine-grained data collection, the data is often not looked at or is underutilized. It turns out that fully using and exploiting very large amounts of monitoring data isn’t much easier than collecting it.

 

Returning to the challenge of efficiently collecting fine-grained monitoring data and events from thousands of servers, Facebook made a contribution yesterday by making Scribe available as an open source project: Facebook's Scribe technology now open source. Scribe is used at Facebook to monitor their more than 10k servers across multiple data centers. Scribe is a SourceForge project at: http://sourceforge.net/projects/scribeserver/.

 

Facebook continues to develop interesting and broadly useful software and often contributes it to the community as open source. For example, Facebook Releases Cassandra as Open Source.

 

Some excerpts from On Designing and Deploying Internet-Scale Services on why I think auditing, monitoring, and alerting are important:

 

Alerting is an art. There is a tendency to alert on any event that the developer expects they might find interesting, and so version-one services often produce reams of useless alerts that never get looked at. To be effective, each alert has to represent a problem. Otherwise, the operations team will learn to ignore them. We don’t know of any magic to get alerting correct other than to interactively tune what conditions drive alerts to ensure that all critical events are alerted and there are no alerts when nothing needs to be done. To get alerting levels correct, two metrics can help and are worth tracking: 1) the alerts-to-trouble-ticket ratio (with a goal of near one), and 2) the number of system health issues without corresponding alerts (with a goal of near zero).
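The two alerting metrics above are easy to compute once alerts, trouble tickets, and health issues are linked to each other. The sketch below is a minimal illustration, assuming hypothetical `Alert` and `HealthIssue` records in which each alert optionally points at the trouble ticket it produced and each health issue optionally points at the alert that fired for it; real alerting and ticketing systems will have their own schemas.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Alert:
    """Hypothetical alert record; ticket_id is set if the alert led to a trouble ticket."""
    alert_id: str
    ticket_id: Optional[str] = None


@dataclass
class HealthIssue:
    """Hypothetical health issue record; alert_id is set if an alert fired for it."""
    issue_id: str
    alert_id: Optional[str] = None


def alerts_to_ticket_ratio(alerts: Sequence[Alert]) -> float:
    """Alerts raised per distinct trouble ticket; the goal is close to 1.0."""
    tickets = {a.ticket_id for a in alerts if a.ticket_id is not None}
    if not tickets:
        # No tickets: any alerts at all were noise; no alerts means nothing to measure.
        return float("inf") if alerts else float("nan")
    return len(alerts) / len(tickets)


def unalerted_issue_count(issues: Sequence[HealthIssue]) -> int:
    """Health issues that were never alerted on; the goal is close to 0."""
    return sum(1 for issue in issues if issue.alert_id is None)


alerts = [Alert("a1", "t1"), Alert("a2", "t1"), Alert("a3"), Alert("a4", "t2")]
issues = [HealthIssue("h1", "a1"), HealthIssue("h2")]
print(alerts_to_ticket_ratio(alerts))  # 4 alerts over 2 distinct tickets
print(unalerted_issue_count(issues))
```

A ratio well above one suggests many alerts are duplicates or not actionable; a growing unalerted count suggests monitoring is missing real problems.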

 

·         Instrument everything. Measure every customer interaction or transaction that flows through the system and report anomalies. There is a place for “runners” (synthetic workloads that simulate user interactions with a service in production) but they aren’t close to sufficient. Using runners alone, we’ve seen it take days to even notice a serious problem, since the standard runner workload was continuing to be processed well, and then days more to know why.

 

·         Data is the most valuable asset. If the normal operating behavior isn’t well-understood, it’s hard to respond to what isn’t. Lots of data on what is happening in the system needs to be gathered to know it really is working well. Many services have gone through catastrophic failures and only learned of the failure when the phones started ringing.

 

·         Have a customer view of service. Perform end-to-end testing. Runners are not enough, but they are needed to ensure the service is fully working. Make sure complex and important paths such as logging in a new user are tested by the runners. Avoid false positives. If a runner failure isn’t considered important, change the test to one that is. Again, once people become accustomed to ignoring data, breakages won’t get immediate attention.

 

·         Instrumentation required for production testing. In order to safely test in production, complete monitoring and alerting is needed. If a component is failing, it needs to be detected quickly.

 

·         Latencies are the toughest problem. Examples are slow I/O and components that are not quite failing but are processing slowly. These are hard to find, so instrument carefully to ensure they are detected.

 

·         Have sufficient production data. In order to find problems, data has to be available. Build fine-grained monitoring in early or it becomes expensive to retrofit later. The most important data that we’ve relied upon includes:

 

Thanks to Sriram Krishnan for pointing me to the release of Scribe.
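To make the latency bullet in the excerpt above more concrete, here is a minimal sketch of one way to catch operations that are slow but not failing: keep a sliding window of recent latencies per operation and flag any operation whose 99th-percentile latency crosses a threshold. The `LatencyMonitor` class, the window size, and the threshold are hypothetical illustrations, not part of Scribe or of any particular monitoring system.

```python
import math
from collections import defaultdict, deque


class LatencyMonitor:
    """Hypothetical per-operation latency tracker using a fixed-size sliding window."""

    def __init__(self, window: int = 1000):
        # Each operation keeps only its most recent `window` samples.
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def record(self, op: str, latency_ms: float) -> None:
        self.samples[op].append(latency_ms)

    def percentile(self, op: str, pct: float):
        """Nearest-rank percentile of the recent samples, or None if there are none."""
        data = sorted(self.samples.get(op, ()))  # .get avoids creating empty entries
        if not data:
            return None
        rank = max(1, math.ceil(pct * len(data) / 100))
        return data[rank - 1]

    def slow_operations(self, p99_threshold_ms: float):
        """Operations whose p99 latency exceeds the threshold, even if none failed."""
        slow = []
        for op in self.samples:
            p99 = self.percentile(op, 99)
            if p99 is not None and p99 > p99_threshold_ms:
                slow.append(op)
        return slow


monitor = LatencyMonitor()
for i in range(100):
    monitor.record("read", 10.0)
    # 5% of writes are very slow but still succeed.
    monitor.record("write", 10.0 if i < 95 else 500.0)
print(monitor.slow_operations(p99_threshold_ms=100.0))
```

A mean would hide those five slow writes (the average write latency here is under 35ms), whereas a tail percentile exposes them.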

 

                                                                --jrh


 

Saturday, October 25, 2008 8:33:23 AM (Pacific Standard Time, UTC-08:00)
Services
 Sunday, October 19, 2008

In When SSDs Make Sense in Server Applications, we looked at where Solid State Drives (SSDs) were practical in servers and services. On the client side, there are even more reasons to use SSDs and I expect that within three years, more than half of enterprise laptops will have NAND Flash as at least part of their storage subsystems. This estimate has SSDs in 38% of all laptops by 2011: Flash SSD in 38% of Laptops by 2011.  

 

What follows is a quick summary of SSD advantages on the client side, followed by the disadvantages, and then a closer look at the write endurance (wear-out) problem that has been the topic of much discussion recently.

 

Client SSD Advantages:

·         Random IOPS:  Laptop I/O patterns are dominated by random workloads and, as argued in When SSDs Make Sense in Server Applications, these workloads run cost effectively on SSDs

·         Low Power: SSD power consumption is typically under 2W and often under 1W. Enterprise disks can run 15 to 18W and desktop parts are typically in the 10W range, but laptop drives usually run a more modest 2.5W when active. So, on one hand, this represents an exciting reduction in storage power of a factor of 2 but, on the other, it’s actually only about a 1W saving when the HDD is active and even less when idle. A savings, but a small one overall. If you are interested in more data on laptop power consumption, see Client-Side Power Consumption. Some very efficient HDDs actually have lower idle power consumption than some SSDs, so SSDs are not better under all conditions from a power consumption perspective.

·         Quiet. HDDs can be noisy. They are mechanical parts with precision bearings spinning at high speeds and they make noise.  Semi-conductor-based SSDs avoid this.

·         Small Form Factors: SSDs can be small and light weight.

·         Scale Down Floor: Disks have a price floor where further lowering the capacity of the device doesn’t save money. This price floor changes over time but, at this point, it’s hard to get much below $30 for a disk regardless of how small. The fixed costs of the mechanical parts dominate the media and the cost of the disk doesn’t scale down. SSD costs scale down well and for applications with modest storage requirements, they can be less expensive.  This makes them interesting for very low-end laptops, netPCs, ultra-mobile PCs, and, of course, NAND Flash is the storage of choice in cell phones, music players, cameras, and other related applications.

·         Shock and Vibration: HDDs usually spec max shock in the 50G to 100G range and vibration in the ¼G to ½G range. SSD specs run well over 1,000G shock and around 20G vibration. They are much more durable against this common threat in the laptop world.

·         Latency: I/O latency is far lower on an SSD than a HDD and this is particularly noticeable when I/O queues get deep as they often do on single disk laptops.

·         Reliability: HDDs are the number one failing component on clients (and servers). This is particularly a problem on laptops as they are (usually) single-drive devices and often not well backed up. HDD failures represent a substantial service cost in most enterprises, so eliminating them is appealing. Our operational history with SSDs is fairly short so far, but we expect they will fail less frequently than hard disks. However, like all new components, they bring additional failure modes as well as eliminating a few. The biggest concern around SSDs is write endurance, with SLC part lifetimes typically in the range of 10^5 write cycles and MLC parts down around 10^4 write cycles (some even lower). We’ll look at that in more depth below.

·         Temperature: SSDs have a much wider temperature and humidity operating range than HDDs.

 

Client SSD Disadvantages:

·         Capacity/$: Flash devices can deliver excellent random I/O performance, and laptops, with only a single disk, are frequently random I/O bound rather than capacity limited. In fact, many enterprise customers actually want LESS storage on their laptop fleet. For them, having less capacity is either not a problem or even a potential advantage. For my uses and for many consumer usage patterns, capacity remains important, with pictures, audio, and other media files driving space requirements up to the point where SSDs can be tough to afford. As a direct consequence, I expect that we’ll see more enterprise than consumer use of SSDs in clients.

·         Performance Degradation: There have been many reports of SSDs initially performing well and then degrading over time. See Laptop SSD Performance Degradation Problems for more detail.

·         Endurance: This is the most common concern I’ve heard lately, with MLC write endurance only around 10,000 write cycles.

 

Write Endurance

I keep hearing anecdotal reports that SSDs in laptops are going to fail in the first year due to the poor write endurance of MLC SSDs. The typical MLC write endurance is usually quoted at around 10,000 cycles which I agree does sound quite low.

 

Let’s do a quick back-of-the-envelope calculation on MLC SSD write endurance (SLC parts are typically more expensive but have longer write endurance specifications). Assume a client system is used four hours a day and that it spends ¼ of that time at the max I/O rate of 100 IOPS. My gut feel says this number very likely errs high. Let’s include write amplification. Write amplification is a side effect of Flash memory designs having larger blocks as the unit of erase and smaller pages as the unit of read and programming (write). This, combined with wear leveling, leads to the device having to do some overhead housekeeping writes when servicing writes from the host system. Assume an average write amplification of 3x over three years of life, which again seems high. To make it really aggressive, we’ll assume a write-to-read ratio of 1:1 (50% writes), which is very high. Finally, let’s assume it’s a 64GB MLC device, that my writes are all to 4KB pages, and that the overheads are all accounted for by my 3x write amplification number.

 

4*60*60*365*3*.25*.5*3*100 => 591m

 

Reading left to right, that’s 4 hours a day, *60 to get minutes, *60 to get seconds, *365 to get seconds of use per year, *3 to get seconds of use in three years, *25% of time at max I/O, *50% of I/Os are writes, *3 write amplification, and *100 I/Os per second.

 

In aggregate, that’s roughly 600 million write I/Os needed by each laptop over a three-year life. But a 64GB device has 16m pages. If you spread those writes over 16m pages with perfect wear leveling, you would have about 36 write I/Os per page. Very low. With terrible wear leveling, it could go up an order of magnitude, but it’s still a very low number. Move write amplification up to 5x and the wear per page still looks tiny. Move the usage pattern up from 4 hours a day to an aggressive 16 hours a day and it’s still only about 147 writes per page. We may issue more lifetime I/Os with an SSD than my magnetic-disk-based model above assumes, since we spend less time waiting but, still, the number of lifetime writes required doesn’t look large.

 

If we use very low-endurance MLC, where write endurance is specified at around 1,000 cycles rather than the more common 10,000, it’s still not a problem. But it is within an order of magnitude, so it’s arguably a concern over a three-year life, and it would definitely be a concern over a five-year life.
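The back-of-the-envelope calculation above can be reproduced in a few lines. This is a minimal sketch using the same assumptions as the text (hours of use per day, 25% of time at 100 IOPS, 50% writes, 3x write amplification, 4KB pages, three years). It assumes 64GB means 64 × 2^30 bytes, which gives 2^24 (about 16.8m) pages, so the per-page figures come out slightly lower than the rounded 16m-page numbers in the text (about 35 rather than 36 at 4 hours a day, about 141 rather than 147 at 16 hours a day). The function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600
DAYS_PER_YEAR = 365


def lifetime_device_writes(hours_per_day=4, years=3, frac_at_max_io=0.25,
                           write_frac=0.5, write_amp=3, max_iops=100):
    """Device-level page writes over the device's life, following the model above."""
    seconds_at_max = hours_per_day * SECONDS_PER_HOUR * DAYS_PER_YEAR * years * frac_at_max_io
    return seconds_at_max * max_iops * write_frac * write_amp


def writes_per_page(total_writes, capacity_bytes=64 * 2**30, page_bytes=4096):
    """Average writes per page, assuming perfect wear leveling."""
    return total_writes / (capacity_bytes // page_bytes)


for hours in (4, 16):
    per_page = writes_per_page(lifetime_device_writes(hours_per_day=hours))
    for rated_cycles in (1_000, 10_000):
        headroom = rated_cycles / per_page
        print(f"{hours}h/day: {per_page:.0f} writes/page, "
              f"{headroom:.0f}x headroom at {rated_cycles} rated cycles")
```

At 16 hours a day on a 1,000-cycle part, the headroom is only about 7x, less than the order of magnitude that terrible wear leveling could cost, which matches the concern raised above.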

 

Because client systems spend such a small percentage of their working lifetimes at 100% I/O rates, it’s hard to see a credible usage model that has MLC write endurance as a serious problem if using parts specified at 10,000 write cycles.

 

In a subsequent post, we’ll look back at server applications and, in contrast with When SSDs Make Sense in Server Applications, we’ll look at where SSDs don’t make sense on the server side.

 


 

Sunday, October 19, 2008 9:04:40 AM (Pacific Standard Time, UTC-08:00)
Hardware
 Wednesday, October 15, 2008

In past posts, I’ve talked a lot about Solid State Drives. I’ve mostly discussed why they are going to be relevant on the server side, and the shortest form of the argument is based on extremely hot online transaction processing (OLTP) systems. There are potential applications as reliable boot disks in blade servers and other small-data applications, but I’m focused on high-scale OLTP in this discussion. OLTP applications are random I/O bound workloads such as ecommerce systems, airline reservation systems, and any data-intensive application that does lots of small reads and writes, usually on a database where future access patterns are unknown. When sizing a server for one of these workloads, the key dimension is the number of small random I/Os per second. You either add memory to increase the memory hit rate and reduce the number of I/Os, or you add disks to support the application-required I/O rates. The problem with adding memory is that it has linear cost (the last DIMM costs as much as the first DIMM) but only logarithmic value. Because the workloads are random, adding memory only delivers a reduction in I/Os roughly proportional to the square root of the memory size. Cheap memory helps but, even then, the costs add up, as does the power consumption, as memory is added. Alternatively, you can add disks, but each disk added gives only roughly another 200 I/Os per second (IOPS), even when using very expensive 15k RPM disks.

 

The problem is best summarized by my favorite chart these days from Dave Patterson of Berkeley:

This chart is from an amazingly useful paper, Latency Lags Bandwidth (if you know of a no-charge location for this paper, let me know). In this chart, Dave tracks the trend of bandwidth and latency over the last 20+ years. For the purposes of this discussion, ignore the latency row and focus on bandwidth. Disk bandwidth is growing more slowly than DRAM and CPU bandwidth. I love looking for divergent trends because they direct us to the more fundamental problems needing innovation.

 

Given that lagging disk bandwidth growth is a growing problem, let’s compare disk sequential bandwidth with random I/O rates over time. In the chart below, I graph sequential bandwidth growth against random bandwidth growth over the same period:

 

We know that disk sequential bandwidth growth lags the rest of the system. This graph shows that random IOPS bandwidth is growing even more slowly. Across the industry, we have a huge problem and the trend lines above make it crystal clear that the problem won’t be cost-effectively solved by disk alone. More detail on one dimension of the disk limits problem in: Why Disk Speeds aren’t Increasing.

 

Disks clearly aren’t the full solution. Ever larger memory sub-systems actually are part of the solution but the logarithmic (or worse) payback with linear cost and power consumption makes memory an expensive approach if we use it as the only tool.  Many have argued for the last couple of years that solid state disks are the solution to filling the chasm between memory and disk random IOPS rates.  Jim Gray was one of the first to make this observation in: Tape is Dead, Disk is Tape, Flash is Disk, Ram Locality is King.

 

The first-generation server-side SSDs were slow random-write performers, but we’re now seeing great components released to the market. See 100,000 IOPS and 1,000,000 IOPS. These are great performers, but they are far from commodity pricing at this point. Intel has been doing some great work on SSDs, and I really like this one: Intel X25-E Extreme SATA Solid-State Drive. It’s a step towards commodity pricing. Overall, the industry now has great-performing parts available, and the price/performance equation is improving very rapidly since this is a semiconductor component rather than a mechanical one.

 

When should we expect the crossover? At what price point are SSDs a win over HDDs? Unfortunately, it’s an application-specific answer. It depends upon the I/O density of the workload: the number of I/Os per GB of data. Bob Fitzgerald has done a great job of analyzing different workloads to understand what level of application I/O heat (IOPS per GB) is needed to justify an SSD. Building on Fitz’s work, I have a quick test you can use to figure out how cheap an SSD will have to get before it is a win in your application.

 

My observation goes like this. Disks have an abundance of capacity and are short of IOPS so, on random IOPS intensive workloads, the limiting factor using HDDs will be IOPS.  SSDs have an abundance of IOPS and are short of capacity, so the limiting factor using SSDs will be capacity.  SSDs are cost effective for your application when the cost of the disk farm adequate to support the IOPS you need is more than the SSD farm required to support the capacity you need. As a formula:

 

HDD_count * HDD_price > ceil(Capacity_needed / SSD_capacity) * SSD_price

 

Let’s try an example. This example application is hosted on several hundred database servers, and it’s a red-hot transaction processing system. Each system has 53 disks, of which 40 are used to store data, 8 for the log, and a few for administrative purposes. Leave the log on magnetic media, since disk sequential bandwidth is cheaper than SSD sequential bandwidth. The database size on each server is 572GB. The disks used by this application are 15k RPM, 3.5-inch disks that price out at $333 each. Given that, the disk budget per server for this application is 40 * $333, which is $13,320. We know we need 572GB, and let’s assume we are trying out 64GB SSDs. Using that equation, 572/64 is 8.9, so we’ll need 9 SSDs to support this workload.

 

Taking the disk budget of $13,320 and dividing by the 9 SSDs we need, we can afford to pay up to $1,480 for each SSD. If each SSD costs less than this, it’s worth doing. This model ignores the power savings (SSDs usually run at under 1/5 the power of HDDs, and fewer are needed) and other factors like service costs, but it’s a quick check to see whether SSDs are worth considering.
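The quick test above is easy to script. This minimal sketch computes the break-even SSD price for the worked example; the function names are hypothetical and, like the text, it ignores power, service costs, and the log disks. The optional `hdds_for_iops` helper estimates the data-disk count from a required IOPS rate, using the roughly 200 IOPS per 15k RPM disk mentioned earlier, for cases where you know the workload’s IOPS rather than the current disk count.

```python
import math


def hdds_for_iops(required_iops, iops_per_hdd=200):
    """Disks needed when random IOPS, not capacity, is the limiting factor."""
    return math.ceil(required_iops / iops_per_hdd)


def max_ssd_price(hdd_count, hdd_price, capacity_needed_gb, ssd_capacity_gb):
    """Break-even price per SSD: the HDD budget spread over the SSDs needed for capacity."""
    ssd_count = math.ceil(capacity_needed_gb / ssd_capacity_gb)
    hdd_budget = hdd_count * hdd_price
    return hdd_budget / ssd_count, ssd_count


price, count = max_ssd_price(hdd_count=40, hdd_price=333,
                             capacity_needed_gb=572, ssd_capacity_gb=64)
print(f"{count} SSDs; worthwhile below ${price:,.0f} each")
```

Rerunning it with a larger SSD capacity shows how quickly the break-even price rises as fewer SSDs are needed to hold the data.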

 

We also need more data on SSD longevity in high-write-rate workloads. In the absence of historical data, ask your vendor to stand behind their product with a full warranty in your usage model before jumping in.

 

Speaking of wear-out rates, for the next posting I’ll investigate client-side MLC NAND-flash wear out rates.

 

                                                                --jrh

 


 

Wednesday, October 15, 2008 6:27:15 AM (Pacific Standard Time, UTC-08:00)
Hardware
 Saturday, October 11, 2008

Albert Greenberg and I missed Hotnets 2008 last week due to a conflicting meeting down in California but Ken Church was there to present our On Delivering Embarrassingly Distributed Cloud Services paper.  I summarized the paper in a recent blog entry:  Embarrassingly Distributed Cloud Services and the abstract from the paper follows:

 

Very large data centers are very expensive (servers, power/cooling, networking, physical plant.) Newer, geo-diverse, distributed or containerized designs offer a more economical alternative. We argue that a significant portion of cloud services are embarrassingly distributed – meaning there are high performance realizations that do not require massive internal communication among large server pools. We argue further that these embarrassingly distributed applications are a good match for realization in small distributed data center designs. We consider email delivery as an illustrative example. Geo-diversity in the design not only improves costs, scale and reliability, but also realizes advantages stemming from edge processing; in applications such as spam filtering, unwanted traffic can be blocked near the source to reduce transport costs.

 

The Hotnets agenda and all the papers presented are up at: Seventh ACM Workshop on Hot Topics in Networks.

 

The slides Ken presented are posted at:

·         pptx form:  http://conferences.sigcomm.org/hotnets/2008/slides/EmbarrassinglyDistributed6.pptx

·         pdf form:  EmbarrassinglyDistributedFinalSlides.pdf (724.17 KB)

 


Saturday, October 11, 2008 1:40:21 PM (Pacific Standard Time, UTC-08:00)
Services
 Saturday, October 04, 2008

Google has long enjoyed a reputation for running efficient data centers. I suspect this reputation is largely deserved but, since their operations have been completely shrouded in secrecy, that has mostly been a guess built upon respect for the folks working on the infrastructure team rather than on anything that’s been published. However, some of the shroud of secrecy was lifted last week, and a few interesting tidbits were released in Google Commitment to Sustainable Computing.

 

On server design (Efficient Servers), the paper documents the use of high-efficiency power supplies and voltage regulators, and the removal of components not relevant in a service-targeted server design.  A key point is the use of efficient, variable-speed fans.  I’ve seen servers that spend as much as 60W driving the fans alone. Using high efficiency fans running at the minimum speed necessary based upon current heat load can bring big savings.  An even better approach is employed by Rackable Systems in their ICE Cube Modular Data Center design (First Containerized Data Center Announcement) where they eliminate server fans entirely.

 

The paper also argues for energy proportionality, a concept introduced by Luiz Barroso and Urs Hölzle of Google. Energy proportionality is a call to the industry to produce servers where the amount of energy consumed is proportional to the server load. Sadly, many current server designs consume more than 60% of their full-load power when idle. None of us will talk publicly about the average utilizations of our server farms, but the quick summary is that achieving very high utilizations is incredibly difficult. Or, worded differently, most servers are, on average, closer to idle than to full load. Even small steps towards energy proportionality make a huge difference and, of course, getting utilization up remains the holy grail of the industry.
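To see why idle power matters so much, here is a minimal sketch using a simple linear power model, which is an assumption made for illustration and not taken from the paper: a server draws a fixed idle fraction of its full-load power plus a share that grows linearly with utilization. The 60% idle figure comes from the text above; the 30% utilization and the 10% idle comparison point are illustrative values only.

```python
def power_fraction(utilization, idle_fraction=0.6):
    """Fraction of full-load power drawn, assuming power rises linearly from idle to peak."""
    return idle_fraction + (1.0 - idle_fraction) * utilization


def relative_efficiency(utilization, idle_fraction=0.6):
    """Useful work per watt relative to running at full load."""
    return utilization / power_fraction(utilization, idle_fraction)


for idle in (0.6, 0.1):
    print(f"idle draw {idle:.0%}: efficiency at 30% load = "
          f"{relative_efficiency(0.3, idle):.0%} of full-load efficiency")
```

Under this model, a server idling at 60% of peak delivers well under half its full-load work per watt at 30% utilization, while a more energy-proportional design idling at 10% keeps most of it.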

 

It’s good to see water conservation brought up alongside energy efficiency. It’s the next big problem for our industry, and the consumption rates are prodigious. To achieve efficiency, most centers have cooling towers, which allow them to avoid the use of energy-intensive direct-expansion chillers except under unusually hot and humid conditions. This is great news from an energy efficiency perspective, but cooling towers consume water in two significant ways. The first is evaporative loss, which is hard to avoid in wet tower designs (other, less water-intensive designs exist). The second is caused by the first. As water evaporates from the system, the dissolved solids and other contaminants in the supply water are left behind, and their concentration continues to rise. This high-concentration water is dumped from the system to protect it, and the dumped water is referred to as blow-down water. Between make-up water (which replaces what is lost to evaporation and blow-down) and blow-down water, a medium-sized 10MW facility, built to current industry conventions, can go through ¼ to ½ million gallons of water a day.

 

The paper describes a plan to address this problem in the future by moving to recycled water sources. This is good to see, but I argue the industry needs to reduce overall water consumption, whether the source is fresh or recycled. Higher data center temperatures and aggressive use of air-side economization are both good steps in that direction, and industry-wide we’re all working hard on new techniques and approaches to reduce water consumption.

 

The section on PUE is the most interesting, in that they are documenting an at-scale facility running at a PUE of 1.13 during a quarter. Generally, you want full-year numbers, since these numbers are very load and weather dependent. The best annual number quoted in the paper is 1.15, which is excellent. That means that for every watt delivered to servers, 0.15W is lost in power distribution and cooling.

 

This number, with pure air-side cooling and good overall center design, is quite attainable. But, elsewhere in the document, they describe the use of cooling towers. Attaining a PUE of 1.15 with a conventional water-based cooling system is considerably more difficult. On the power distribution side, conventional designs waste about 8% to 9% of the power delivered. A rough breakdown of where it goes: 3 transformers take 115kV down to 13.2kV, then down to 480V, and then down to 208V for delivery to the load. Good transformer designs run at around 99.7% efficiency. The uninterruptible power supply (UPS) can be as poor as 94% efficient, and roughly 1% is lost in switching and conductors. That approach gets us to about 8% lost in distribution. We can easily eliminate one layer of transformers and use a high-efficiency bypass UPS; let’s use 97% efficiency for the UPS. Those two changes get us to 4% to 5% lost in distribution. Let’s assume we can reliably hit 5% power distribution losses. That leaves us with 10% for all the losses in the mechanical systems. Powering the Computer Room Air Handlers, the water pumps, etc. at only 10% overhead would be both difficult and all the more impressive.
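The distribution arithmetic above multiplies the efficiencies of stages in series. This minimal sketch reproduces it with the efficiencies quoted in the text. PUE is total facility power divided by the power delivered to IT equipment, so a PUE of 1.15 allows 0.15W of overhead per IT watt; like the text, the sketch loosely subtracts the distribution loss fraction from that overhead rather than modeling exactly where each watt is lost, so the results are rough. The function name is hypothetical.

```python
from math import prod


def distribution_loss(stage_efficiencies):
    """Fraction of power lost across a chain of conversion stages in series."""
    return 1.0 - prod(stage_efficiencies)


# Three transformers (115kV->13.2kV->480V->208V), a 94% UPS, 1% switching/conductor loss.
conventional = [0.997, 0.997, 0.997, 0.94, 0.99]
# One transformer layer removed and a 97%-efficient bypass UPS.
improved = [0.997, 0.997, 0.97, 0.99]

pue = 1.15
for name, chain in (("conventional", conventional), ("improved", improved)):
    loss = distribution_loss(chain)
    # Rough budget left for cooling, following the text's simplification.
    mechanical_budget = (pue - 1.0) - loss
    print(f"{name}: {loss:.1%} distribution loss, "
          f"{mechanical_budget:.1%} left for mechanical systems at PUE {pue}")
```

The conventional chain loses a little under 8%, and the improved chain about 4.5%, consistent with the 8% and 4% to 5% figures above.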

 

The 1.15 PUE with pure air-side economization in the right climate looks quite reasonable, but powering a conventional, high-scale, air-and-water, multi-conversion cooling system at this efficiency looks considerably harder to me. Unfortunately, the paper publishes no data on the approach, or on whether this number was attained simply by relying on favorable weather conditions and air-side economization with the water loops idle.

 

The paper closes with An Efficient and Clean Energy Future, a discussion of the