When You Can’t Afford Not to Have Power Redundancy

Atlanta Hartsfield-Jackson International Airport suffered a massive power failure yesterday in which the entire facility, except for emergency lighting and safety equipment, was down for nearly 11 hours. The popular press coverage of this power failure is extensive but here are two examples:

For most years since 1998, Atlanta International Airport has been the world’s busiest airport. In 2012, 100 million passengers flew and 950,119 flights originated or terminated at the airport. That’s roughly 260,000 passengers per day. When Atlanta is down, more than a quarter million passengers are stranded or less effective. Setting that aside, the entire $1.28 billion facility doesn’t produce revenue. Every major passenger jet trapped by the outage and unable to fly is worth more than $100M, with an average cost per plane of $103.4 million for a 737, $387.2 million for a 747, $202.6 million for a 767, $344.2 million for a 777, and $270.9 million for a 787. The cost of a day’s financing on the aggregate number of planes unable to fly, plus the $1.28 billion facility that is making its customers less productive rather than more, is a big number and would fund a fairly substantial investment in power redundancy. Let’s look at what happened and what it would cost to avoid.

What happened?

It’s still very early so we don’t yet know the root cause of the power failure with precision, but it’s a sufficiently large failure that it’s easy to know the area of impact and the nature of the fault. Georgia Power reports “switchgear … in an underground electrical facility, could have failed and started a fire.” The cause of the switchgear failure is not yet known with certainty and it’s even possible that the fault was not directly caused by the switchgear. However, it’s inarguable that the facility housing the switchgear that feeds the airport experienced a fire which led to a nearly 11-hour outage. It’s highly unlikely that there is anything other than switchgear in that room that could have failed, but it’s also impossible to rule out an external fault source, whether natural or malicious. What’s important is that the facility didn’t have redundant power available through alternative delivery paths. This was a big event impacting more than a quarter million travelers, so let’s have a look at the cost of power switchgear redundancy for Atlanta International Airport.

What could have been done?

What would it cost to protect the facility against a single failure leading to an outage of this magnitude? Hartsfield-Jackson is a big airport. In fact it’s currently the busiest in the world, but it’s still not that big a power consumer when compared to large data centers and industrial facilities that have decades of experience in providing cost-effective backup power. In the Hartsfield-Jackson Atlanta International Airport Sustainable Management Plan, they report the Central Passenger Terminal Complex (CPTC) consumed 240,220 megawatt hours (MWh) in 2010. Non-CPTC facilities (the remainder of the airport) consumed 47,671 MWh, for a total of 287,891 MWh. Power consumption will be climbing for a variety of reasons in the facility even while the operators work to reduce consumption through increased efficiency but, for the purpose of sizing power redundancy, we can assume some of the big power consumers like the Sky Train aren’t operated during a power failure. By not operating less important electrical assets, we can keep the required power levels within this historical envelope.

If the power consumption for the entire facility was steady all year, this would represent an average draw of 32.9 MW. Of course, power consumption will vary over time as conditions change, as lights go on and off and as temperatures go up and down. The lights likely stay pretty constant but the mechanical loads for cooling will vary greatly. There will be peaks above this average consumption level. If we assume a peak to average level of 1.5x, the peak draw we would need to provide during an outage would be 49.3 MW. This is a lot of power but many of the world’s datacenters are far larger and most of the serious datacenter operators will build at least 25 to 30 MW facilities. If we worked hard to shed non-critical loads not needed to fly airplanes, we could clearly operate with far lower peak power levels. The airport can operate without the electric trains and at a much higher air conditioning set point. It would be less comfortable but the planes would still fly. Far less than 50 MW peak power looks perfectly reasonable during utility power outages but let’s see what it would take to provide this fairly complete level of backup.
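The arithmetic behind these two figures can be sketched in a few lines of Python. This is only a back-of-the-envelope check using the consumption numbers quoted above. The function names are illustrative, and the 8,760 hour year (no leap day) and the 1.5x peak to average ratio are assumptions.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes a non-leap year


def average_draw_mw(annual_mwh: float) -> float:
    """Average power draw if annual consumption were spread evenly over the year."""
    return annual_mwh / HOURS_PER_YEAR


def peak_draw_mw(annual_mwh: float, peak_to_average: float = 1.5) -> float:
    """Estimated peak draw for an assumed peak to average ratio."""
    return average_draw_mw(annual_mwh) * peak_to_average


cptc_mwh = 240_220      # Central Passenger Terminal Complex, 2010
non_cptc_mwh = 47_671   # remainder of the airport, 2010
total_mwh = cptc_mwh + non_cptc_mwh

avg = average_draw_mw(total_mwh)
peak = peak_draw_mw(total_mwh)
print(f"total {total_mwh:,} MWh, average {avg:.1f} MW, peak {peak:.1f} MW")
```

Changing `peak_to_average` is a quick way to see how sensitive the required backup capacity is to that one assumption.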

Focusing on data center power redundancy, there are four major components in the most common designs. First, the entire facility has a utility feed with sufficient capacity to be able to deliver the largest design peak power. Second, there is also enough emergency power generation to be able to hold this same peak load during utility outages. Third, there is switchgear between the utility feed and the generators to both control the generators and to ensure that the load is only connected to one of those two sources at a time. These three components are sufficient to allow the datacenter to operate on either utility power or generator power and to switch between them. But these switching events can’t be cost effectively made fast enough to keep servers running. Servers won’t reliably run through power outages of longer than 16 msec (0.016 seconds) and many modern power supplies won’t reliably hold the load beyond a 12 msec outage. Since these switching events can’t be reliably and cost effectively done without a server outage, the fourth common component of datacenter power redundancy needs to be added: the Uninterruptible Power Supply. A UPS is essentially a large set of batteries or some other power source such as a spinning flywheel that will keep the servers running when the power source is outside of normal operating bounds.

During power failures the UPS holds the critical load while the switchgear waits for a very short period to see if the utility power will return quickly. Most utility outages are sub-second or only single digit seconds, so it makes no sense to start the generators immediately since they would likely no longer be needed by the time they finished their startup sequence. If the utility has been out for some small number of seconds, the UPS continues to hold the server load while the generators are started. Once the generators are running, have reached stable output voltage, and are able to reliably hold the load, the switchgear transfers the load to the generators and the UPS is no longer needed. The UPS is only really needed to handle the short transition between utility failure and the backup generators being able to hold the load. Since UPS capacity is fairly expensive and only really needed for the servers and networking equipment that can’t operate through short duration outages, it’s usually only purchased to protect these critical components. Mechanical (cooling) systems and all but emergency lighting can go down for short periods so don’t need the extra expense of UPS protection. The mechanical systems themselves take some time to recover from a power outage but there is enough engineering headroom in most datacenter cooling designs that this power loss recovery time isn’t a problem either. Consequently, non-critical datacenter loads are usually not UPS protected.
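The transfer sequence described above can be sketched as a small timing model. This is a simplified illustration, not how any particular switchgear controller works: the `TransferPolicy` name, the 5 second wait, and the 10 second generator startup time are all hypothetical values chosen for the example, and the transfer back to utility is treated as instantaneous.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferPolicy:
    # Both defaults are illustrative assumptions, not figures from the text.
    start_delay_s: float = 5.0   # how long switchgear waits for utility to return
    gen_startup_s: float = 10.0  # time for generators to reach stable output

    @property
    def transfer_time_s(self) -> float:
        """Seconds after utility failure at which generators can take the load."""
        return self.start_delay_s + self.gen_startup_s


def generators_started(outage_s: float, policy: TransferPolicy) -> bool:
    """Generators are only started if the outage outlasts the initial wait."""
    return outage_s > policy.start_delay_s


def load_source(t_s: float, outage_s: float, policy: TransferPolicy,
                ups_protected: bool) -> str:
    """What powers a load t_s seconds after the utility fails."""
    if t_s >= outage_s:
        return "utility"      # utility is back (re-transfer delay ignored)
    if t_s >= policy.transfer_time_s:
        return "generator"    # generators are stable and holding the load
    # Before the transfer, only UPS-protected loads stay up.
    return "ups" if ups_protected else "unpowered"


policy = TransferPolicy()
for t in (1, 8, 20):
    print(t, load_source(t, 3600, policy, True), load_source(t, 3600, policy, False))
```

In this model a server rack (UPS protected) never loses power, while cooling and non-emergency lighting are down only for the first `transfer_time_s` seconds of a long outage.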

For an airport terminal facility, outside of a few mission critical components like air traffic control that are already well protected, a few-second power outage isn’t great but it’s not really that disruptive if the power is back in under a minute. The Atlanta event was news because it was a nearly 11-hour outage. Consequently, an airport power redundancy system doesn’t need UPS protection. All that is really needed to avoid long outages is sufficient generator capacity to hold the load and the switchgear to transfer the load onto the generators. Assuming a peak to average draw of less than 1.5x, we know adding redundancy will require no more than 50 MW of capacity.

Many data centers use larger generators but 2.5 MW is a common choice and they are getting as close to commodity pricing as you will find in the critical power market. A 2.5 MW generator and the associated switchgear will come in just below $1M for buyers without big volume purchasing power. That’s under $400k per MW of backup power, or $20M for the entire generator and switchgear package to provide backup power for the full airport. Clearly this number could be reduced by not powering restaurants, shops, and other facilities not directly related to moving passengers and cargo efficiently but, to keep things simple, let’s accept this number as it is. Although we don’t need instant failover and so don’t need UPS protection, we do need the backup power to be reliable, so it would be wise to be N+1 on the generators. Since transfer speed isn’t that important, we can add one generator to back up all 20 generators and just have simple manual transfer switching to allow that single generator to be brought online to replace any single generator that didn’t start or failed during a utility power outage. This makes our backup power system N+1 and allows us to continue to operate during a utility failure even when one generator fails or happens to be in service at the time of the outage. The full package is $21M installed.
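The sizing above is simple enough to check with a short sketch. The function name `generator_plan` is illustrative, and the $1M unit price is the rounded-up figure from the text (the actual price is described only as "just below $1M"), so the totals are upper bounds under that assumption.

```python
import math


def generator_plan(peak_mw: float, unit_mw: float = 2.5,
                   unit_cost_usd: float = 1_000_000, spares: int = 1):
    """Return (generators needed for load, total with spares, total cost in USD)."""
    needed = math.ceil(peak_mw / unit_mw)   # N: enough units to hold the peak
    total = needed + spares                 # N+1 with a single spare
    return needed, total, total * unit_cost_usd


needed, total, cost = generator_plan(peak_mw=50)
cost_per_mw = 1_000_000 / 2.5  # assumed $1M per 2.5 MW unit

print(f"N = {needed}, N+1 = {total}, cost = ${cost / 1e6:.0f}M, "
      f"${cost_per_mw / 1e3:.0f}k per MW")
```

Passing a lower `peak_mw`, for example after shedding shops and restaurants, shows how quickly the generator count and cost fall.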

That’s only 1.6% of the cost of the full airport. This is an interestingly small number, yet still big enough that it would be hard to accept to protect against something that might only happen every couple of decades. The airport cost, however, although massive at $1.28B, is far smaller than the other assets that couldn’t be fully monetized as a consequence of this outage. Knowing this airport has 2,600 flights per day and that Delta alone canceled 300 flights yesterday, it’s reasonable to assume that at least 800 planes didn’t make money as a consequence of this outage. There are far more planes negatively impacted by the outage but many of these planes will fly some portion of their full day, so we’ll assume that the aggregate loss is equivalent to 800 idle planes for a day.

From our earlier discussion on the cost of commercial passenger jets, where the common Boeing examples we looked at ranged between $103M for a 737 and $387M for a 747, we’ll assume the average price of an idled plane is $120M since the smaller planes will dominate. If we take 800 planes at $120M and a cost of capital of 6% annually, we get $16.4M per day. That’s getting close to enough for a single one-day outage to fund the upgrade to full power redundancy, but it’s still not quite there unless major outages are more frequent than once per lifetime of the redundant power equipment. If the major outage rate is as high as twice during the life of the power redundancy gear, the project will pencil out very nicely. The ripple effects of the outage go far beyond the loss of the use of the aircraft and the facility. There are reputational losses to companies like Delta that make Atlanta their primary hub and, if the facility isn’t reliable, business will go elsewhere. Arguably one of the reasons that Atlanta has eclipsed Chicago in airport traffic is that Atlanta doesn’t have as many of the serious weather delays that can slow traffic through Chicago. Customers like reliability, but these less direct impacts are harder to measure and they usually are not driven by a single negative event.
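The daily financing cost can be sketched as below, assuming simple (non-compounding) interest spread evenly over a 365 day year. Under exactly those assumptions the result comes out near $15.8M per day, a little below the $16.4M quoted above, so the figure in the text presumably reflects slightly different inputs. The function name is illustrative.

```python
def daily_capital_cost_usd(planes: int, price_per_plane_usd: float,
                           annual_rate: float = 0.06,
                           days_per_year: int = 365) -> float:
    """Simple-interest cost of capital for one day on an idle fleet."""
    fleet_value = planes * price_per_plane_usd
    return fleet_value * annual_rate / days_per_year


cost = daily_capital_cost_usd(planes=800, price_per_plane_usd=120_000_000)
print(f"${cost / 1e6:.1f}M per day")
```

Either way, the conclusion is unchanged: one idle day for 800 aircraft costs on the same order as the full redundancy package.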

The argument I’m making is that Atlanta International Airport could have power redundancy for only $21M. Using reasonable assumptions, the losses from failure to monetize 800 aircraft for 1 day would pay 78% of the redundancy cost. If there was a second outage during the life of the power redundancy equipment, the costs are much more than covered. If we are willing to consider non-direct outage losses, a single failure would be enough to easily justify the cost of the power redundancy. Perhaps the biggest stumbling block to adding the needed redundancy is that the flight operators lose most of the money in an outage but it would be the airport operator that bears the expense of adding the power redundancy.
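A minimal sketch of the break-even reasoning, taking the article’s own figures ($21M package, $1.28B airport, $16.4M lost per outage day) as inputs. The variable names are illustrative and indirect losses are deliberately left out.

```python
import math

redundancy_cost_usd = 21_000_000
airport_cost_usd = 1_280_000_000
loss_per_outage_day_usd = 16_400_000  # direct aircraft financing loss only

# Redundancy package as a share of the airport's cost.
share_of_airport = redundancy_cost_usd / airport_cost_usd

# Fraction of the package paid for by avoiding one day-long outage.
covered_by_one_outage = loss_per_outage_day_usd / redundancy_cost_usd

# Number of avoided day-long outages needed to fully cover the package.
outages_to_break_even = math.ceil(redundancy_cost_usd / loss_per_outage_day_usd)

print(f"{share_of_airport:.1%} of airport cost, "
      f"{covered_by_one_outage:.0%} covered by one outage, "
      f"break-even after {outages_to_break_even} outages")
```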

Let’s ignore the argument above, that a single event might fully pay for the cost of power redundancy protection, and consider the broader impact that is harder to fully account for. It seems sensible for the operators of the biggest airport in the world and the airlines that fly through that facility to collectively pay $21M for 10 years of protection and have power redundancy. Considering this from a regulatory perspective and looking at the value of keeping the largest of the nation’s airports operating, a good argument can be made that it shouldn’t be possible for a single power event to take out such a facility, and that reasonable redundancy through all the infrastructure should be a requirement for any airport of medium or larger size.


6 comments on “When You Can’t Afford Not to Have Power Redundancy”

Chasm says:

June 3, 2018 at 4:22 pm

Hamburg Airport currently has a similar power outage. The airport has been closed for the day.
Details are sparse; the reported cause is an electrical short. At first Terminal 2 was affected and flights were transferred to Terminal 1. Then the outage spread and the whole airport closed.

One interesting detail has been reported as official reason for some of the bad information distribution within the airport: The outage affected the Public Address system.
The lack of power for the fire detection system was later given as one of the reasons to stay closed.

Why would you want actually working redundant feeds and UPS on safety critical systems? Preposterous idea!
Both PA and fire detection do have stringent rules. They are supposed to work even if the rest does not…

James Hamilton says:

December 28, 2017 at 6:53 am

Earlier today a large power failure hit Disneyland in California mid-morning. The early report from Disney is that a “transformer issue” was at fault. Popular press reports:
• http://edition.cnn.com/2017/12/27/us/california-disneyland-outage/index.html
• http://abc7.com/travel/power-restored-at-disneyland-after-outage-disrupts-attractions/2830770/
• https://www.ctvnews.ca/world/power-outage-at-disneyland-forces-guests-to-be-escorted-from-stalled-rides-1.3737277

Power was restored to Toontown and much of Fantasy Land within a couple of hours and all facility power was restored by 4pm. The ever popular “It’s a Small World” was down for an extended period.

OK, perhaps there are some times when you really don’t need power redundancy :-)

Ryan Kiskis says:

December 21, 2017 at 4:15 am

I had read that the airport actually did have redundant power supplies, but the fire damaged the delivery tunnel through which both were routed. So arguably the cost could be much less if the fix were primarily a matter of not routing both systems through the same pathway. Though interestingly there was another article about DFW investing to prevent similar outages; their quoted number was $40M which, assuming they have included construction, install and other incidental costs, is in the same ballpark as your estimates.

- James Hamilton says:
  
  December 21, 2017 at 4:41 am
  
  At that point it’s just additional switchgear and installation costs. Either way, it’s not expensive to have redundant power and switchgear.
  
Paul Robichaux says:

December 19, 2017 at 5:51 pm

I think the case for adding power protection is both worse and better than what you predict.

Because aircraft financing is Byzantine, I’m not sure that figuring the capital cost of the aircraft is the best measure. DL, like most other airlines, is pretty highly leveraged in its fleet, but it also has a significant number of fully-owned airplanes that have long been paid for (all of their MD-8x airframes, for example).

Then there are the second-order losses: the airport concessionaires lost some amount of money; Delta had to pay extra for diverted flights (e.g. DL 1251 from LAX landed at Huntsville, and that was an additional cost).

Ultimately, though, I think you could make a really powerful argument to DL and the Atlanta airport authority: “would you pay $21 million to avoid being the lead story on CNN/NBC/CBS/ABC/Fox for the next 10 years?” That seems likely to be a slam dunk.

- James Hamilton says:
  
  December 19, 2017 at 6:23 pm
  
  I agree the reputational damage makes $21M for 10 years of insurance look pretty cheap.
  