Argument for Degraded Operations Mode

Yet another argument in favor of Degraded Operations Mode (http://mvdirona.com/jrh/perspectives/2008/01/22/DegradedOperationsMode.aspx) emerged last week. All of Amazon AWS (S3, SimpleDB, Simple Queuing Service, EC2, etc.) down for several hours last week: http://mvdirona.com/jrh/perspectives/2008/02/15/DowntimeAmazonS3SimpleDBSQS.aspx. The outage was reportedly due to a authentication storm: http://www.highscalability.com/s3-failed-because-authentication-overload (Mike Neil sent this my way).

Remember, you’ll never have the capacity for the biggest load inrush and, no matter how hard you try, your capacity planning will continue to only slightly better than the weather report for next week. When you don’t know what’s coming, design systems to operate through adversity: Degraded Operations Mode.

–jrh

James Hamilton, Windows Live Platform Services
Bldg RedW-D/2072, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
JamesRH@microsoft.com

H:mvdirona.com | W:research.microsoft.com/~jamesrh

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.