Wednesday, April 07, 2010

Mike Stonebraker published an excellent blog posting yesterday at the CACM site: Errors in Database Systems, Eventual Consistency, and the CAP Theorem. In this article, Mike challenges the application of Eric Brewer’s CAP Theorem by the NoSQL database community. Many of the high-scale NoSQL system implementers have argued that the CAP theorem forces them to go with an eventual consistent model.

 

Mike challenges this assertion pointing that some common database errors are not avoided by eventual consistency and CAP really doesn’t apply in these cases. If you have an application error, administrative error, or database implementation bug that losses data, then it is simply gone unless you have an offline copy. This, by the way, is why I’m a big fan of deferred delete.  This is a technique where deleted items are marked as deleted but not garbage collected until some days or preferably weeks later.  Deferred delete is not full protection but it has saves my butt more than once and I’m a believer. See On Designing and Deploying Internet-Scale Services for more detail.

 

CAP and the application of eventual consistency doesn’t directly protect us against application or database implementation errors. And, in the case of a large scale disaster where the cluster is lost entirely, again, neither eventual consistency nor CAP offer a solution. Mike also notes that network partitions are fairly rare.  I could quibble a bit on this one. Network partitions should be rare but net gear continues to cause more issues than it should. Networking configuration errors, black holes, dropped packets, and brownouts, remain a popular discussion point in post mortems industry-wide. I see this improving over the next 5 years but we have a long way to go. In Networking: the Last Bastion of Mainframe Computing, I argue that net gear is still operating on the mainframe business model: large, vertically integrated and expensive equipment, deployed in pairs. When it comes to redundancy at scale, 2 is a poor choice.

 

Mike’s article questions whether eventual consistency is really the right answer for these workloads. I made some similar points in “I love eventual consistency but…” In that posting, I argued that many applications are much easier to implement with full consistency and full consistency can be practically implemented at high scale. In fact, Amazon SimpleDB recently announced support for full consistency. Apps needed full consistency are now easier to write and, where only eventual consistency is needed, its available as well.

 

Don’t throw full consistency out too early. For many applications, it is both affordable and helps reduce application implementation errors.

 

                                                                --jrh

 

Thanks to Deepak Singh for pointing me to this article.

 

James Hamilton

e: jrh@mvdirona.com

w: http://www.mvdirona.com

b: http://blog.mvdirona.com / http://perspectives.mvdirona.com

 

Wednesday, April 07, 2010 11:58:15 AM (Pacific Standard Time, UTC-08:00)  #    Comments [5] - Trackback
Services
Tracked by:
"Création site internet bretagne" (Création site internet bretagne) [Trackback]
"Webmaster Book" (Webmaster Book) [Trackback]
Wednesday, April 07, 2010 1:49:35 PM (Pacific Standard Time, UTC-08:00)
Both modes are useful. Here's a question for you though James: what should be the *default* mode?

I would claim that eventual consistent semantics require more nuanced thinking on the part of the developer. We can be a bit sloppy in our thinking with full consistency because the behavior is more obvious. To me that suggests full consistent is default and EC is an option is ideal. But interested to here if you think the converse. :-)
Wednesday, April 07, 2010 3:30:37 PM (Pacific Standard Time, UTC-08:00)
How timely! I was just talking with someone at Rackspace about this.

Mike is right in that most *failures* in applications (i.e., bugs) come from the application itself, not the store behind it. Hoever, I think *degradation* comes from the back-end.

Also, I feel eventual consistency is more scary than imperfect availability. They're both failures to the end-user of course (Joe Williams from Cloudant talks about that).

In the event of an availability failure, it's easy to handle. You cache writes and block reads (that's a bit of an oversimplification). Any application designer knows how to do that at even a basic level: database request timeouts. It's a very well-handled failure scenario.

Consistency failures are more tricky, especially to those of us from an RDBMS mindset. "What do you mean, the data I read isn't my *real* data?" How in the world do you solve that? Granted, there's vector clocks and paxos and all that... but most application developers don't understand them. And should they really have to?
Thursday, April 08, 2010 4:15:51 AM (Pacific Standard Time, UTC-08:00)
I agree with you Dwight: 1) both eventual consistency and full consistency are useful, and 2) full consistency is probably the best choice for a default since full consistency semantics are least surprising of the two.

--jrh
Thursday, April 08, 2010 4:24:11 AM (Pacific Standard Time, UTC-08:00)
Good point Bradford. Data availability issues can be localized to a small number of users and there exist good techniques to mask these events. With care, failures can be detected quickly and a partition can be brought back online after failure rapidly whereas eventual consistency avoids these issues but at the expense of often surprising semantics. There is a place for both but eventual consistency brings semantics that many won't fully understand.

--jrh
Saturday, April 10, 2010 8:11:55 AM (Pacific Standard Time, UTC-08:00)
In the end I think it goes to different strokes for different folks.
Comments are closed.

Disclaimer: The opinions expressed here are my own and do not necessarily represent those of current or past employers.

Archive
<April 2010>
SunMonTueWedThuFriSat
28293031123
45678910
11121314151617
18192021222324
2526272829301
2345678

Categories
This Blog
Member Login
All Content © 2014, James Hamilton
Theme created by Christoph De Baene / Modified 2007.10.28 by James Hamilton