Facebook Releases Cassandra as Open Source

Last week the Facebook Data team released Cassandra as open source. Cassandra is a structured store with write-ahead logging and indexing. Jeff Hammerbacher, who leads the Facebook Data team, described Cassandra as a BigTable data model running on a Dynamo-like infrastructure.

Google Code for Cassandra (Apache 2.0 License): http://code.google.com/p/the-cassandra-project/.

Avinash Lakshman, Prashant Malik, and Karthik Ranganathan presented “Cassandra: Structured Storage System over a P2P Network” at SIGMOD 2008. From the presentation:

Cassandra design goals:

· High availability

· Eventual consistency

· Incremental scalability

· Optimistic replication

· Knobs to “tune” tradeoffs between consistency, durability, and latency

· Low cost of ownership

· Minimal administration
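
The summary above does not say what the tuning knobs are. As a rough illustration only, assuming Dynamo-style parameters (N replicas per key, R replicas consulted per read, W replicas acknowledging each write, all hypothetical names here and not taken from the presentation), the consistency side of the tradeoff can be sketched as a check that every read set overlaps every write set:

```python
from itertools import combinations


def quorums_overlap(n, r, w):
    """Return True if every read set of size r shares at least one
    replica with every write set of size w, out of n replicas.

    n, r, w are assumed Dynamo-style knobs, not Cassandra's documented ones.
    """
    replicas = range(n)
    return all(
        set(reads) & set(writes)
        for reads in combinations(replicas, r)
        for writes in combinations(replicas, w)
    )
```

Under these assumptions the brute-force check agrees with the usual rule of thumb r + w > n: smaller R and W lower latency but allow a read to miss the latest write.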

Write operation: a client can write to an arbitrary node in the Cassandra cluster. The request is sent to the node owning the data, which writes to its log first and then applies the change to its in-memory copy. Properties of a write: no locks in the critical path, sequential disk accesses, behaves like a write-through cache, atomicity guarantee for a key, and always writable.
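
The write path described above can be sketched roughly as follows. This is a minimal single-process illustration, not Cassandra's code: the `Node` and `Cluster` classes, the tab-separated log format, and the hash-modulo ownership rule are all assumptions made for the example (the presentation summary does not say how ownership is computed).

```python
import hashlib
import os


class Node:
    """Hypothetical node: an append-only log plus an in-memory table."""

    def __init__(self, log_path):
        self.log = open(log_path, "a", encoding="utf-8")
        self.memtable = {}

    def write(self, key, value):
        # Step 1: append to the log first. Appends are sequential disk writes.
        self.log.write(f"{key}\t{value}\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        # Step 2: only then apply the change to the in-memory copy.
        self.memtable[key] = value


class Cluster:
    """Hypothetical cluster that routes each key to a single owning node."""

    def __init__(self, nodes):
        self.nodes = nodes

    def owner(self, key):
        # Assumed ownership rule: stable hash of the key modulo node count.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest, "big") % len(self.nodes)]

    def write(self, key, value):
        # Any node may receive the request; it is forwarded to the owner.
        self.owner(key).write(key, value)
```

For example, `Cluster([Node("n0.log"), Node("n1.log")]).write("user:1", "alice")` appends one line to the owning node's log and then updates that node's in-memory table. Replication, flushing the in-memory table to disk, and failure handling are left out.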

Cluster membership is maintained via a gossip protocol.
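
To give a feel for gossip-based membership, here is a small simulation. It is a generic heartbeat-gossip sketch, not the protocol from the presentation: the `GossipNode` class, the heartbeat counters, and the push-pull exchange are assumptions chosen for illustration.

```python
import random


class GossipNode:
    """Hypothetical node that tracks the highest heartbeat seen per member."""

    def __init__(self, name):
        self.name = name
        self.view = {name: 0}  # member name -> highest heartbeat seen

    def tick(self):
        # Bump our own heartbeat so peers can tell we are still alive.
        self.view[self.name] += 1

    def merge(self, other_view):
        # Keep the larger heartbeat for each member we hear about.
        for name, beat in other_view.items():
            if beat > self.view.get(name, -1):
                self.view[name] = beat

    def gossip_with(self, peer):
        # Push-pull exchange: both sides end up with the merged view.
        peer.merge(self.view)
        self.merge(peer.view)


def run_rounds(nodes, rounds, seed=0):
    """Each round, every node ticks and gossips with one random other node."""
    rng = random.Random(seed)
    for _ in range(rounds):
        for node in nodes:
            node.tick()
            peer = rng.choice([n for n in nodes if n is not node])
            node.gossip_with(peer)
```

Each node only ever talks to one peer per round, yet knowledge of every member spreads through the cluster over successive rounds. A real implementation would also need failure detection (for example, treating a member whose heartbeat stops advancing as suspect), which this sketch omits.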

Lessons learned:

· Add fancy features only when required

· Many types of failures are possible

· Big systems need proper systems-level monitoring

· Value simple designs

Future work:

· Atomicity guarantees across multiple keys

· Distributed transactions (I’ll try to talk them out of this one)

· Compression support

· Fine-grained security via ACLs

It looks like a well-engineered system.

–jrh

James Hamilton, Data Center Futures
Bldg 99/2428, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
JamesRH@microsoft.com

H:mvdirona.com | W:research.microsoft.com/~jamesrh | blog:http://perspectives.mvdirona.com
