One of the talks that I particularly enjoyed yesterday at HPTS 2011 was Storage Infrastructure Behind Facebook Messages by Kannan Muthukkaruppan. In this talk, Kannan talked about the Facebook store for chats, email, SMS, & messages.
This high-scale storage system is based upon HBase and Haystack. HBase is a non-relational, distributed database very similar to Google's Bigtable. Haystack is a simple file system designed by Facebook for efficient photo storage and delivery. More on Haystack at: Facebook Needle in a Haystack.
In this Facebook Message store, Haystack is used to store attachments and large messages. HBase is used for message metadata, search indexes, and small messages (avoiding the second I/O to Haystack for small messages like most SMS).
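To make that split concrete, here is a minimal sketch of the size-based routing described above. The MessageIndex and HaystackClient interfaces and the 64KB threshold are illustrative assumptions, not Facebook's actual code or numbers:

```java
// A minimal sketch of size-based routing between HBase and Haystack.
// The interfaces and the threshold are hypothetical stand-ins.
public class MessageRouter {

    interface HaystackClient {
        String write(byte[] blob);                               // returns a needle id
    }

    interface MessageIndex {
        void putMessage(String id, byte[] meta, byte[] body);    // metadata + small body in HBase
        void putPointer(String id, byte[] meta, String needle);  // metadata + Haystack pointer
    }

    private static final int SMALL_MESSAGE_LIMIT = 64 * 1024;    // illustrative threshold

    private final MessageIndex hbase;
    private final HaystackClient haystack;

    public MessageRouter(MessageIndex hbase, HaystackClient haystack) {
        this.hbase = hbase;
        this.haystack = haystack;
    }

    public void store(String messageId, byte[] metadata, byte[] body) {
        if (body.length <= SMALL_MESSAGE_LIMIT) {
            // Small messages (most SMS) keep the body in HBase next to the metadata,
            // so a read needs only one I/O.
            hbase.putMessage(messageId, metadata, body);
        } else {
            // Large messages and attachments go to Haystack; HBase keeps the pointer.
            String needleId = haystack.write(body);
            hbase.putPointer(messageId, metadata, needleId);
        }
    }
}
```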
Facebook Messages takes 6B+ messages a day. Summarizing HBase traffic:
· 75B+ R+W ops/day with 1.5M ops/sec at peak
· The average write operation inserts 16 records across multiple column families (see the sketch after this list)
· 2PB+ of cooked online data in HBase. Over 6PB including replication but not backups
· All data is LZO compressed
· Growing at 250TB/month
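To illustrate that write fan-out, here is a small sketch using the HBase Java client (the modern client API, not necessarily the version Facebook ran in 2010). The table name, column families, and row-key scheme are assumptions for illustration; only the pattern of one logical message becoming several records across column families comes from the talk.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class MessageWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("messages"))) {

            // Hypothetical row key: user id plus reversed timestamp so newest messages sort first.
            byte[] rowKey = Bytes.toBytes("user123#" + (Long.MAX_VALUE - System.currentTimeMillis()));

            // One logical message fans out into several cells across column families:
            // metadata, search-index terms, and (for small messages) the body itself.
            Put put = new Put(rowKey);
            put.addColumn(Bytes.toBytes("meta"), Bytes.toBytes("from"), Bytes.toBytes("alice"));
            put.addColumn(Bytes.toBytes("meta"), Bytes.toBytes("subject"), Bytes.toBytes("hi"));
            put.addColumn(Bytes.toBytes("index"), Bytes.toBytes("term:hi"), Bytes.toBytes(""));
            put.addColumn(Bytes.toBytes("body"), Bytes.toBytes("text"), Bytes.toBytes("hi there"));

            // Compression (e.g., LZO) is a per-column-family table setting,
            // not something specified on each write.
            table.put(put);   // one client call, multiple records underneath
        }
    }
}
```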
The Facebook Messages project timeline:
· 2009/12: Project started
· 2010/11: Initial rollout began
· 2011/07: Rollout completed with 1B+ accounts migrated to new store
· Production changes:
o 2 schema changes
o Upgraded to HFile 2.0
They implemented a very nice approach to testing: prior to release, they shadowed the production workload onto the test servers. After going into production, they continued the practice, shadowing the real production workload onto the test cluster so that changes could be validated there before being rolled out to production.
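Here is a minimal sketch of one way such shadowing can be wired up, assuming a hypothetical MessageStore interface rather than Facebook's actual code: the production write happens synchronously, and the same operation is replayed asynchronously against the shadow cluster, where failures are alerted on rather than returned to users.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: wrap the production store and replay every write against a shadow cluster.
public class ShadowingStore implements MessageStore {
    private final MessageStore production;
    private final MessageStore shadow;
    private final ExecutorService shadowPool = Executors.newFixedThreadPool(4);

    public ShadowingStore(MessageStore production, MessageStore shadow) {
        this.production = production;
        this.shadow = shadow;
    }

    @Override
    public void put(String key, byte[] value) {
        production.put(key, value);            // real write; errors propagate to the caller
        shadowPool.submit(() -> {
            try {
                shadow.put(key, value);        // best-effort replay on the test cluster
            } catch (Exception e) {
                // The talk notes that alerts on the shadow cluster are treated as
                // high priority too, so this is where you would fire one.
                System.err.println("shadow write failed: " + e);
            }
        });
    }
}

interface MessageStore {                       // hypothetical interface
    void put(String key, byte[] value);
}
```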
The list of scares and scars from Kannan:
· Not without our share of scares and incidents:
o s/w bugs (e.g., deadlocks, incompatible LZO used for bulk-imported data, etc.)
§ found an edge-case bug in log recovery as recently as last week!
· performance spikes every 6 hours (even off-peak)!
o cleanup of HDFS's Recycle bin was sub-optimal! Needed a code and config fix.
· transient rack switch failures
· ZooKeeper leader election took more than 10 minutes when one member of the quorum died. Fixed in a more recent version of ZK.
· HDFS NameNode remains a single point of failure (SPOF)
· flapping servers (repeated failures)
· Sometimes, tried things which hadn’t been tested in dark launch!
o Added a rack of servers to help with a performance issue
§ Pegged the top-of-rack network bandwidth!
§ Had to add the servers at a much slower pace. Very manual.
§ Intelligent load balancing needed to make this more automated.
· A high % of issues caught in shadow/stress testing
· Lots of alerting mechanisms in place to detect failure cases
o Automated recovery for a lot of the common ones
o Treat alerts on the shadow cluster as hi-pri too!
· Sharding the service across multiple HBase cells also paid off
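As a rough illustration of that last point, here is a sketch of homing each user on one of several independent HBase cells. The hash-based placement and the endpoint list are assumptions for illustration; the talk only notes that sharding across cells paid off.

```java
import java.util.List;

// Sketch: deterministically map each user to one HBase cell (cluster).
// All of that user's messages live in that cell, so an incident in one
// cell affects only the users homed on it.
public class CellRouter {

    private final List<String> cellEndpoints;   // one entry per cell, e.g. its ZK quorum address

    public CellRouter(List<String> cellEndpoints) {
        this.cellEndpoints = cellEndpoints;
    }

    public String cellFor(String userId) {
        int bucket = Math.floorMod(userId.hashCode(), cellEndpoints.size());
        return cellEndpoints.get(bucket);
    }
}
```

In practice a small directory service mapping users to cells is more flexible than a pure hash (it allows moving users between cells), but the payoff is the same either way: each cell is an independent failure and operational domain.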
Kannan’s slides are posted at: http://mvdirona.com/jrh/TalksAndPapers/KannanMuthukkaruppan_StorageInfraBehindMessages.pdf
–jrh
b: http://blog.mvdirona.com / http://perspectives.mvdirona.com