Facebook Software Use

This morning I came across Exploring the software behind Facebook, the World’s Largest Site. The article doesn’t introduce any data not previously reported, but it’s a good summary of the software used by Facebook and the current scale of the social networking site:

· 570 billion page views monthly

· 3 billion photo uploads monthly

· 1.2 million photos served per second

· 30k servers

The last metric, the 30k server count, is pretty old (Facebook has 30,000 servers). Based only upon external usage growth, I would expect the number to be closer to 50k now.

The article was vague on memcached usage, saying only “terabytes”. I’m quite interested in memcached, and Facebook is by far its largest user, so I periodically check their growth rate. They now have 28 terabytes of memcached data across 800 servers. See Scaling memcached at Facebook for more detail.
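
As a quick sanity check on those numbers, here is the per-server arithmetic (a minimal sketch; only the 28 TB and 800-server figures come from above, the rest is unit conversion):

```python
# Back-of-envelope: average memcached RAM per server, using the
# 28 TB / 800 server figures quoted above.
total_cache_tb = 28
servers = 800
gb_per_server = total_cache_tb * 1024 / servers
print(f"~{gb_per_server:.0f} GB of cache per memcached server")  # ~36 GB
```

That works out to roughly 36 GB of cache memory per server.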

The mammoth memcached fleet at Facebook has had me wondering for years: how close is the cache to the size of the entire data store? If you factor out photos and other large objects, how big is the remaining user database? Today the design is memcached insulating the fleet of database servers. What is the aggregate memory size of the memcached and database fleets? Would it be cheaper to store the entire database 2-way redundant in memory, with changes logged to support recovery in the event of a two-server loss?

Facebook is very close to being able, if not already able, to store the entire data store minus large objects in memory. They are within a factor of two of being able to store it in memory twice, making memcached the primary copy and completely omitting the database tier. It would be a fun project.
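
To make the “factor of two” claim concrete, here is a minimal sketch of the comparison. Only the 28 TB memcached figure comes from above; the database fleet size, per-server RAM, and non-photo data size are hypothetical placeholders to be replaced with real numbers:

```python
# Rough model of the question above: does the memory already deployed across
# the cache and database tiers approach what a 2-way redundant, fully
# in-memory store would need? All values except memcached_tb are assumptions.
memcached_tb = 28        # reported memcached footprint
db_servers = 2000        # assumed MySQL server count (hypothetical)
db_ram_gb = 64           # assumed RAM per database server (hypothetical)
non_photo_data_tb = 100  # assumed user data size excluding photos/blobs

deployed_ram_tb = memcached_tb + db_servers * db_ram_gb / 1024
needed_2way_tb = 2 * non_photo_data_tb  # two in-memory copies, changes logged for recovery

print(f"RAM already deployed (cache + DB): ~{deployed_ram_tb:.0f} TB")
print(f"RAM needed for a 2-way redundant in-memory store: {needed_2way_tb} TB")
```

If the real numbers land anywhere in this neighborhood, the memory already deployed is within a small constant factor of holding two full copies of the non-photo data.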

–jrh

James Hamilton

e: jrh@mvdirona.com

w: http://www.mvdirona.com

b: http://blog.mvdirona.com / http://perspectives.mvdirona.com

14 comments on “Facebook Software Use”
  1. Jithendra Veeramachaneni says:

    Good discussion.

    Databases running with most (or all) of their data in memory may be a good replacement for caches in the future. Distributed caches do have a few advantages: a) partitioning is built in, based on key; b) they scale seamlessly to large amounts of data with simple administration; c) they evict data based on usage patterns.
    I am not sure if current databases can do this.

    Databases do have advantages like queries and transactions, but I wonder if they will take over the role of caches in the near term.
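
    The key-based partitioning in point (a) is typically done client-side by hashing the key to choose a cache server; a minimal sketch (server names are hypothetical):

    ```python
    import hashlib

    # Minimal sketch of key-based partitioning in a distributed cache:
    # the key alone determines which cache server owns the entry.
    # Server names are hypothetical.
    SERVERS = ["cache-01", "cache-02", "cache-03", "cache-04"]

    def server_for(key: str) -> str:
        digest = hashlib.md5(key.encode()).hexdigest()
        return SERVERS[int(digest, 16) % len(SERVERS)]

    print(server_for("user:1234:profile"))  # the same key always maps to the same server
    ```

    Production memcached clients typically use consistent hashing rather than a simple modulo so that adding or removing a server remaps only a small fraction of keys.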

  2. Eas says:

    For what it is worth, my understanding is that Facebook has an image caching layer that serves images out of memcached. If that is still the case, some of that RAM is occupied by image blobs.

  3. Good observation, Greg. When I said "completely eliminate the database tier", you are right that I’m actually proposing completely removing the caching tier in favor of a mostly (or fully) in-memory database. I love caches, but when the cache needs to get anywhere close to the size of the entire data store, you know you are using the wrong database technology.

    –jrh
    jrh@mvdirona.com

  4. Great presentation. Thanks for the pointer Shaloo.

    –jrh
    jrh@mvdirona.com

  5. Greg Linden says:

    Rather than stating this as "completely eliminating the database tier", another way to look at this is that you are fixing the database tier while completely eliminating the caching layer.

    That is, if you use a very fast database where the working set or possibly even the entire data set is in memory, you can minimize or eliminate the caching layer. And that is what you are talking about doing, storing "the entire database 2-way redundant in memory with changes logged to support recovery in the event of a two-server loss."

    Personally, I think memcached often is overdone. It seems to be a common pattern to just keep throwing machines at the caching layer to hide crappy database performance rather than questioning if the database really should be that slow.

  6. JD says:

    There is also the MIT/Brown/Yale H-Store project: http://hstore.cs.brown.edu

  7. Jason, thanks for the comments. I do know of RAMCloud — John Ousterhout presented it at the Stanford Clean Slate CTO Summit (//perspectives.mvdirona.com/2009/10/24/StanfordCleanSlateCTOSummit.aspx) and gave an updated talk at HPTS 2009 (//perspectives.mvdirona.com/2009/11/14/RandyShoupJohnOusterhoutAtHPTS2009.aspx). It’s an interesting project.

    I will post the updated model including networking costs. Thanks for nudging me. I’ve had the data for months but haven’t found the time to write it up. I will get on it.

    –jrh
    jrh@mvdirona.com

  8. Shaloo Shalini says:

    James, here is some more recent info (2010) on Facebook with respect to concurrent active users, data access patterns, kinds of accesses, gets, and memcache use, in case you are interested: http://qcontokyo.com/pdf/qcon_MarcKwiatkowski.pdf

  9. Jason Dai says:

    James,

    As always, very insightful observations. Putting everything in memory reminds me of the interesting Stanford project RAMCloud (http://fiz.stanford.edu:8081/display/ramcloud/Home).

    On a different note, I came across your presentation "Cloud Computing Economies of Scale" at MIX’10 (a great talk, by the way). In the presentation, you showed the data center cost model updated from //perspectives.mvdirona.com/2008/11/28/CostOfPowerInLargeScaleDataCenters.aspx, which now includes the cost of networking equipment. I wonder how that cost is modeled, and whether it is possible for you to share the updated model so that others can play with their own assumptions.

    Thanks,
    -Jason

  10. Bob, the observation I was making is that the 28 terabytes of cache, coupled with all the memory in the MySQL servers behind them, must be approaching the total non-photo storage (photos are stored separately in a special-purpose file system). To my knowledge, nobody is arguing to store the photos in memory, for exactly the reason you state.

    –jrh
    jrh@mvdirona.com

  11. Bob Warfield says:

    James, does the math really work out to put it all in memory?

    For the math to work, the photos alone would have to be pretty small at 3 billion a month incoming. The other thing that matters for these guys is that most of the content is probably not viewed very often. When you first publish a photo, it gets a bunch of views. But how many year-old photos are still getting many views?

    Putting it all in memory with 2 copies may be radically more expensive than paying for 2 x 28 terabytes.

    Cheers,

    BW
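
    A quick back-of-envelope on the photo point above, using the 3 billion uploads/month figure from the article; the average stored size per photo is an assumption, since multiple sizes are generated per upload:

    ```python
    # Rough estimate of monthly photo storage growth. The upload count comes
    # from the article; the average stored bytes per upload is an assumption.
    photos_per_month = 3_000_000_000
    avg_stored_kb = 100   # assumed total across generated sizes (hypothetical)
    tb_per_month = photos_per_month * avg_stored_kb / 1024**3
    print(f"~{tb_per_month:.0f} TB of new photo data per month")  # ~279 TB
    ```

    Even at a modest assumed size, that is hundreds of terabytes of new photo data every month, which is why the in-memory discussion above explicitly excludes photos and other large objects.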

  12. You are right that many systems are heading toward "put it all in memory". The tension is response time vs. the cost of holding the VERY long tail in memory. If you have an object that gets referenced once every few years, it’s hard to justify putting it in memory, and its being 15 msec away won’t hurt responsiveness even at the 99th percentile. The argument pushing the other way is the simplicity of having everything in memory and the ability to deliver more uniform response times. It’s tough to come up with a uniform rule that is correct across the board, but it is clear that the number of workloads where 100% in memory is the right answer is going up all the time.

    –jrh
    jrh@mvdirona.com
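
    To illustrate the percentile argument: if the long-tail objects live on a slower tier but the miss rate is far below 1%, the slow path never shows up at the 99th percentile. A minimal sketch with assumed hit rate and latencies:

    ```python
    import random

    # Simulated request latencies: in-memory hits vs. the "15 msec away" long
    # tail. Hit rate and latencies are assumptions for illustration.
    random.seed(1)
    HIT_RATE = 0.999        # assumed fraction of requests served from memory
    MEM_MS, TAIL_MS = 0.5, 15.0

    samples = sorted(MEM_MS if random.random() < HIT_RATE else TAIL_MS
                     for _ in range(100_000))
    p99 = samples[int(0.99 * len(samples))]
    print(f"p99 latency: {p99} ms")  # stays at the in-memory latency (0.5 ms)
    ```

    Only when the miss rate approaches the percentile of interest does the remote latency start to dominate the tail.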

  13. Randy Kern says:

    That would be a fun project! I think many large-scale systems have headed, or are heading, in this direction already, with large, flat, in-memory components for online processing coupled to something like Hadoop for offline processing. If one really cares about performance, and I hope you do, this quickly becomes a necessary design for good responsiveness into the tail (95th percentile and up). This approach also allows you to think about geo-distribution in a very natural way, as a fundamental part of the architecture.
