Sunday, June 20, 2010

This morning I came across Exploring the software behind Facebook, the World’s Largest Site. The article doesn’t introduce data that hasn’t been previously reported, but it’s a good summary of the software used by Facebook and the current scale of the social networking site:

· 570 billion page views monthly

· 3 billion photo uploads monthly

· 1.2 million photos served per second

· 30k servers

The latter metric, the 30k server count, is pretty old (Facebook has 30,000 servers). I would expect the number to be closer to 50k now, based only upon external usage growth.

 

The article was vague on memcached usage, saying only “terabytes”. I’m pretty interested in memcached and Facebook is, by far, the largest user, so I periodically check their growth rate. They now have 28 terabytes of memcached data behind 800 servers. See Scaling memcached at Facebook for more detail.
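A quick back-of-envelope on those numbers (a rough sketch only; per-server overheads and replication are ignored):

```python
# Rough per-server math for the reported memcached fleet:
# 28 TB of cached data spread across 800 servers.
TOTAL_CACHE_TB = 28   # reported memcached data
SERVERS = 800         # reported memcached servers

gb_per_server = TOTAL_CACHE_TB / SERVERS * 1024
print(f"~{gb_per_server:.0f} GB of cached data per server")  # ~36 GB
```

That is roughly 36 GB of cache per box, which was a large but entirely plausible memory configuration for a 2010-era server.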

 

The mammoth memcached fleet at Facebook has had me wondering for years: how close is the cache to the entire data store? If you factor out photos and other large objects, how big is the entire remaining user database? Today the design is memcached insulating the fleet of database servers. What is the aggregate memory size of the memcached and database fleets? Would it be cheaper to store the entire database 2-way redundant in memory, with changes logged to support recovery in the event of a two-server loss?

 

Facebook is very close, if not already able, to store the entire data store minus large objects in memory, and is within a factor of two of being able to store it in memory twice, with memcached as the primary copy, completely omitting the database tier. It would be a fun project.

 

--jrh

 

James Hamilton

e: jrh@mvdirona.com

w: http://www.mvdirona.com

b: http://blog.mvdirona.com / http://perspectives.mvdirona.com

 

Sunday, June 20, 2010 6:56:35 AM (Pacific Standard Time, UTC-08:00)
Sunday, June 20, 2010 7:13:38 AM (Pacific Standard Time, UTC-08:00)
That would be a fun project! I think many large scale systems have headed or are heading in this direction already, with large, flat, in-memory components for online processing, coupled to something like Hadoop for offline processing. If one really cares about performance, and I hope you do, this quickly becomes a necessary design to achieve good responsiveness into the tail (95th percentile and up). This approach also allows you to think about geo distribution in a very natural way, as a fundamental part of the architecture.
Sunday, June 20, 2010 8:52:26 AM (Pacific Standard Time, UTC-08:00)
You are right that many systems are heading toward "put it all in memory". The tension is response time vs. the cost of holding the VERY long tail in memory. If you have an object that gets referenced once every few years, it's hard to justify putting it in memory, and its being 15 msec away won't hurt responsiveness even at the 99th percentile. The argument pushing the other way is the simplicity of design of having everything in memory and the ability to deliver more uniform response times. It's tough to come up with a uniform rule that is correct across the board, but it is clear that the number of workloads where 100% in memory is the right answer is going up all the time.

--jrh
jrh@mvdirona.com
Sunday, June 20, 2010 9:00:22 AM (Pacific Standard Time, UTC-08:00)
James, does the math really work out to put it all in memory?

Just the photos alone can't be small at 3 billion a month incoming. The other thing that matters for these guys is that most of the content is probably not viewed very often. When you first publish your photo, it gets a bunch of views. But how many photos a year old are still getting very many views?

For them to put it all in memory and have 2 copies may be radically more expensive than paying for 2 x 28 terabytes.

Cheers,

BW
Sunday, June 20, 2010 9:17:18 AM (Pacific Standard Time, UTC-08:00)
Bob, the observation I was making is that the 28 terabytes of cache, coupled with all the memory in the MySQL servers behind them, must be approaching the total non-photo storage (photos are stored separately in a special-purpose file system). To my knowledge, nobody is arguing to store the photos in memory, for exactly the reason you state.

--jrh
jrh@mvdirona.com
Sunday, June 20, 2010 5:48:26 PM (Pacific Standard Time, UTC-08:00)
James,

As always, very insightful observations. Putting everything in memory reminds me of the interesting Stanford project RAMCloud (http://fiz.stanford.edu:8081/display/ramcloud/Home).

On a different note, I came across your presentation "Cloud Computing Economies of Scale" at MIX'10 (a great talk, by the way). In the presentation, you showed the data center cost model updated from http://perspectives.mvdirona.com/2008/11/28/CostOfPowerInLargeScaleDataCenters.aspx, which now includes the cost of networking equipment. I wonder how that cost is modeled, and whether it is possible for you to share the updated model so that others can play with their own assumptions.

Thanks,
-Jason
Jason Dai
Sunday, June 20, 2010 9:47:27 PM (Pacific Standard Time, UTC-08:00)
James, here is some more recent (2010) info on Facebook wrt concurrent active users, data access patterns, kinds of accesses, gets, and memcached use, in case you are interested: http://qcontokyo.com/pdf/qcon_MarcKwiatkowski.pdf
Shaloo Shalini
Monday, June 21, 2010 5:59:02 AM (Pacific Standard Time, UTC-08:00)
Jason, thanks for the comments. I do know of RAMCloud -- John Ousterhout presented it at the Stanford Clean Slate CTO Summit (http://perspectives.mvdirona.com/2009/10/24/StanfordCleanSlateCTOSummit.aspx) and an updated talk at HPTS 2009 (http://perspectives.mvdirona.com/2009/11/14/RandyShoupJohnOusterhoutAtHPTS2009.aspx). It's an interesting project.

I will post the updated model including networking costs. Thanks for nudging me. I've had the data for months but haven't found the time to write it up. I will get on it.

--jrh
jrh@mvdirona.com

Monday, June 21, 2010 1:07:29 PM (Pacific Standard Time, UTC-08:00)
There is also the MIT/Brown/Yale H-Store project: http://hstore.cs.brown.edu
JD
Monday, June 21, 2010 1:45:57 PM (Pacific Standard Time, UTC-08:00)
Rather than stating this as "completely eliminating the database tier", another way to look at this is that you are fixing the database tier while completely eliminating the caching layer.

That is, if you use a very fast database where the working set, or possibly even the entire data set, is in memory, you can minimize or eliminate the caching layer. And that is what you are talking about doing: storing "the entire database 2-way redundant in memory with changes logged to support recovery in the event of a two-server loss."

Personally, I think memcached often is overdone. It seems to be a common pattern to just keep throwing machines at the caching layer to hide crappy database performance rather than questioning if the database really should be that slow.
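The pattern Greg is describing is the classic cache-aside read path, where every server added to the caching tier hides more database latency without making the database itself any faster. A minimal sketch (the cache and db objects here are hypothetical stand-ins for a memcached client and a database client):

```python
# Cache-aside read path: check the cache, fall back to the database on
# a miss, then populate the cache for subsequent readers.
def cached_get(key, cache, db, ttl=300):
    value = cache.get(key)
    if value is not None:
        return value                 # cache hit: database untouched
    value = db.get(key)              # cache miss: pay full DB latency
    if value is not None:
        cache.set(key, value, ttl)   # populate for later readers
    return value
```

The subtle failure mode is exactly the one Greg points at: as hit rates climb, the database sees so little traffic that its slowness stops being visible, and the incentive to fix it disappears.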
Tuesday, June 22, 2010 8:54:43 PM (Pacific Standard Time, UTC-08:00)
Great presentation. Thanks for the pointer Shaloo.

--jrh
jrh@mvdirona.com
Tuesday, June 22, 2010 9:06:46 PM (Pacific Standard Time, UTC-08:00)
Good observation Greg. When I said "completely eliminate the database tier", you are right that I'm actually proposing completely removing the caching tier in favor of a mostly (or fully) in-memory database. I love caches, but when the cache needs to get anywhere close to the size of the entire data store, you know you are using the wrong database technology.

--jrh
jrh@mvdirona.com
Thursday, June 24, 2010 10:53:02 PM (Pacific Standard Time, UTC-08:00)
I think the continuously increasing demand has required modifications to both the operating system and memcached to achieve the performance that provides the best possible experience for users. I have read that they moved from TCP to UDP to reclaim multiple gigabytes of memory per server.

Friday, June 25, 2010 1:20:32 PM (Pacific Standard Time, UTC-08:00)
For what it is worth, my understanding is that Facebook has an image caching layer that serves images out of memcached. If that is still the case, some of that RAM is occupied by image blobs.
Wednesday, July 07, 2010 10:45:57 PM (Pacific Standard Time, UTC-08:00)
Good discussion.

Databases running with most (or all) of their data in memory may be a good replacement for caches in the future. Distributed caches do have a few advantages: a) partitioning is built in, based on key; b) they scale seamlessly to large amounts of data with simple administration; c) they evict data based on usage patterns.
I am not sure if current databases can do this.

Databases do have advantages like queries and transactions, but I wonder if they will take over the roles of caches in the near term.
Jithendra Veeramachaneni
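The key-based partitioning mentioned in (a) is typically just a stable hash of the key mapped onto the server list, so every client agrees on placement with no central directory. A minimal sketch (server names are made up):

```python
# Key-based partitioning as used by memcached clients: hash the key,
# map the hash onto the server list. No coordination needed.
import hashlib

def server_for(key, servers):
    digest = hashlib.md5(key.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

servers = ["cache01", "cache02", "cache03"]
```

Note that this naive modulo scheme remaps most keys whenever the server list changes, which is why production clients generally use consistent hashing instead.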

Disclaimer: The opinions expressed here are my own and do not necessarily represent those of current or past employers.
