This morning I came across Exploring the software behind Facebook, the World’s Largest Site. The article doesn’t introduce data that hasn’t been previously reported, but it’s a good summary of the software used by Facebook and the current scale of the social networking site:
· 570 billion page views monthly
· 3 billion photo uploads monthly
· 1.2 million photos served per second
· 30k servers
The last metric, the 30k-server number, is pretty old (Facebook has 30,000 servers). I would expect the number to be closer to 50k now, based only upon external usage growth.
The article was vague on memcached usage, saying only “terabytes”. I’m pretty interested in memcached, and Facebook is by far the largest user, so I periodically check their growth rate. They now have 28 terabytes of memcached data behind 800 servers. See Scaling memcached at Facebook for more detail.
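Those two figures imply a per-server cache size worth noting. A quick sketch of the arithmetic, using only the numbers cited above (28 TB across 800 servers):

```python
# Back-of-envelope: per-server cache size of Facebook's memcached fleet,
# from the figures reported above (28 TB of cached data on 800 servers).
TOTAL_CACHE_TB = 28
SERVER_COUNT = 800

per_server_gb = TOTAL_CACHE_TB * 1024 / SERVER_COUNT  # TB -> GB
print(f"~{per_server_gb:.0f} GB of cache per server")  # prints "~36 GB of cache per server"
```

Roughly 35 GB of cached data per node, a plausible figure for commodity servers of the day.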
The mammoth memcached fleet at Facebook has had me wondering for years: how close is the cache to the size of the entire data store? If you factor out photos and other large objects, how big is the remaining user database? Today the design is memcached insulating the fleet of database servers. What is the aggregate memory size of the memcached and database fleets? Would it be cheaper to store the entire database 2-way redundant in memory, with changes logged to support recovery in the event of a two-server loss?
Facebook is very close, if not already able, to storing the entire data store minus large objects in memory, and within a factor of two of being able to store it in memory twice, with memcached holding the primary copy and the database tier omitted entirely. It would be a fun project.
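The "factor of two" claim can be sketched numerically. Assume, as suggested above, that the ~28 TB memcached tier already approximates the full data store minus large objects; the per-node memory figure below is an assumption derived from the fleet numbers, not a reported specification:

```python
# Hypothetical sizing for a 2-way redundant in-memory store, assuming the
# ~28 TB memcached tier approximates the full data store minus large objects.
DATA_STORE_TB = 28   # from the memcached figures cited earlier
REPLICAS = 2         # 2-way redundancy; a change log covers a double failure
PER_SERVER_GB = 36   # assumed usable memory per node (~28 TB / 800 servers)

total_tb = DATA_STORE_TB * REPLICAS
servers_needed = total_tb * 1024 / PER_SERVER_GB
print(f"{total_tb} TB total, ~{servers_needed:.0f} servers")
```

Under these assumptions, holding the store twice in memory needs on the order of 1,600 nodes of today's size, roughly double the current memcached fleet, which is why the idea is within reach.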