I’ve been collecting scaling stories for some time now, and last week I came across the following rundown on Flickr scaling: Federation at Flickr: Doing Billions of Queries Per Day by Dathan Vance Pattishall, the Flickr database guy.
The Flickr DB architecture is sharded, with a PHP access layer to maintain consistency. Flickr users are randomly assigned to a shard. Each shard is duplicated in another database that is also serving active shards, and each DB needs to stay below 50% load to be able to handle failover.
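The 50% rule falls out of simple arithmetic: if each shard's pair partner is also serving live traffic, the surviving box must absorb both loads after a failure. A minimal sketch of that headroom check (the function name and numbers are illustrative, not from the talk):

```python
# Hypothetical sketch: each shard lives on a pair of databases that both
# serve live traffic, so either box must be able to absorb its partner's
# full load after a failure.

def can_absorb_failover(load_a: float, load_b: float) -> bool:
    """Return True if either DB in the pair can take the other's load.

    Loads are fractions of a single box's capacity; after a failure the
    surviving box carries load_a + load_b, which must stay at or below
    1.0. Keeping each box under 50% guarantees this.
    """
    return load_a + load_b <= 1.0

# Both under 50%: failover is safe.
assert can_absorb_failover(0.45, 0.48)
# One box at 60%: its partner cannot absorb it without overload.
assert not can_absorb_failover(0.60, 0.55)
```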
Shards are found via a lookup ring that maps userID or groupID to shardID, and photoID to userID. The DBs are protected by a memcached layer with a 30-minute caching lifetime. Slide 16 says they are maintaining consistency using distributed transactions, but I strongly suspect they are actually just running two parallel transactions with application management rather than two-phase commit (2PC).
Maintenance is done by bringing down half the DBs while the remaining DBs handle the load, but it appears they have no redundancy (failure protection) during the maintenance periods.
They have 12TB of user data in aggregate, and they appear to be using MySQL (slide 25 complains about an InnoDB bug).
Other web site scaling stories:
· Scaling Linkedin: http://perspectives.mvdirona.com/2008/06/08/ScalingLinkedIn.aspx
· Scaling Amazon: http://glinden.blogspot.com/2006/02/early-amazon-splitting-website.html
· Scaling Second Life: http://radar.oreilly.com/archives/2006/04/web_20_and_databases_part_1_se.html
· Scaling Craigslist: http://radar.oreilly.com/archives/2006/04/database_war_stories_5_craigsl.html
· Twitter, Flickr, Live Journal, Six Apart, Bloglines, Last.fm, SlideShare, and eBay: http://poorbuthappy.com/ease/archives/2007/04/29/3616/the-top-10-presentation-on-scaling-websites-twitter-flickr-bloglines-vox-and-more
James Hamilton, Data Center Futures
Bldg 99/2428, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 | JamesRH@microsoft.com