Flickr DB Architecture

I’ve been collecting scaling stories for some time now, and last week I came across the following rundown on Flickr scaling: Federation at Flickr: Doing Billions of Queries Per Day by Dathan Vance Pattishall, the Flickr database guy.

The Flickr DB architecture is sharded, with a PHP access layer to maintain consistency. Flickr users are randomly assigned to a shard. Each shard is duplicated in another database that is also serving active shards, so each DB needs to be less than 50% loaded to be able to absorb the other's traffic on failover.
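To make the 50% rule concrete, here is a minimal sketch of random shard assignment and the failover headroom check. The shard IDs, host names, and load figures are hypothetical and only for illustration; they do not come from the slides, and Flickr's real access layer is PHP, not Python.

```python
import random

# Hypothetical layout: each shard lives on two hosts that mirror each other,
# and both hosts serve live traffic. IDs and host names are illustrative.
SHARD_PAIRS = {
    1: ("db1a", "db1b"),
    2: ("db2a", "db2b"),
    3: ("db3a", "db3b"),
}


def assign_shard(rng=random):
    """Pick a shard at random for a newly created user."""
    return rng.choice(sorted(SHARD_PAIRS))


def load_after_failover(load_a, load_b):
    """Load on the surviving host once it absorbs its partner's traffic.

    Loads are fractions of a single host's capacity (0.0 to 1.0).
    """
    return load_a + load_b


def can_fail_over(load_a, load_b):
    """True if one host could carry the pair's combined load on its own."""
    return load_after_failover(load_a, load_b) <= 1.0
```

With both hosts at 45% the survivor would run at 90% and the pair can fail over; with hosts at 60% and 50% the survivor would need 110% of its capacity and cannot.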

Shards are found via a lookup ring that maps userID or groupID to shardID and photoID to userID. The DBs are protected by a memcached layer with a 30-minute caching lifetime. Slide 16 says they are maintaining consistency using distributed transactions, but I strongly suspect they are actually just running two parallel transactions under application control rather than two-phase commit (2PC).
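The lookup path described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Flickr's implementation: the class names, the cache key format, and the choice to cache the user-to-shard mapping are all mine, and a small in-process dictionary stands in for memcached.

```python
import time

CACHE_TTL_SECONDS = 30 * 60  # the 30-minute caching lifetime


class TTLCache:
    """Tiny in-process stand-in for memcached with per-entry expiry."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self.entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self.entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self.entries[key] = (self.clock() + self.ttl, value)


class ShardLookup:
    """Hypothetical lookup ring: userID -> shardID and photoID -> userID."""

    def __init__(self, user_to_shard, photo_to_user, cache):
        self.user_to_shard = user_to_shard  # stands in for the lookup DB
        self.photo_to_user = photo_to_user
        self.cache = cache
        self.db_reads = 0  # counts how often the cache was bypassed

    def shard_for_user(self, user_id):
        key = f"user_shard:{user_id}"
        shard = self.cache.get(key)
        if shard is None:
            self.db_reads += 1
            shard = self.user_to_shard[user_id]
            self.cache.set(key, shard)
        return shard

    def shard_for_photo(self, photo_id):
        # A photo lives on its owner's shard, so resolve the owner first.
        return self.shard_for_user(self.photo_to_user[photo_id])
```

Repeated lookups for the same user inside the 30-minute window are served from the cache; once the entry expires, the next lookup goes back to the mapping table and refreshes it.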

Maintenance is done by bringing down half the DBs while the remaining half handle the load, but it appears they have no redundancy (failure protection) during the maintenance periods.

They have 12TB of user data in aggregate and they appear to be using MySQL (slide 25 complains about an InnoDB bug).

Other web site scaling stories:

· Scaling LinkedIn: http://perspectives.mvdirona.com/2008/06/08/ScalingLinkedIn.aspx

· Scaling Amazon: http://glinden.blogspot.com/2006/02/early-amazon-splitting-website.html

· Scaling Second Life: http://radar.oreilly.com/archives/2006/04/web_20_and_databases_part_1_se.html

· Scaling Technorati: http://www.royans.net/arch/2007/10/25/scaling-technorati-100-million-blogs-indexed-everyday/

· Scaling Flickr: http://radar.oreilly.com/archives/2006/04/database_war_stories_3_flickr.html

· Scaling Craigslist: http://radar.oreilly.com/archives/2006/04/database_war_stories_5_craigsl.html

· Scaling Findory: http://radar.oreilly.com/archives/2006/05/database_war_stories_8_findory_1.html

· MySpace 2006: http://sessions.visitmix.com/upperlayer.asp?event=&session=&id=1423&year=All&search=megasite&sortChoice=&stype=

· MySpace 2007: http://sessions.visitmix.com/upperlayer.asp?event=&session=&id=1521&year=All&search=scale&sortChoice=&stype=

· Twitter, Flickr, LiveJournal, Six Apart, Bloglines, Last.fm, SlideShare, and eBay: http://poorbuthappy.com/ease/archives/2007/04/29/3616/the-top-10-presentation-on-scaling-websites-twitter-flickr-bloglines-vox-and-more

-jrh

Thanks to Kevin Merritt (Blist) and Dave Quick (Microsoft) for sending this my way.

James Hamilton, Data Center Futures
Bldg 99/2428, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
JamesRH@microsoft.com

H:mvdirona.com | W:research.microsoft.com/~jamesrh | blog:http://perspectives.mvdirona.com

2 comments on “Flickr DB Architecture”
  1. Ryan G says:

    Nothing wrong with bringing it to the fore again. I only got this because of Stumble.

  2. David says:

    You realize, don’t you, that this list was already posted in your blog in November: //perspectives.mvdirona.com/2007/11/12/ScalingWebSites.aspx
