I’ve been collecting scaling stories for some time now, and last week I came across the following rundown on Flickr scaling: Federation at Flickr: Doing Billions of Queries Per Day by Dathan Vance Pattishall, the Flickr database guy.
The Flickr DB architecture is sharded, with a PHP access layer to maintain consistency. Flickr users are randomly assigned to a shard. Each shard is duplicated in another database that is also serving active shards, and each DB needs to stay below 50% load to be able to handle failover.
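The 50% rule falls out of simple arithmetic: if each shard's pair partner is also serving live traffic, the surviving box must absorb both loads after a failure. A minimal sketch of that headroom check (the function name and numbers are illustrative, not from the talk):

```python
# Hypothetical sketch: each shard lives on a pair of databases that both
# serve live traffic, so either box must be able to absorb its partner's
# full load after a failure.

def can_absorb_failover(load_a: float, load_b: float) -> bool:
    """Return True if either DB in the pair can take the other's load.

    Loads are fractions of a single box's capacity; after a failure the
    surviving box carries load_a + load_b, which must stay at or below
    1.0. Keeping each box under 50% guarantees this.
    """
    return load_a + load_b <= 1.0

# Both under 50%: failover is safe.
assert can_absorb_failover(0.45, 0.48)
# One box at 60%: its partner cannot absorb it without overload.
assert not can_absorb_failover(0.60, 0.55)
```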
Shards are found via a lookup ring that maps userID or groupID to shardID, and photoID to userID. The DBs are protected by a memcached layer with a 30-minute caching lifetime. Slide 16 says they are maintaining consistency using distributed transactions, but I strongly suspect they are actually just running two parallel transactions with application management rather than two-phase commit (2PC).
Maintenance is done by bringing down half the DBs while the remaining DBs handle the load, but it appears they have no redundancy (failure protection) during the maintenance periods.
They have 12TB of user data in aggregate, and they appear to be using MySQL (slide 25 complains about an InnoDB bug).
Other web site scaling stories:
· Scaling Linkedin: http://perspectives.mvdirona.com/2008/06/08/ScalingLinkedIn.aspx
· Scaling Amazon: http://glinden.blogspot.com/2006/02/early-amazon-splitting-website.html
· Scaling Second Life: http://radar.oreilly.com/archives/2006/04/web_20_and_databases_part_1_se.html
· Scaling Craigslist: http://radar.oreilly.com/archives/2006/04/database_war_stories_5_craigsl.html
· Twitter, Flickr, Live Journal, Six Apart, Bloglines, Last.fm, SlideShare, and eBay: http://poorbuthappy.com/ease/archives/2007/04/29/3616/the-top-10-presentation-on-scaling-websites-twitter-flickr-bloglines-vox-and-more
James Hamilton, Data Center Futures
Bldg 99/2428, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 | JamesRH@microsoft.com