James Hamilton's Blog
 Wednesday, July 16, 2008

I’ve been collecting scaling stories for some time now, and last week I came across the following rundown on Flickr scaling: Federation at Flickr: Doing Billions of Queries Per Day by Dathan Vance Pattishall, the Flickr database guy.

 

The Flickr DB architecture is sharded, with a PHP access layer to maintain consistency. Flickr users are randomly assigned to a shard. Each shard is duplicated in another database that is also serving active shards, so each DB needs to be less than 50% loaded to be able to absorb its partner's load on failover.
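
The 50% rule is simple arithmetic: if two databases back each other up, the survivor has to carry both loads when its partner fails. Here is a minimal Python sketch of that check, assuming load is expressed as a fraction of one server's capacity. The function names and the shard count are hypothetical and not taken from the Flickr slides.

```python
import random


def assign_shard(num_shards, rng=random):
    """Pick a shard at random for a new user (hypothetical helper)."""
    return rng.randrange(num_shards)


def survives_failover(load_a, load_b, capacity=1.0):
    """Return True if one server can carry both loads after its partner fails.

    Loads are fractions of a single server's capacity (an assumption made
    for this sketch; the slides do not define how load is measured).
    """
    return load_a + load_b <= capacity


if __name__ == "__main__":
    print(assign_shard(8))                # some shard ID in the range 0..7
    print(survives_failover(0.45, 0.45))  # True: the survivor runs at 90%
    print(survives_failover(0.60, 0.55))  # False: the survivor would need 115%
```

The same arithmetic explains the maintenance procedure: taking half the databases offline is the failover case applied to every pair at once.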

 

Shards are found via a lookup ring that maps userID or groupID to shardID, and photoID to userID. The DBs are protected by a memcached layer with a 30-minute caching lifetime. Slide 16 says they are maintaining consistency using distributed transactions, but I strongly suspect they are actually just running two parallel transactions under application management rather than two-phase commit (2PC).
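
To make the lookup path concrete, here is a rough Python sketch of a two-step lookup (photoID to userID, then userID to shardID) behind a cache with a 30-minute lifetime. Everything here is an assumption for illustration: `TTLCache` is a tiny in-process stand-in for memcached rather than a real client, the lookup tables are plain dictionaries, and the slides do not say exactly which lookups are cached.

```python
import time

CACHE_TTL_SECONDS = 30 * 60  # the 30-minute caching lifetime


class TTLCache:
    """In-process stand-in for memcached (hypothetical, not a real client)."""

    def __init__(self, ttl=CACHE_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._store = {}  # key -> (value, expiry time)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # entry is stale, drop it
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self._clock() + self._ttl)


class ShardLookup:
    """Maps userID -> shardID and photoID -> userID, with cached user lookups."""

    def __init__(self, user_to_shard, photo_to_user, cache):
        self._user_to_shard = user_to_shard
        self._photo_to_user = photo_to_user
        self._cache = cache
        self.backing_reads = 0  # counts reads that missed the cache

    def shard_for_user(self, user_id):
        key = f"user_shard:{user_id}"
        shard_id = self._cache.get(key)
        if shard_id is None:
            self.backing_reads += 1
            shard_id = self._user_to_shard[user_id]
            self._cache.set(key, shard_id)
        return shard_id

    def shard_for_photo(self, photo_id):
        # A photo lives on its owner's shard, so resolve the owner first.
        return self.shard_for_user(self._photo_to_user[photo_id])


if __name__ == "__main__":
    lookup = ShardLookup({42: 3}, {1001: 42}, TTLCache())
    print(lookup.shard_for_photo(1001))  # resolves photo 1001 via user 42
    print(lookup.backing_reads)          # one backing read so far
```

Repeated lookups for the same user within the cache lifetime never reach the backing table, which is the protection the memcached layer provides.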

 

Maintenance is done by bringing down half the DBs while the remaining DBs handle the load, but it appears they have no redundancy (failure protection) during the maintenance periods.

 

They have 12TB of user data in aggregate, and they appear to be using MySQL (slide 25 complains about an InnoDB bug).

 

Other web site scaling stories:

· Scaling LinkedIn: http://perspectives.mvdirona.com/2008/06/08/ScalingLinkedIn.aspx
· Scaling Amazon: http://glinden.blogspot.com/2006/02/early-amazon-splitting-website.html
· Scaling Second Life: http://radar.oreilly.com/archives/2006/04/web_20_and_databases_part_1_se.html
· Scaling Technorati: http://www.royans.net/arch/2007/10/25/scaling-technorati-100-million-blogs-indexed-everyday/
· Scaling Flickr: http://radar.oreilly.com/archives/2006/04/database_war_stories_3_flickr.html
· Scaling Craigslist: http://radar.oreilly.com/archives/2006/04/database_war_stories_5_craigsl.html
· Scaling Findory: http://radar.oreilly.com/archives/2006/05/database_war_stories_8_findory_1.html
· MySpace 2006: http://sessions.visitmix.com/upperlayer.asp?event=&session=&id=1423&year=All&search=megasite&sortChoice=&stype=
· MySpace 2007: http://sessions.visitmix.com/upperlayer.asp?event=&session=&id=1521&year=All&search=scale&sortChoice=&stype=
· Twitter, Flickr, LiveJournal, Six Apart, Bloglines, Last.fm, SlideShare, and eBay: http://poorbuthappy.com/ease/archives/2007/04/29/3616/the-top-10-presentation-on-scaling-websites-twitter-flickr-bloglines-vox-and-more

 

                        -jrh

 

Thanks to Kevin Merritt (Blist) and Dave Quick (Microsoft) for sending this my way.

 

James Hamilton, Data Center Futures
Bldg 99/2428, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
JamesRH@microsoft.com

H:mvdirona.com | W:research.microsoft.com/~jamesrh  | blog:http://perspectives.mvdirona.com

 

Wednesday, July 16, 2008 5:13:40 AM (Pacific Standard Time, UTC-08:00) | Comments [2]
Friday, August 15, 2008 1:53:27 PM (Pacific Standard Time, UTC-08:00)
You realize, don't you, that this list was already posted in your blog in November: http://perspectives.mvdirona.com/2007/11/12/ScalingWebSites.aspx
David
Tuesday, August 19, 2008 3:28:10 PM (Pacific Standard Time, UTC-08:00)
Nothing wrong with bringing it to the fore again. I only got this because of Stumble.
All Content © 2008, James Hamilton