Sunday, December 28, 2008

I’ve long argued that tough constraints often make for a better service, and few services are more constrained than Wikipedia, where the only source of revenue is user donations. I came across this talk by Domas Mituzas of Wikipedia while reading old posts on Data Center Knowledge. The posting A Look Inside Wikipedia’s Infrastructure includes a summary of the talk Domas gave at Velocity last summer.

 

Interesting points from the Data Center Knowledge posting and the longer document referenced below from the 2007 MySQL conference:

·  Wikipedia serves the world from roughly 300 servers

o  200 application servers

o  70 Squid servers

o  30 Memcached servers (2GB each)

o  20 MySQL servers using InnoDB, each with 16GB of memory (200 to 300GB each)

o  They also use Squid, Nagios, dsh, NFS, Ganglia, Linux Virtual Server, Lucene over .NET on Mono, PowerDNS, lighttpd, Apache, PHP, MediaWiki (originated at Wikipedia)

·  50,000 HTTP requests per second

·  80,000 MySQL requests per second

·  7 million registered users

·  18 million objects in the English version
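
To put those figures in perspective, here is a small back-of-the-envelope sketch using only the numbers quoted above. It assumes load is spread evenly across servers, which is almost certainly not true in practice, so treat the results as rough averages rather than measurements. The variable names are mine.

```python
# Figures quoted in the list above; nothing here is measured.
servers = {"application": 200, "squid": 70, "memcached": 30, "mysql": 20}
http_requests_per_sec = 50_000
mysql_requests_per_sec = 80_000
memcached_gb_per_server = 2

# The itemized counts add up to slightly more than "roughly 300".
total_servers = sum(servers.values())

# Assumption: MySQL requests are spread evenly over the 20 database servers.
mysql_per_server = mysql_requests_per_sec / servers["mysql"]

# Average number of MySQL requests issued per incoming HTTP request.
mysql_per_http = mysql_requests_per_sec / http_requests_per_sec

# Aggregate memcached capacity across the fleet.
memcached_total_gb = servers["memcached"] * memcached_gb_per_server

print(f"servers listed:            {total_servers}")
print(f"MySQL requests per server: {mysql_per_server:.0f}/s")
print(f"MySQL requests per HTTP:   {mysql_per_http:.1f}")
print(f"total memcached capacity:  {memcached_total_gb} GB")
```

Under that even-spread assumption, each MySQL server averages about 4,000 requests per second, and the whole memcached tier holds about 60GB.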

 

For the 2007 MySQL Users Conference, Domas posted great details on the Wikipedia architecture: Wikipedia: Site internals, configuration, code examples and management issues (30 pages). I’ve posted other big service scaling and architecture talks at: http://perspectives.mvdirona.com/2008/12/27/MySpaceArchitectureAndNet.aspx.

 

James Hamilton
Amazon Web Services
jrh@mvdirona.com

 

Updated: Corrected formatting issue.

Sunday, December 28, 2008 7:04:05 AM (Pacific Standard Time, UTC-08:00)  Comments [2]
Services
Monday, December 29, 2008 10:07:52 AM (Pacific Standard Time, UTC-08:00)
In Domas' article, this point on hitting your caching layer versus hitting the database is worth emphasizing:

"The common mistake is to believe that database is too slow and everything in it has to be cached somewhere else. In scaled out environments reads are very efficient, and difference of time between efficient MySQL query and memcached request is negligible - both may execute in less than 1ms usually)."
Tuesday, December 30, 2008 1:48:04 PM (Pacific Standard Time, UTC-08:00)
I agree, Greg. Many designs "over-protect" the DB layer.

--jrh
jrh@mvdirona.com
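
One rough way to see the point in the quote from Domas is a simple latency model. This is only a sketch: it assumes a cache-aside read path, where every read pays for a cache lookup and a miss additionally pays for the database query. The function name and the 1 ms figures are illustrative assumptions, not numbers measured at Wikipedia.

```python
def expected_read_ms(hit_ratio, cache_ms, db_ms):
    """Expected latency of a cache-aside read.

    Every read pays for the cache lookup; the fraction of reads
    that miss (1 - hit_ratio) also pays for the database query.
    """
    return cache_ms + (1.0 - hit_ratio) * db_ms


# Assumed, illustrative latencies: an efficient MySQL query and a
# memcached request both taking about 1 ms.
CACHE_MS = 1.0
DB_MS = 1.0

direct_ms = DB_MS  # reading straight from the database, no cache in front

for hit_ratio in (0.5, 0.9, 0.99):
    cached_ms = expected_read_ms(hit_ratio, CACHE_MS, DB_MS)
    print(f"hit ratio {hit_ratio:.2f}: cached path {cached_ms:.2f} ms, "
          f"direct {direct_ms:.2f} ms")
```

With these assumed numbers the cached path is never faster than going straight to the database, whatever the hit ratio. In this model a cache in front of an already-efficient query only pays off when the query is much slower than the cache lookup, or when the goal is to shed load from the database rather than to cut latency.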

Disclaimer: The opinions expressed here are my own and do not necessarily represent those of current or past employers.

All Content © 2010, James Hamilton