Monday, February 15, 2010

MySpace makes the odd technology choice that I don’t fully understand. And, from a distance, there are times when I think I see an opportunity to drop costs substantially. But let’s ignore that and tip our hats to MySpace for the incredible scale they are driving. It’s a great social networking site, its traffic is monstrous and, consequently, it’s a very interesting site to understand in more detail.

 

Lubor Kollar of SQL Server just sent me this super interesting overview of the MySpace service. My notes follow and the original article is at: http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?casestudyid=4000004532.

 

I particularly like social networking sites like Facebook and MySpace because they are so difficult to implement. Unlike highly partitionable workloads like email, social networking sites work hard to find as many relationships, across as many dimensions, amongst as many users as possible. I refer to this as the hairball problem. There are no nice clean data partitions, which makes social networking sites amongst the most interesting of the high scale internet properties. More articles on the hairball problem:

·         FriendFeed use of MySQL

·         Geo-Replication at Facebook

·         Scaling LinkedIn
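To make the contrast with email concrete, here is a minimal sketch of why hash partitioning by user breaks down for social workloads. The partition count, the user IDs, and the friend list are all made up for illustration and have nothing to do with how MySpace actually lays out its data.

```python
import random

NUM_PARTITIONS = 100  # hypothetical partition count


def partitions_touched(user_ids, num_partitions):
    """Return the set of partitions holding data for the given users (hash by ID)."""
    return {uid % num_partitions for uid in user_ids}


# Email-style request: one user's mailbox lives in exactly one partition.
mailbox_partitions = partitions_touched([42], NUM_PARTITIONS)

# Social-style request: a feed needs data for the user and all of their friends.
rng = random.Random(0)  # fixed seed so the run is repeatable
friends = rng.sample(range(1_000_000), 200)  # hypothetical friend list
feed_partitions = partitions_touched([42] + friends, NUM_PARTITIONS)

print(len(mailbox_partitions))  # a single partition
print(len(feed_partitions))     # a large share of all partitions
```

However the partitioning key is chosen, some common request ends up fanning out across many partitions, which is the hairball.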

 

The combination of the hairball problem and extreme scale makes the largest social networking sites like MySpace some of the toughest on the planet to scale.  Focusing on MySpace scale, it is prodigious:

·         130M unique monthly users

·         40% of the US population has MySpace accounts

·         300k new users each day

 

The MySpace Infrastructure:

·         3,000 Web Servers

·         800 cache servers

·         440 SQL Servers

 

Looking at the database tier in more detail:

·         440 SQL Server Systems hosting over 1,000 databases

·         Each running on an HP ProLiant DL585

o   4 dual core AMD procs

o   64 GB RAM

·         Storage tier: 1,100 disks on a distributed SAN (really!)

·         1PB of SQL Server hosted data
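A quick back-of-envelope pass over the figures above gives a feel for the per-server averages. This is only a sketch: it assumes 1 PB = 1,000 TB, treats "over 1,000 databases" as exactly 1,000, and the case study may not measure all of these numbers on the same basis, so read the results as rough ratios rather than facts about the deployment.

```python
servers = 440
databases = 1_000        # "over 1,000", so this is a lower bound
disks = 1_100
data_tb = 1_000          # 1 PB, assuming 1 PB = 1,000 TB
ram_gb_per_server = 64

databases_per_server = databases / servers
disks_per_server = disks / servers
data_tb_per_server = data_tb / servers
aggregate_ram_tb = servers * ram_gb_per_server / 1_000

print(f"databases per server: {databases_per_server:.1f}")   # 2.3
print(f"disks per server:     {disks_per_server:.1f}")       # 2.5
print(f"data per server:      {data_tb_per_server:.2f} TB")  # 2.27 TB
print(f"aggregate RAM:        {aggregate_ram_tb:.1f} TB")    # 28.2 TB
```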

 

As an ex-member of the SQL Server development team, and perhaps less than completely unbiased, I’ve got to say that 440 database servers across a single cluster is a thing of beauty.

 

More scaling stories: http://perspectives.mvdirona.com/2010/02/07/ScalingSecondLife.aspx.

 

Hats off to MySpace for delivering a reliable service, in high demand, with high availability. Very impressive.

 

                                                                --jrh

James Hamilton

e: jrh@mvdirona.com

w: http://www.mvdirona.com

b: http://blog.mvdirona.com / http://perspectives.mvdirona.com

 

Monday, February 15, 2010 12:11:19 PM (Pacific Standard Time, UTC-08:00)
Services
Monday, February 15, 2010 1:13:45 PM (Pacific Standard Time, UTC-08:00)
I wonder how close the Microsoft article is to reality. Is it really as simple as it sounds? Or are there many custom layers that we aren't seeing.

-Dave
Dave T
Monday, February 15, 2010 1:59:11 PM (Pacific Standard Time, UTC-08:00)
Whoops, if it sounded simple, I was sloppy in my description. You are 100% correct, Dave, that any site running at any scale even hinting at the scale of MySpace is running with a huge amount of custom code.

That's one of the reasons why I'm personally loath to run anything other than home grown or open source software at scale. I like to know the software stack well and be able to change it at will. There are some exceptions to be sure but that's my starting point in most discussions with which I'm involved.

jrh@mvdirona.com
Monday, February 15, 2010 2:34:37 PM (Pacific Standard Time, UTC-08:00)
I guess I was talking more about the Microsoft article:

"Service Broker has enabled MySpace to perform foreign key management across its 440 database servers, activating and deactivating accounts for its millions of users, with one-touch asynchronous efficiency"

I suppose I need to read more about the Service Broker to really understand what it is doing.

"That's one of the reasons why I'm personally loath to run anything other than home grown or open source software at scale. I like to know the software stack well and be able to change it at will."

I like that idea, but it scares many people as well.

-Dave
Dave T
Monday, February 15, 2010 3:18:47 PM (Pacific Standard Time, UTC-08:00)
James, can you describe which of the technology choices seemed questionable to you?
Frank Ch. Eigler
Monday, February 15, 2010 4:51:46 PM (Pacific Standard Time, UTC-08:00)
It's questionable because going with Microsoft vs Open infrastructure was a business decision rather than a technical one.
myspace
Monday, February 15, 2010 9:21:07 PM (Pacific Standard Time, UTC-08:00)
The article mentions that the web and cache tiers run on Windows 2003. We're actually running Windows 2008R2 on the web tier with a few exceptions and the cache tier is running Windows 2008.

Of course it's not as simple as the architecture graph in the article would have you believe. All the major parts are represented, though large parts of the site may not follow the same path for the data. One thing is for sure, the DB and Cache teams have really done some outstanding work scaling MySpace.

Monday, February 15, 2010 9:33:03 PM (Pacific Standard Time, UTC-08:00)
Also, James, that PR from FusionIO a few months ago didn't make it clear that these cards are only on the caching tier. This is where we have monstrous random IO requirements on key/val storage and their associated indexes. They play up the power savings, but the real win is the jaw dropping IOs those cards can do for the money. The power savings are just another by-product of the server consolidation that's possible.

We certainly have our fair share of odd choices over here, so not sure if this is what you were referring to.
Tuesday, February 16, 2010 4:16:19 AM (Pacific Standard Time, UTC-08:00)
Dave asked: I guess I was talking more about the Microsoft article: "Service Broker has enabled MySpace to perform foreign key management across its 440 database servers, activating and deactivating accounts for its millions of users, with one-touch asynchronous efficiency"

I've not used Service Broker, but it is work done by Jeff East and Pat Helland, both phenomenally good engineers, so yes, it actually could scale as well as described in the article.

Dave also commented that "doing most of the code internally or on open source" sounded like a good idea but scared many people. That is true and it is probably appropriate. The approach only makes sense at high scale. On smaller scales, it's harder to do cost effectively.

Thanks for the comment Dave.

jrh@mvdirona.com
Tuesday, February 16, 2010 4:21:18 AM (Pacific Standard Time, UTC-08:00)
Frank Eigler asked "James, can you describe which of the technology choices seemed questionable to you?"

I try my best not to second guess anyone running at MySpace scale and not to offer advice unless they are asking for it. In this case, I got lucky and a couple of the issues I was referring to were addressed by MySpace folks in the comments that follow. Thanks!
Tuesday, February 16, 2010 4:32:19 AM (Pacific Standard Time, UTC-08:00)
Thanks to Chris Bell of MySpace and to myspace, presumably also from MySpace, for addressing the open source issue, the "replace all disks with SSDs" confusion, and providing a bit more background across the board. Thanks for the comments.

Providing context, for those that didn't read the SSD discussion, it's at: http://perspectives.mvdirona.com/2009/10/14/ReplacingALLDiskWithSSD.aspx

Chris from MySpace corrects the FusionIO press release: they didn't "replace all disk", which would be nuts. And it's not the power gains that make it a win. What they did at MySpace was use SSDs in the caching tier where they are driving prodigious random I/Os. This is a wonderful use of SSDs and this is exactly where they offer great value. If you have a super hot, random I/O workload, and the social networking hairball problem is certainly that, SSDs are a great choice.

By the way, are you sure you want to give vendors permission to do press releases without you folks first reviewing them to make sure the marketing excitement didn't get ahead of the facts?

Thanks for the comments and the additional data.

jrh@mvdirona.com
Wednesday, February 17, 2010 2:31:24 PM (Pacific Standard Time, UTC-08:00)
Thanks to Chris for clarifying the application of SSDs. I was fairly confident it was only the cache servers, not the cooler data. In any case, a pretty impressive reduction in HW. If all the numbers match (insert disclaimer) per the Fusion-io article (http://www.fusionio.com/load/media-docsArticle/kcb62o/MySpace-Goes-Green-Saves-Space-Power-and-Maintenance.pdf ) HW requirements (read cache servers) were reduced 60%. A quick estimate would be 20,000 15K rpm drives (800/0.4 * 10 SAS HDDs per server) and associated RAID controllers replaced with 800 Fusion-io ioDrives.

Are there any observations on the impact to the database tier of cutting cache tier HW by 60% ?
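The quick estimate in this comment can be spelled out as follows. This sketch reads "800/0.4" as: the 800 cache servers are what remains after a 60% reduction, so the earlier fleet would have been 2,000 servers. The 10 SAS HDDs per server and the one ioDrive per remaining server are the commenter's assumptions, not confirmed figures.

```python
servers_after = 800        # cache servers listed in the post
reduction_pct = 60         # reduction claimed in the Fusion-io article
hdds_per_server = 10       # commenter's assumption
iodrives_per_server = 1    # implied by "800 Fusion-io ioDrives"

# Integer arithmetic avoids floating point noise in 800 / 0.4.
servers_before = servers_after * 100 // (100 - reduction_pct)
hdds_before = servers_before * hdds_per_server
iodrives_after = servers_after * iodrives_per_server

print(servers_before)                 # 2000
print(hdds_before)                    # 20000
print(hdds_before // iodrives_after)  # 25 HDDs replaced per ioDrive
```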
Wednesday, February 17, 2010 7:10:35 PM (Pacific Standard Time, UTC-08:00)
I don't follow why the DB tier would be reduced Mark. If the caching tier has the same hit rate, then the DB load won't change. I agree that the caching tier might be cheaper but I don't see the positive impact to the DB tier.

jrh@mvdirona.com
Thursday, February 18, 2010 1:43:38 PM (Pacific Standard Time, UTC-08:00)
James, I was just wondering if there was any impact on the database tier. I wasn't implying the database tier would be reduced. I agree that, on a first order, the DB tier workload would be unchanged. My thinking was along the lines of solid state storage latency reductions changing the database tier workload profile such as burstiness or peak levels. Significant latency changes or queue depth reductions typically move the performance issues to a new area.
Thursday, February 18, 2010 3:00:26 PM (Pacific Standard Time, UTC-08:00)
That's an interesting thought, Mark. If the mid-tier is faster, then there would be fewer mid-tier servers, so the cache miss traffic from each mid-tier server to the DB tier would go up. The absolute miss count shouldn't change, but the number of misses from each mid-tier server would go up.
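A small numeric sketch of that point, using made-up traffic numbers: the total request rate and hit rate below are hypothetical, and the 2,000 and 800 server counts come from the rough estimate earlier in the thread rather than from MySpace.

```python
def misses_per_server(total_requests_per_sec, hit_rate_pct, cache_servers):
    """Cache misses per second that each cache server forwards to the DB tier."""
    total_misses = total_requests_per_sec * (100 - hit_rate_pct) // 100
    return total_misses / cache_servers


TOTAL_REQUESTS = 10_000_000  # hypothetical requests per second
HIT_RATE_PCT = 90            # hypothetical, assumed unchanged by the hardware swap

per_server_before = misses_per_server(TOTAL_REQUESTS, HIT_RATE_PCT, 2_000)
per_server_after = misses_per_server(TOTAL_REQUESTS, HIT_RATE_PCT, 800)

print(per_server_before)                     # 500.0
print(per_server_after)                      # 1250.0
print(per_server_after / per_server_before)  # 2.5
```

The DB tier sees the same 1,000,000 misses per second in both cases; only the number of sources and the rate from each one changes.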

Here's another possibility. If customers were more engaged with MySpace and used it more because of the lower average latencies from the fast cache (quite possible: http://perspectives.mvdirona.com/2009/10/31/TheCostOfLatency.aspx), then the DB load for a given set of users might actually go up.

Lots of possibilities and one of the reasons why I believe in testing in production. It's often hard to predict the secondary and tertiary impacts of a change.

jrh@mvdirona.com
Thursday, February 18, 2010 3:51:54 PM (Pacific Standard Time, UTC-08:00)
Further thoughts on overall latency reductions and the 'more engaged' perspective. Reducing storage latencies by several orders of magnitude will result in lower client response times and, for e-commerce, higher conversion rates. As a second order effect solid state storage might also increase DB load due to increased conversion rates. But that is a good thing!
Wednesday, February 24, 2010 1:13:41 AM (Pacific Standard Time, UTC-08:00)
I would hope that all usage goes up because everything is so fast. We're continuing to work on that. :)

I haven't checked the details, but would expect the DB impact to be minimal or unverifiable. Keep in mind that we (meaning Erik Nelson and others) have developed a very modular and fast cache server. In many cases we never touch a traditional relational database. We might be storing status update data, indexes or counters direct to disk in a custom binary format or any BDB format, with or without DB fallback. The cache relay components could write through to the DB atomically or asynchronously or to another cache server in a partner datacenter.

We have monitoring in place with dozens of custom performance counters streaming around in real-time. We can track service times, hit rates and other metrics at a very granular level because we own the code. We can try to make sure we're not waiting on I/O regardless of how the bits are stored. So, in other words, we can be reasonably sure we're not changing the flow too much.
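For readers unfamiliar with the write paths mentioned above, here is a toy sketch of a cache that relays writes to a backing store either synchronously (write-through) or asynchronously (queued and flushed later), with fallback to the store on a miss. The `CacheRelay` class and its methods are hypothetical names invented for this illustration; this is not the MySpace cache server, and it maps "atomically" to a plain synchronous write as a simplifying assumption.

```python
from collections import deque


class CacheRelay:
    """Toy cache that relays writes to a backing store (hypothetical design)."""

    def __init__(self, backing_store, write_through=True):
        self.cache = {}
        self.backing_store = backing_store  # a dict standing in for the DB tier
        self.write_through = write_through
        self.pending = deque()              # writes not yet applied to the store

    def put(self, key, value):
        self.cache[key] = value
        if self.write_through:
            self.backing_store[key] = value    # synchronous write-through
        else:
            self.pending.append((key, value))  # asynchronous: apply on flush()

    def flush(self):
        """Apply queued writes to the backing store in arrival order."""
        while self.pending:
            key, value = self.pending.popleft()
            self.backing_store[key] = value

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.backing_store.get(key)    # fallback to the store on a miss
        if value is not None:
            self.cache[key] = value
        return value


db = {"status:7": "from the DB"}
relay = CacheRelay(db, write_through=False)
relay.put("status:1", "hello")
print("status:1" in db)       # False: the write is still queued
relay.flush()
print(db["status:1"])         # hello
print(relay.get("status:7"))  # from the DB (cache miss, then fallback)
```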

Wednesday, February 24, 2010 5:57:00 AM (Pacific Standard Time, UTC-08:00)
Makes sense. Thanks for the additional detail Chris.

--jrh
Monday, March 15, 2010 12:39:34 PM (Pacific Standard Time, UTC-08:00)
How much does all this cost? Using standard, non-discounted SQL Enterprise licensing (http://www.microsoft.com/sqlserver/2008/en/us/pricing.aspx), that's $11m ($25k * 440). I know OSS costs more in terms of support / resource costs, but you could hire a lot of people for $11m.
Monday, March 15, 2010 2:24:23 PM (Pacific Standard Time, UTC-08:00)
Given the competition is free of licensing costs, I'm guessing MySpace got a damn good discount from Microsoft on those SQL Server licenses. But, even then, I agree with you. Even on a 90% discount, there is room to hire a big team of engineers. At scale, even with very deep discounting, licensing costs get ugly.
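The licensing arithmetic in this exchange, as a small sketch. The $25k per-server list price and the 90% discount are the figures used by the commenters, not confirmed MySpace numbers, and the calculation ignores how the licenses are actually counted or renewed.

```python
servers = 440
list_price_per_server = 25_000  # USD, commenter's figure for SQL Enterprise
discount_pct = 90               # hypothetical deep discount from the reply

list_cost = servers * list_price_per_server
discounted_cost = list_cost * (100 - discount_pct) // 100

print(f"${list_cost:,}")        # $11,000,000
print(f"${discounted_cost:,}")  # $1,100,000
```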

--jrh
jrh@mvdirona.com
Comments are closed.

Disclaimer: The opinions expressed here are my own and do not necessarily represent those of current or past employers.

All Content © 2014, James Hamilton