Last week, I posted Scaling Second Life. Royans sent me a great set of scaling stories: Scaling Web Architectures, and Vijay Rao of AMD pointed out How FarmVille Scales to Harvest 75 Million Players a Month. I find the FarmVille example particularly interesting in that it’s “only” a casual game. Having spent most of my life (under a rock) working on high-scale servers and services, I would never have guessed that casual gaming was big business. But it is. Really big business. To put a scale point on what “big” means in this context, Zynga, the company responsible for FarmVille, is estimated to have a valuation of between $1.5B and $3B (Zynga Raising $180M on Astounding Valuation) with annual revenues of roughly $250M (Zynga Revenues Closer to $250).
The Zynga games portfolio includes 24 games, the best known of which are Mafia Wars and FarmVille. The FarmVille scaling story is a great example of how quickly internet properties can need to scale. The game had 1M players after 4 days and 10M after 60 days.
In this interview with FarmVille’s Luke Rajich (How FarmVille Scales to Harvest 75 Million Players a Month), Luke talks about scaling what he refers to as both the largest game in the world and the largest application on a web platform. FarmVille is a Facebook application, and peak bandwidth between FarmVille and Facebook can run as high as 3Gbps. The FarmVille team has to manage both incredibly fast growth and very spiky traffic patterns. They have implemented what I call graceful degradation mode (Designing and Deploying Internet-Scale Services) and are able to shed load as increases in gaming traffic push them towards their resource limits. In Luke’s words “the application has the ability to dynamically turn off any calls back to the platform. We have a dial that we can tweak that turns off incrementally more calls back to the platform. We have additionally worked to make all calls back to the platform avoid blocking the loading of the application itself. The idea here is that, if all else fails, players can continue to at least play the game. […]The way in which services degrade are to rate limit errors to that service and to implement service usage throttles. The key ideas are to isolate troubled and highly latent services from causing latency and performance issues elsewhere through use of error and timeout throttling, and if needed, disable functionality in the application using on/off switches and functionality based throttles.” These are good techniques that can be applied to all services.
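To make the “dial” idea concrete, here is a minimal Python sketch of one way such a control could work. It is not Zynga’s implementation: the class name `DegradationDial`, the helper `call_platform`, the call names, and the threshold scheme are all assumptions made for illustration. Each optional platform call is given a dial level at which it is shed, and a failed or disabled call returns a fallback value instead of blocking the game.

```python
class DegradationDial:
    """Hypothetical dial: level 0 keeps every platform call on;
    each higher level sheds incrementally more calls."""

    def __init__(self, shed_at):
        # shed_at maps a call name to the dial level at which that call
        # is turned off. Calls not listed are never shed.
        self.shed_at = dict(shed_at)
        self.level = 0

    def set_level(self, level):
        # Operators raise the level as traffic nears resource limits.
        self.level = max(0, int(level))

    def is_enabled(self, call_name):
        threshold = self.shed_at.get(call_name)
        return threshold is None or self.level < threshold


def call_platform(dial, call_name, fn, fallback=None):
    """Run fn() only if the dial allows it; never let a platform
    failure propagate into the game itself."""
    if not dial.is_enabled(call_name):
        return fallback
    try:
        return fn()
    except Exception:
        # A troubled platform call degrades to the fallback value.
        return fallback


# Illustrative call names (assumed, not from the interview).
dial = DegradationDial({"friend_feed": 1, "profile_pictures": 2})
dial.set_level(1)
feed = call_platform(dial, "friend_feed", lambda: ["post"], fallback=[])
pics = call_platform(dial, "profile_pictures", lambda: ["pic.png"], fallback=[])
```

With the dial at level 1, `feed` is the empty fallback while `pics` is still fetched. A production version would likely add the error-rate and timeout throttles Luke describes, which this sketch leaves out.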
Lessons learned from scaling FarmVille:
1. Interactive games are write-heavy. Typical web apps read more than they write, so many common architectures may not be sufficient. Read-heavy apps can often get by with a caching layer in front of a single database. Write-heavy apps will need to partition so writes are spread out and/or use an in-memory architecture.
2. Design every component as a degradable service. Isolate components so increased latencies in one area won’t ruin another. Throttle usage to help alleviate problems. Turn off features when necessary.
3. Cache Facebook data. When you are deeply dependent on an external component consider caching that component’s data to improve latency.
4. Plan ahead for usage spikes related to new releases.
5. Sample. When analyzing large streams of data (looking for problems, for example), not every piece of data needs to be processed. Sampling the data can yield nearly the same results for much less work.
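As an illustration of the sampling lesson, the following Python sketch estimates an error rate from a random 10% sample of an event stream rather than from every event. The event format, the function names, and the 10% rate are assumptions chosen for the example; nothing here is taken from FarmVille’s actual tooling.

```python
import random


def sample_stream(events, rate, rng):
    """Yield each event independently with probability `rate`."""
    for event in events:
        if rng.random() < rate:
            yield event


def estimate_error_rate(events, rate, rng):
    """Estimate the fraction of error events by inspecting only a sample.
    Returns None if the sample happens to be empty."""
    seen = 0
    errors = 0
    for event in sample_stream(events, rate, rng):
        seen += 1
        if event["is_error"]:
            errors += 1
    return errors / seen if seen else None


# Synthetic stream (assumed data): 1 in 20 events is an error.
events = [{"id": i, "is_error": i % 20 == 0} for i in range(100_000)]
rng = random.Random(42)  # seeded so the run is repeatable
estimate = estimate_error_rate(events, rate=0.10, rng=rng)
```

The estimate lands close to the true 5% error rate while examining roughly a tenth of the events. The trade-off is that rare problems may be missed or measured noisily at low sampling rates, so the rate should be chosen with the rarest event of interest in mind.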
Check out the High Scalability article How FarmVille Scales to Harvest 75 Million Players a Month for more details.