On February 28th, Cloud Camp Seattle was held at an Amazon facility in Seattle. Cloud Camp is described by its organizers as an unconference where early adopters of Cloud Computing technologies exchange ideas. With the rapid change occurring in the industry, we need a place we can meet to share our experiences, challenges and solutions. At CloudCamp, you are encouraged to share your thoughts in several open discussions, as we strive for the advancement of Cloud Computing. End users, IT professionals and vendors are all encouraged to participate.
The Cloud Camp schedule is at: http://www.cloudcamp.com/.
Jeanine Johnson attended the event and took excellent notes. Jeanine’s notes follow.
It began with a series of “lightning presentations” – 5-minute presentations on cloud topics that are now online (http://www.seattle20.com/blog/Can-t-Make-it-to-Cloud-Camp-Watch-LIVE.aspx). Afterwards, there was a Q&A session with participants who volunteered to share their expertise. Then, 12 topics were chosen by popular vote to be discussed in an “open space” format, in which the volunteer who suggested the topic facilitated its 1-hour discussion.
Highlights from the lightning presentations:
· AWS has launched several large data sets (10-220GB) in the cloud and made them publicly available (http://aws.amazon.com/publicdatasets/). Example data sets are the human genome and US census data: large data sets that would take hours, days, or even weeks to download locally, even with a fast Internet connection.
· A pyramid was drawn, with SaaS (e.g. Hotmail, Salesforce) on top, followed by PaaS (e.g. Google App Engine, Salesforce API), IaaS (e.g. Amazon, Azure; which leverages virtualization), and “Traditional hosting” as the pyramid’s foundation, which was a nice and simple rendition of the cloud stack (http://en.wikipedia.org/wiki/Cloud_computing). In addition, SaaS applications were shown to have the most functionality; traveling down the pyramid yields less functionality but more flexibility.
Other than that info, the lightning presentations were too brief, with no opportunity for Q&A, to learn much. After the lightning presentations, open space discussions were held. I attended three: 1) scaling web apps, 2) scaling MySQL, and 3) launching MMOGs (massively multiplayer online games) in the cloud – notes for each session follow.
1. SCALING WEB APPS
One company volunteered itself as a case study for the group of 20ish people. They run 30 physical servers, with 8 front-end Apache web servers on top of 1 scaled-up MySQL database, and they use PHP channels to access their Drupal (http://drupal.org) content. Their MySQL machine has 16 processors and 32GB of RAM, but it is maxed out, and they’re having trouble scaling it because they currently hover around 30k concurrent connections, with up to 8x that during peak usage. They’re also bottlenecked by their NFS server, and they use basic round robin for load balancing.
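As a rough illustration of what basic round robin means here, and of the peak load implied by the figures above, a minimal Python sketch follows. The server names are hypothetical, and the even split of connections across the 8 front ends is an assumption made only for the arithmetic; neither comes from the session.

```python
from itertools import cycle

# Hypothetical names for the 8 front-end web servers mentioned in the notes.
WEB_SERVERS = [f"web{i}" for i in range(1, 9)]


def round_robin(servers):
    """Return an endless iterator that hands out servers in fixed rotation.

    Basic round robin ignores how busy each server currently is.
    """
    return cycle(servers)


picker = round_robin(WEB_SERVERS)
first_ten = [next(picker) for _ in range(10)]

# Peak estimate from the figures quoted in the session:
# about 30k concurrent connections, up to 8x that at peak.
BASELINE_CONNECTIONS = 30_000
PEAK_MULTIPLIER = 8
peak_connections = BASELINE_CONNECTIONS * PEAK_MULTIPLIER

# Assumption: connections spread evenly over the front ends.
per_server_at_peak = peak_connections // len(WEB_SERVERS)

if __name__ == "__main__":
    print(first_ten)
    print(peak_connections, per_server_at_peak)
```

Because the rotation never looks at load, a single slow request can pile work onto one server while the others sit idle, which is one reason round robin is usually considered only a starting point.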
Using CloudFront was suggested, instead of Drupal (where they currently store lots of images). Unfortunately, CloudFront takes up to 24 hours to notice content changes, which wouldn’t work for them. So the discussion began around how to scale Drupal, but quickly morphed into key-value-pair storage systems (e.g. SimpleDB http://aws.amazon.com/simpledb/) versus relational databases (e.g. MySQL) to store backend data.
After some discussion around where business logic should reside, in stored procedures and triggers or in the code via an MVC (http://en.wikipedia.org/wiki/Model-view-controller) paradigm, the group agreed that “you have to know your data: Do you need real-time consistency? Or eventual consistency?”
Hadoop (http://hadoop.apache.org/core/) was briefly discussed, but once someone said that the popular web-development frameworks Rails (http://rubyonrails.org/) and Django (http://www.djangoproject.com/) steer folks towards relational databases, the discussion turned to scaling MySQL. Best-practice tips given for scaling MySQL were:
· When scaling up, memory becomes a bottleneck, so use memcached (http://www.danga.com/memcached/) to extend your system’s lifespan.
· Use MySQL Cluster (http://www.mysql.com/products/database/cluster/).
· Use MySQL Proxy (http://forge.mysql.com/wiki/MySQL_Proxy) and shard your database, such that users are associated with a specific cluster (devs turn to sharding because horizontal scaling for WRITES isn’t as effective as it is for READS, i.e. replication processing becomes untenable).
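The caching and sharding tips above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the shard names are hypothetical, a plain dict stands in for memcached, and the hash-modulo mapping is one simple way to pin a user to a cluster, not the scheme anyone in the session described.

```python
import hashlib

# Hypothetical shard names; in practice each would be its own MySQL cluster.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]


def shard_for_user(user_id, shards=SHARDS):
    """Map a user to one shard using a stable hash.

    hashlib is used instead of the built-in hash(), which is randomized
    per process for strings and so would not give a stable mapping.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


class CacheAside:
    """Check the cache first and fall back to the database on a miss."""

    def __init__(self, load_from_db):
        self._cache = {}            # stand-in for memcached (assumption)
        self._load = load_from_db   # callable that reads from the database
        self.db_reads = 0           # counts how often the database was hit

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.db_reads += 1
        value = self._load(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        """Drop a cached entry, for example after the row is updated."""
        self._cache.pop(key, None)


if __name__ == "__main__":
    fake_db = {"alice": {"plan": "free"}}   # hypothetical data
    cache = CacheAside(lambda key: fake_db[key])
    cache.get("alice")
    cache.get("alice")
    print(shard_for_user("alice"), cache.db_reads)
```

One caveat with modulo-based sharding is that changing the number of shards remaps most users, so the shard count needs to be chosen with growth in mind or replaced by a lookup table.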
Other open source technologies mentioned included:
· Gallery2 (http://www.gallery2.org/), an open source photo album.
· Jingle http://www.slideshare.net/stpeter/jingle, Jabber-based VoIP technology.
2. SCALING MYSQL
Someone from the group of 10ish people volunteered to whiteboard the “ways to scale MySQL,” which were:
· Master / Slave, which can use Dolphin/Sakila http://forge.mysql.com/wiki/SakilaSampleDB, but becomes inefficient around 8+ machines.
· MySQL Proxy, and then replicate each machine behind the proxy.
· Master : Master topology using sync replication.
· Master ring topology using MySQL Proxy. It works well, and the replication overhead can be helped by adding more machines, but several thought it would be hard to implement this setup in the cloud.
· Mesh topology (if you have the right hardware). This is how a lot of high-performance systems work, but recovery and management are hard.
· Scale-up and run as few slaves as possible – some felt that this “simple” solution is what generally works best.
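To make the master/slave option in the list above concrete, here is a minimal Python sketch of read/write splitting: writes go to the master, reads rotate across the slaves. The class name, the server names, and the idea of classifying a statement by its first keyword are all assumptions for illustration; this is not how MySQL Proxy itself is configured.

```python
from itertools import cycle

# Statements treated as reads in this sketch (an assumption, not exhaustive).
READ_VERBS = {"SELECT", "SHOW"}


class ReplicatedRouter:
    """Send writes to the master and spread reads across the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        # With no slaves configured, every statement goes to the master.
        self._slaves = cycle(slaves) if slaves else None

    def route(self, sql):
        """Return the name of the server that should run this statement."""
        parts = sql.split()
        verb = parts[0].upper() if parts else ""
        if verb in READ_VERBS and self._slaves is not None:
            return next(self._slaves)
        return self.master


if __name__ == "__main__":
    router = ReplicatedRouter("master", ["slave1", "slave2"])
    for statement in ("SELECT 1", "INSERT INTO t VALUES (1)", "SELECT 2"):
        print(statement, "->", router.route(statement))
```

Every write still lands on the one master and must be replayed on each slave, which is consistent with the point made in the session that adding slaves helps reads far more than writes.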
Someone then drew an “HA Drupal stack in the cloud,” which consisted of 3 front-end load balancers with hot-swap for failures to either the 2nd or 3rd machines, followed by 2 web servers and 2 master/slave databases in the backend. If using Drupal, 2 additional NFS servers should be set up for static content storage with hot swap (aka fast MAC failover). However, it was recommended that Drupal be replaced with a CDN when the system begins to need scaling up. This configuration in the Amazon cloud costs around $700 monthly to run (plus network traffic).
Memcachefs (http://memcachefs.sourceforge.net/) was mentioned as a possibility as well.
3. LAUNCHING MMOGs IN THE CLOUD
This topic was suggested by a lead game developer. He explained to the crowd of 10ish people that MMOs require persistent connections to servers, and that their concurrent connection counts have a relatively high standard deviation daily, with a trend over the week that peaks around Saturday and Sunday. MMO producers must plan their capacity a couple of months in advance of publishing their game. And since up to 50% of an MMO’s subscriber base is active on the first day, they usually end up with left-over capacity after launch, when active subscribers drop to 20% of their base and continue to dwindle until the end of the game’s lifecycle. As a result, it’d be ideal to get MMOGs into the cloud, but no one in the room knew how to get around the latency induced by virtualization, which is too much for flashy MMOGs (although the “5%-ish” perf hit is fine for asynchronous or low-graphics games). On a side note, iGames (http://www.igames.org/) was mentioned as a good way to market games.
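The left-over capacity problem can be made concrete with a little arithmetic. The Python sketch below uses the 50% and 20% figures from the session; the subscriber count and the players-per-server number are hypothetical, and treating "active" subscribers as the load that must be provisioned for is a simplifying assumption.

```python
def servers_needed(subscribers, active_percent, players_per_server):
    """Servers required to host the given percentage of subscribers.

    Uses integer ceiling division so a partly filled server still counts.
    """
    active_times_100 = subscribers * active_percent
    return -(-active_times_100 // (100 * players_per_server))


# Hypothetical game: 100,000 subscribers, 2,000 players per server.
SUBSCRIBERS = 100_000
PLAYERS_PER_SERVER = 2_000

launch_servers = servers_needed(SUBSCRIBERS, 50, PLAYERS_PER_SERVER)
steady_servers = servers_needed(SUBSCRIBERS, 20, PLAYERS_PER_SERVER)
idle_servers = launch_servers - steady_servers
idle_percent = 100 * idle_servers // launch_servers

if __name__ == "__main__":
    print(launch_servers, steady_servers, idle_servers, idle_percent)
```

Under these assumptions, 15 of the 25 servers bought for launch day (60%) sit idle once activity settles at 20%, which is the kind of over-provisioning that elastic capacity would avoid.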
Afterwards, those who were left went to the Elysian on 1st for drinks and continued their cloud discussions.
James Hamilton, Amazon Web Services
1200, 12th Ave. S., Seattle, WA, 98144
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 | james@amazon.com
H:mvdirona.com | W:mvdirona.com/jrh/work | blog:http://perspectives.mvdirona.com
Disclaimer: The opinions expressed here are my own and do not necessarily represent those of current or past employers.