Scaling AWS Relational Database Service

Hosting multiple MySQL engines with MySQL Replication between them is a common design pattern for scaling read-heavy MySQL workloads. As with all scaling techniques, there are workloads for which it works very well, but there are also potential issues that need to be understood. In this case, all write traffic is directed to the primary server and consequently is not scaled, which is why this technique works best for workloads heavily skewed towards reads. But for those fairly common read-heavy workloads, the technique works very well and allows scaling the read workload across a fleet of MySQL instances. Of course, as with any asynchronous replication scheme, the read replicas are not transactionally updated, so any application running on MySQL read replicas must be tolerant of eventually consistent updates.
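To make the read/write split concrete, here is a minimal Python sketch of the routing decision an application might make. The class name `ReplicaRouter`, the `needs_fresh_read` flag, and the endpoint strings are all hypothetical and only for illustration; a real deployment would normally lean on a connection pool or proxy rather than string-matching SQL.

```python
import itertools


class ReplicaRouter:
    """Send writes to the primary and spread reads across replicas.

    Hypothetical helper: endpoints are plain strings standing in for
    whatever connection handles the application really uses.
    """

    def __init__(self, primary, replicas):
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._read_cycle = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql, needs_fresh_read=False):
        # Simplifying assumption: anything starting with SELECT is a read.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and not needs_fresh_read:
            # Round-robin over replicas; results may lag the primary.
            return next(self._read_cycle)
        # Writes, and reads that cannot tolerate replica lag, use the primary.
        return self.primary


router = ReplicaRouter("primary-db", ["replica-1", "replica-2"])
print(router.endpoint_for("SELECT * FROM orders"))
print(router.endpoint_for("UPDATE orders SET status = 'shipped'"))
```

The `needs_fresh_read` escape hatch is one way to handle the eventual-consistency caveat: a read that must see a just-committed write goes to the primary, at the cost of adding load there. Statements such as `SELECT ... FOR UPDATE` would need extra handling that this sketch omits.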

Load balancing high read traffic over multiple MySQL instances works very well but this is only one of the possible tools used to scale this type of workload. Another very common technique is to put a scalable caching layer in front of the relational database fleet. By far the most common caching layer used by high-scale services is Memcached.
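As a rough illustration of how a caching layer takes read load off the database, the following Python sketch shows the common cache-aside pattern. It is an assumption-laden stand-in: an in-process dictionary plays the role of Memcached, and `CacheAside`, `load_from_db`, and `ttl_seconds` are hypothetical names, not part of any real client library.

```python
import time


class CacheAside:
    """Cache-aside reads: check the cache first, fall back to the database.

    Hypothetical sketch: the dict below stands in for a Memcached cluster.
    """

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db      # callable that queries the database
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}               # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]            # cache hit: no database round trip
        value = self._load(key)        # cache miss or expired: hit the database
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        # Call after a write so later reads do not serve the stale value.
        self._store.pop(key, None)


def fake_db_lookup(key):
    # Placeholder for a real SELECT against the database fleet.
    return "row-for-" + key


cache = CacheAside(fake_db_lookup, ttl_seconds=30.0)
print(cache.get("user:42"))  # first call loads from the "database"
print(cache.get("user:42"))  # second call is served from the cache
```

As with read replicas, cached values can be stale until they expire or are invalidated, so the same tolerance for slightly out-of-date reads applies.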

Another database scaling technique is to simply not use a relational database. For workloads that don’t need schema enforcement and complex queries, NoSQL databases offer both a cheaper and a simpler approach to hosting the workload. SimpleDB is the AWS hosted NoSQL database, with Netflix being one of the best known users (slide deck from Netflix’s Adrian Cockcroft: http://www.slideshare.net/adrianco/netflix-oncloudteaser). Cassandra is another common RDBMS alternative in heavy use by many high-scale sites including Facebook, where it was originally conceived. Cassandra is also frequently run on AWS, with the Cassandra Wiki offering scripts to make it easy to install and configure on Amazon EC2.

For those workloads where a relational database is the chosen solution, MySQL read replication is a good technique to have in your scaling tool kit. Last week Amazon announced read replica support for the AWS Relational Database Service. The press release is at: Announcing Read Replicas, Lower High Memory DB Instance Price for Amazon AWS.

You can now create one or more replicas of a given “source” DB Instance and serve incoming read traffic from multiple copies of your data. This new database deployment option enables you to elastically scale out beyond the capacity constraints of a single DB Instance for read-heavy database workloads. You can use Read Replicas in conjunction with Multi-AZ replication for scalable, reliable, and highly available production database deployments.

If you are running MySQL and wish you had someone else to manage it for you, check out Amazon RDS. The combination of read replicas to scale read workloads and Multi-AZ support for multi-data center high availability makes it a pretty interesting way to run MySQL.

–jrh

James Hamilton

e: jrh@mvdirona.com

w: http://www.mvdirona.com

b: http://blog.mvdirona.com / http://perspectives.mvdirona.com

4 comments on “Scaling AWS Relational Database Service”
  1. Great article Sid. I’ll blog that one.

    Thanks for forwarding my way.

    –jrh
    jrh@mvdirona.com

  2. Hi James,
    Nice article. I am interested in looking at RDS in the near future for a few select use-cases that require full-relational support.

    I just released a white paper this week about Netflix’s use of SimpleDB and S3. Have a read and let me know your thoughts!

    Title : Netflix’s Transition to High-Availability Storage Systems
    Link : http://bit.ly/97t63T

    Sid ( twitter : ‘at’ r39132 )

  3. That’s a good point Edward. Cloning off an exact copy of a production database is super easy so you can just pop a database copy for ugly ad hoc queries and not have to worry about negative impact on the production system.

    –jrh
    James Hamilton

  4. James,

    Nice post. I have used the RDS service for many development projects with great success. The nice part about RDS is the ability to start and stop the service when you need access to the Database. So for intermittent access to database resources for development, RDS ROCKS.

    If you are doing database Data Mining the ability to take a copy of the Database and perform "nasty" long query requests against a copy of the Database is great.

    I enjoyed your post.

    Edward M. Goldberg
    http://myCloudWatcher.com/
    e.m.g.
