Finally! I’ve been dying to talk about DynamoDB since work began on this scalable, low-latency, high-performance NoSQL service at AWS. This morning, AWS announced availability of DynamoDB: Amazon Web Services Launches Amazon DynamoDB – A New NoSQL Database Service Designed for the Scale of the Internet.
In a past blog entry, One Size Does Not Fit All, I offered a taxonomy of 4 different types of structured storage system, argued that Relational Database Management Systems are not sufficient, and walked through some of the reasons why NoSQL databases have emerged and continue to grow market share quickly. The four database categories I introduced were: 1) features-first, 2) scale-first, 3) simple structure storage, and 4) purpose-optimized stores. RDBMS own the first category.
DynamoDB targets workloads fitting into the Scale-First and Simple Structured storage categories where NoSQL database systems have been so popular over the last few years. Looking at these two categories in more detail, Scale-First is:
Scale-first applications are those that absolutely must scale without bound and being able to do this without restriction is much more important than more features. These applications are exemplified by very high scale web sites such as Facebook, MySpace, Gmail, Yahoo, and Amazon.com. Some of these sites actually do make use of relational databases but many do not. The common theme across all of these services is that scale is more important than features and none of them could possibly run on a single RDBMS. As soon as a single RDBMS instance won’t handle the workload, there are two broad possibilities: 1) shard the application data over a large number of RDBMS systems, or 2) use a highly scalable key-value store.
And, Simple Structured Storage:
There are many applications that have a structured storage requirement but they really don’t need the features, cost, or complexity of an RDBMS. Nor are they focused on the scale required by the scale-first structured storage segment. They just need a simple key value store. A file system or BLOB-store is not sufficiently rich in that simple query and index access is needed but nothing even close to the full set of RDBMS features is needed. Simple, cheap, fast, and low operational burden are the most important requirements of this segment of the market.
More detail at: One Size Does Not Fit All.
The DynamoDB service is a unified purpose-built hardware platform and software offering. The hardware is based upon a custom server design using Flash Storage spread over a scalable high speed network joining multiple data centers.
DynamoDB supports a provisioned throughput model. A DynamoDB application programmer decides the number of database requests per second their application should be capable of supporting and DynamoDB automatically spreads the table over an appropriate number of servers. At the same time, it also reserves the required network, server, and flash memory capacity to ensure that request rate can be reliably delivered day and night, week after week, and year after year. There is no need to worry about a neighboring application getting busy or running wild and taking all the needed resources. They are reserved and there whenever needed.
The sharding techniques needed to achieve high requests rates are well understood industry-wide but implementing them does take some work. Reliablyreserving capacity so it is always there when you need it, takes yet more work. Supporting the ability to allocate more resources, or even less, while online and without disturbing the current request rate takes still more work. DynamoDB makes all this easy. It supports online scaling between very low transaction rates to applications requiring millions of requests per second. No downtime and no disturbance to the currently configured application request rate while resharding. These changes are done online only by changing the DynamoDB provisioned request rate up and down through an API call.
In addition to supporting transparent, on-line scaling of provisioned request rates up and down over 6+ orders of magnitude with resource reservation, DynamoDB is also both consistent and multi-datacenter redundant. Eventual consistency is a fine programming model for some applications but it can yield confusing results under some circumstances. For example, if you set a value to 3 and then later set it to 4, then read it back, 3 can be returned. Worse, the value could be set to 4, verified to be 4 by reading it, and yet 3 could be returned later. It’s a tough programming model for some applications and it tends to be overused in an effort to achieve low-latency and high throughput. DynamoDB avoids forcing this by supporting low-latency and high throughout while offering full consistency. It also offers eventual consistency at lower request cost for those applications that run well with that model. Both consistency models are supported.
It is not unusual for a NoSQL store to be able to support high transaction rates. What is somewhat unusual is to be able to scale the provisioned rate up and down while on-line. Achieving that while, at the same time, maintaining synchronous, multi-datacenter redundancy is where I start to get excited.
Clearly nobody wants to run the risk of losing data but NoSQL systems are scale-first by definition. If the only way to high throughput and scale, is to run risk and not commit the data to persistent storage at commit time, that is exactly what is often done. This is where DynamoDB really shines. When data is sent to DynamoDB, it is committed to persistent and reliable storage before the request is acknowledged. Again this is easy to do but doing it with average low single digit millisecond latencies is both harder and requires better hardware. Hard disk drives can’t do it and in-memory systems are not persistent so flash memory is the most cost effective solution.
But what if the server to which the data was committed fails, or the storage fails, or the datacenter is destroyed? On most NoSQL systems you would lose your most recent changes. On the better implementations, the data might be saved but could be offline and unavailable. With dynamoDB, if data is committed just as the entire datacenter burns to the ground, the data is safe, and the application can continue to run without negative impact at exactly the same provisioned throughput rate. The loss of an entire datacenter isn’t even inconvenient (unless you work at Amazon :-)) and has no impact on your running application performance.
Combining rock solid synchronous, multi-datacenter redundancy with average latency in the single digits, and throughput scaling to the millions of requests per second is both an excellent engineering challenge and one often not achieved.
More information on DynamoDB:
· DynamoDB detail Page: http://aws.amazon.com/dynamodb/
· DynamoDB Developer Guide: http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/
· Blog entries:
· DynamoDB Frequently Asked Questions: http://aws.amazon.com/dynamodb/faqs/
· DynamoDB Pricing: http://aws.amazon.com/dynamodb/pricing/
Relational systems remain an excellent solution for applications requiring Feature-First structured storage. AWS Relational Database Service supports both the MySQL and Oracle and relational database management systems: http://aws.amazon.com/rds/.
Just as I was blown away when I saw it possible to create the world’s 42nd most powerful super computer with a few API calls to AWS (42: the Answer to the Ultimate Question of Life, the Universe and Everything), it is truly cool to see a couple of API calls to DynamoDB be all that it takes to get a scalable, consistent, low-latency, multi-datacenter redundant, NoSQL service configured, operational and online.