Amazon SimpleDB Announced

The number one AWS requirement just got met: structured storage. Amazon announced SimpleDB yesterday, although it’s not yet available for developers to play with. I’m looking forward to writing a simple application against it; I’ve had fun with S3. But, for now, the docs will have to do.

The announcement (http://aws.amazon.com/simpledb) explains that SimpleDB is not a relational DB. AWS notes that some customers run relational DBs in EC2 and that those who need complex or strictly enforced schemas will continue to do so. Others, who only need a simple structured store with much less administrative overhead, will use SimpleDB. AWS says they will make it increasingly easy to do both.

The hard part of running an RDBMS in EC2 is that there is no data protection. The EC2 local disk is 100% ephemeral. Some folks are using block replicators such as DRBD (http://www.drbd.org/) to keep two MySQL systems in sync. It’s kind of a nice solution but requires some skill to set up. When AWS says “make it easier,” I suspect they are considering something along these lines. A block replicator would be a wonderful EC2 addition. However, for those that really only need a simple structured store, SimpleDB is (almost) here today.

My first two interests are data model and pricing. The data model is based upon domains, items, attributes, and values. A domain roughly corresponds to a database, and you are allowed up to 100 domains. All queries are within a single domain. Within a domain, you can create items. Each item has an ID (presumably unique). Every item has attributes, and attributes have values. You don’t need to (and can’t) declare the schema of an item, and any item can have any number of attributes up to 256 per item. Values can repeat, so a given item may or may not have a given attribute, and it can have the same attribute more than once. Values are simple UTF-8 strings limited to 1024 bytes. All attributes are indexed.
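To make the domain/item/attribute/value model concrete, here is a minimal in-memory sketch. It is not the SimpleDB API (which isn't available yet): the `Domain` class and its `put` and `get` methods are hypothetical names invented for illustration, and it assumes the 256 limit counts attribute-value pairs, which the docs may or may not intend.

```python
from collections import defaultdict

MAX_ATTRS_PER_ITEM = 256   # per-item limit quoted above
MAX_VALUE_BYTES = 1024     # per-value limit quoted above


class Domain:
    """Toy stand-in for a SimpleDB domain (hypothetical, not the real API)."""

    def __init__(self, name):
        self.name = name
        # item ID -> attribute name -> list of string values (no declared schema)
        self.items = defaultdict(lambda: defaultdict(list))

    def put(self, item_id, attribute, value):
        if len(value.encode("utf-8")) > MAX_VALUE_BYTES:
            raise ValueError("value exceeds 1024 bytes")
        item = self.items[item_id]
        # Assumption: the limit of 256 counts attribute-value pairs.
        if sum(len(values) for values in item.values()) >= MAX_ATTRS_PER_ITEM:
            raise ValueError("item already holds 256 attributes")
        item[attribute].append(value)

    def get(self, item_id):
        attrs = self.items.get(item_id, {})
        return {name: list(values) for name, values in attrs.items()}


books = Domain("books")
books.put("item-1", "title", "Dune")
books.put("item-1", "author", "Frank Herbert")
books.put("item-1", "keyword", "desert")   # the same attribute can repeat
books.put("item-1", "keyword", "spice")
books.put("item-2", "title", "Untitled")   # item-2 simply has no author
```

Two items in the same domain carry different attribute sets, and `keyword` appears twice on `item-1`, which is exactly the flexibility the schema-free model buys.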

The space overhead gives a clue to storage format:

· Raw byte size (GB) of all item IDs + 45 bytes per item +

· Raw byte size (GB) of all attribute names + 45 bytes per attribute name +

· Raw byte size (GB) of all attribute-value pairs + 45 bytes per attribute-value pair

The storage format appears to be a single ISAM index of (item, attribute, value). It wouldn’t surprise me if the index used in SimpleDB is the same code that S3 uses for metadata lookup.
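The billing formula above can be sketched as a short calculation. The function name `billable_bytes` is hypothetical, and two readings of the formula are assumptions: that attribute names are counted once per distinct name, and that the raw size of an attribute-value pair is the name plus the value.

```python
OVERHEAD_BYTES = 45  # fixed per-entry overhead from the formula above


def utf8_len(text):
    return len(text.encode("utf-8"))


def billable_bytes(items):
    """items maps item ID -> {attribute name -> list of string values}."""
    # Item IDs plus 45 bytes per item.
    item_bytes = sum(utf8_len(item_id) + OVERHEAD_BYTES for item_id in items)

    # Assumption: each distinct attribute name is counted once.
    names = {name for attrs in items.values() for name in attrs}
    name_bytes = sum(utf8_len(name) + OVERHEAD_BYTES for name in names)

    # Assumption: a pair's raw size is name bytes plus value bytes.
    pair_bytes = sum(
        utf8_len(name) + utf8_len(value) + OVERHEAD_BYTES
        for attrs in items.values()
        for name, values in attrs.items()
        for value in values
    )
    return item_bytes + name_bytes + pair_bytes


sample = {"i1": {"color": ["red", "blue"]}}
total = billable_bytes(sample)
```

For this sample the 45-byte overhead is charged four times (one item, one attribute name, two pairs), so small values are dominated by overhead rather than by the data itself.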

The query language is respectable and includes: =, !=, <, >, <=, >=, STARTS-WITH, AND, OR, NOT, INTERSECTION, and UNION. Queries are resource limited to no more than 5 seconds of execution time.
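Since the service isn't available yet, the following is only a toy evaluator showing what these operators could mean over schema-free, multi-valued string attributes. The `match` function and the sample data are hypothetical, and it assumes a predicate matches an item when any one of its values satisfies it. Because values are strings, comparisons here are lexicographic, so numbers are zero-padded.

```python
import operator

# Comparison operators from the list above, applied to string values.
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
    "starts-with": str.startswith,
}


def match(items, attribute, op, operand):
    """Return IDs of items with at least one value satisfying the predicate."""
    test = OPS[op]
    return {
        item_id
        for item_id, attrs in items.items()
        if any(test(value, operand) for value in attrs.get(attribute, []))
    }


items = {
    "i1": {"title": ["Dune"], "price": ["0012"]},
    "i2": {"title": ["Dubliners"], "price": ["0007"]},
    "i3": {"title": ["Emma"], "price": ["0100"]},
}

cheap = match(items, "price", "<", "0050")
starts_dun = match(items, "title", "starts-with", "Dun")

both = cheap & starts_dun            # INTERSECTION
either = cheap | starts_dun          # UNION
not_cheap = set(items) - cheap       # NOT
```

The zero padding matters: as plain strings, "100" sorts before "50", so unpadded numeric values would compare in the wrong order.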

The storage model, like S3, is replicated asynchronously across data centers, with the good and the bad that come with this approach: the data is stored geo-redundantly, which is wonderful, but it is possible to update an item and, on a subsequent request, not see your own changes. The consistency model is very weak and the storage reliability is very strong. I actually like the model, although most folks I talk to complain that it’s confusing. Technically this is true, but S3 uses the same consistency model. I’ve spoken to many S3 developers and never heard a complaint (admittedly some just don’t understand it).
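The "not even see your own changes" behavior is easy to reproduce in a toy simulation. This is a sketch of asynchronous replication in general, not of how AWS implements it: the `ReplicatedStore` class, the two-replica setup, and the explicit `replicate` step are all assumptions made for illustration.

```python
class ReplicatedStore:
    """Toy asynchronously replicated key-value store (illustrative only)."""

    def __init__(self, replica_count=2):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # queued (replica index, key, value) updates

    def write(self, key, value):
        # The write is acknowledged once the first replica has it;
        # the other replicas receive it later.
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica):
        # A request may be routed to any replica.
        return self.replicas[replica].get(key)

    def replicate(self):
        # Stand-in for background replication catching up.
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()


store = ReplicatedStore()
store.write("item-1", "v1")

stale = store.read("item-1", replica=1)   # routed to the lagging replica
store.replicate()
fresh = store.read("item-1", replica=1)   # replication has caught up
```

The write is durable on one replica the moment it returns, yet a read routed elsewhere misses it until replication catches up. Clients that need to read back their own writes have to tolerate or retry through that window.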

Pricing was my second interest. There are three charges for SimpleDB:

· Machine Utilization: $0.14/machine hour

· Data Transfer:

o $0.10 per GB – all data transfer in

o $0.18 per GB – first 10 TB / month data transfer out

o $0.16 per GB – next 40 TB / month data transfer out

o $0.13 per GB – data transfer out / month over 50 TB

· Structured Storage: $1.50 per GB/month

$1.50 per GB/month, or $18 per GB/year, is 10x the $1.80 per GB/year charged by S3. That is fairly expensive in comparison to S3 but a bargain compared to what it costs to manage an RDBMS and the hardware that supports it.
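The three charges can be combined into a rough monthly estimate. This is a sketch using only the rates listed above; the function names are hypothetical, and it assumes 1 TB = 1024 GB for the tier boundaries, which the price list does not spell out.

```python
from decimal import Decimal

GB_PER_TB = 1024  # assumption: the price list does not define a TB

MACHINE_HOUR_RATE = Decimal("0.14")
TRANSFER_IN_RATE = Decimal("0.10")
STORAGE_RATE = Decimal("1.50")      # per GB per month
S3_YEARLY_RATE = Decimal("1.80")    # per GB per year, as quoted above

# (tier size in GB, price per GB); None marks the open-ended last tier.
TRANSFER_OUT_TIERS = [
    (10 * GB_PER_TB, Decimal("0.18")),
    (40 * GB_PER_TB, Decimal("0.16")),
    (None, Decimal("0.13")),
]


def transfer_out_cost(gb_out):
    """Charge each GB at the rate of the tier it falls into."""
    remaining = Decimal(gb_out)
    total = Decimal(0)
    for size, rate in TRANSFER_OUT_TIERS:
        chunk = remaining if size is None else min(remaining, Decimal(size))
        total += chunk * rate
        remaining -= chunk
        if remaining <= 0:
            break
    return total


def monthly_cost(machine_hours, gb_in, gb_out, gb_stored):
    return (
        Decimal(machine_hours) * MACHINE_HOUR_RATE
        + Decimal(gb_in) * TRANSFER_IN_RATE
        + transfer_out_cost(gb_out)
        + Decimal(gb_stored) * STORAGE_RATE
    )


# Storage comparison from the text: SimpleDB per GB/year vs. S3 per GB/year.
storage_ratio = (STORAGE_RATE * 12) / S3_YEARLY_RATE
```

`Decimal` keeps the dollar amounts exact. Calling `monthly_cost(10, 1, 1, 1)` adds up 10 machine hours, 1 GB in, 1 GB out, and 1 GB stored.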

More data on SimpleDB from AWS: http://aws.amazon.com/simpledb.

Sriram Krishnan has an excellent review up at: http://www.sriramkrishnan.com/blog/2007/12/amazon-simpledb-technical-overview.html

Thanks to Sriram Krishnan (Developer Division) and Dare Obasanjo (WinLive Platform Services) for sending this one my way. I’m really looking forward to playing with SimpleDB and seeing how customers end up using it. Overall, I find it tastefully simple and yet perfectly adequate for many structured storage tasks.

–jrh

James Hamilton, Windows Live Platform Services
Bldg RedW-D/2072, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
JamesRH@microsoft.com

H:mvdirona.com | W:research.microsoft.com/~jamesrh
