At SIGMOD 2019 in Amsterdam last month it was announced that the Amazon Aurora service has been awarded the 2019 SIGMOD Systems Award. From the awards committee:
The SIGMOD Systems Award is awarded to an individual or set of individuals to recognize the development of a software or hardware system whose technical contributions have had significant impact on the theory or practice of large-scale data management systems. The SIGMOD Systems Award Committee determines the recipient(s) of the award. This year’s award was given to the developers of the Aurora database system from Amazon AWS.
The developers of the Aurora database system are the recipients of the 2019 SIGMOD Systems Award for fundamentally redesigning relational database storage for cloud environments.
It’s nice to see Aurora recognized for taking a different approach to storage management. I used to joke that every 10 years or so I seemed to get hired by another company to do much the same work as I was doing back in the 80s and 90s at IBM. Sure, the newer database systems were easier to administer, they supported a broader set of types, they performed better, they were more reliable, and in some cases they were more extensible. But, at the core, the newer storage engines were built on the same design principles and core ideas as the older ones. Certainly, a lot has changed but, if you read through Jim Gray and Andreas Reuter’s Transaction Processing: Concepts and Techniques, published back in 1992, you would still be remarkably well-equipped to work on commercial or open source database systems today. I suspect I could still get a job cutting code on the storage engine of many database systems.
The core observation behind starting Aurora was that open source MySQL has a vast number of customers and applications but its performance often falls far short of industry-leading commercial systems. The goal for Aurora was to be 100% compatible with MySQL while, at the same time, offering performance that beats contemporary commercial database management systems, most of which come with “enterprise” pricing terms.
Where Aurora took a different approach from that of common commercial and open source database management systems is in implementing log-only storage. In contemporary database transaction systems, just about every system performs synchronous writes, with an active transaction waiting, only when committing log records. The new or updated database pages might not be written for tens of seconds or even minutes after the transaction has committed. This has the wonderful characteristic that the only writes that block a transaction are sequential rather than random. This is generally a useful characteristic and is particularly important when logging to spinning media, but it also supports an important optimization when operating under high load. If the log is completing an I/O while a new transaction is being committed, then the commit is deferred until the previous log I/O has completed, and the next log I/O might carry out tens of completed transactions that had been waiting during the previous I/O. The busier the log gets, the more transactions get committed in a single write. When the system is lightly loaded, each log I/O commits a single transaction as quickly as possible. When the system is under heavy load, each log I/O commits the changes of tens of transactions at a slight delay but at much higher I/O efficiency.
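To make the batching behavior concrete, here is a minimal Python sketch of the group commit idea described above. It is a toy simulation under stated assumptions, not the code of any real storage engine: the function name `group_commit`, the fixed `io_time`, and the arrival times are all hypothetical, and a single log device handles one write at a time.

```python
def group_commit(arrivals, io_time):
    """Simulate a log that batches commits arriving during an in-flight write.

    arrivals: commit request times, one per transaction (hypothetical input).
    io_time:  assumed fixed duration of a single log write.
    Returns a list of (write_start_time, [transaction ids]) tuples.
    """
    # Pair each transaction id with its arrival time, ordered by arrival.
    pending = sorted(enumerate(arrivals), key=lambda pair: pair[1])
    batches = []
    log_free_at = 0.0  # time at which the previous log write completes
    i = 0
    while i < len(pending):
        # The next write starts once the log is idle and a commit is waiting.
        start = max(log_free_at, pending[i][1])
        batch = []
        # Every commit that has arrived by then rides along in the same write.
        while i < len(pending) and pending[i][1] <= start:
            batch.append(pending[i][0])
            i += 1
        batches.append((start, batch))
        log_free_at = start + io_time
    return batches


if __name__ == "__main__":
    # Light load: commits are spaced further apart than one log write.
    print(group_commit([0.0, 10.0, 20.0], io_time=1.0))
    # Heavy load: several commits arrive while the first write is in flight.
    print(group_commit([0.0, 0.1, 0.2, 0.3, 1.5], io_time=1.0))
```

In the lightly loaded case each write carries one transaction. In the heavily loaded case the commits that arrive during the first write are deferred and then flushed together by the next one, which is the trade of a slight delay for fewer, fuller log writes.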
Aurora takes a somewhat more radical approach: it writes out only log records and never writes out data pages, synchronously or otherwise. Even more interesting, the log is remote and stored with 6-way redundancy using a 4/6 write quorum and a 3/6 read quorum. Further improving the durability of the transaction log, the log writes are done across 3 different Availability Zones (each is a different data center). With this approach, Aurora can continue to read without problem if an entire data center goes down and, at the same time, another storage server fails. And it can continue to operate without degradation and complete log writes with any two of its 6 storage servers down, including an entire data center going offline.
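The quorum arithmetic behind those failure claims can be checked with a short Python sketch. The copy count and quorum sizes come from the text above. The even placement of two copies per Availability Zone is an assumption made here for illustration, and the AZ names and helper functions are hypothetical. This models only the counting argument, not how Aurora actually implements its storage protocol.

```python
from itertools import combinations

COPIES = 6        # 6-way redundancy, from the text
WRITE_QUORUM = 4  # 4/6 write quorum, from the text
READ_QUORUM = 3   # 3/6 read quorum, from the text

# Assumption (not stated in the text): the 6 copies are spread evenly,
# two per Availability Zone. The AZ names are hypothetical.
AZ_OF_COPY = ["az-a", "az-a", "az-b", "az-b", "az-c", "az-c"]


def can_write(failed):
    """True if enough copies survive the failed set to reach a write quorum."""
    return COPIES - len(failed) >= WRITE_QUORUM


def can_read(failed):
    """True if enough copies survive the failed set to reach a read quorum."""
    return COPIES - len(failed) >= READ_QUORUM


def az_copies(az):
    """Indices of the copies placed in the given Availability Zone."""
    return {i for i, name in enumerate(AZ_OF_COPY) if name == az}


def quorums_are_consistent():
    """Standard quorum conditions: every read set overlaps every write set,
    and any two write sets overlap each other."""
    reads_see_latest_write = READ_QUORUM + WRITE_QUORUM > COPIES
    writes_cannot_conflict = 2 * WRITE_QUORUM > COPIES
    return reads_see_latest_write and writes_cannot_conflict


if __name__ == "__main__":
    print("quorums consistent:", quorums_are_consistent())
    # Any two of the six copies down: writes should still succeed.
    print("writes survive any 2 failures:",
          all(can_write(set(pair)) for pair in combinations(range(COPIES), 2)))
    # A whole AZ plus one more copy down: reads continue, writes do not.
    down = az_copies("az-a") | {2}
    print("AZ+1 read:", can_read(down), "AZ+1 write:", can_write(down))
```

Under the two-per-AZ assumption, losing a whole Availability Zone removes two copies, leaving four, which is exactly the write quorum. Losing one more copy leaves three, which still satisfies the read quorum but no longer the write quorum.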
Engineering a system that can operate with this level of redundancy is a fairly well-researched and reasonably well-understood area. What’s difficult is delivering this level of redundancy in a transaction processing database storage engine while continuing to deliver competitive transaction processing performance. The Aurora team supports full MySQL compatibility and writes log records across 6 servers in 3 different data centers, all while delivering five times the performance of MySQL. There is an excellent paper on the architectural details behind Aurora: Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases.
Thanks to the ACM SIGMOD Awards Committee for recognizing Amazon Aurora with the 2019 SIGMOD Systems Award.