Earlier this week, I was in Athens Greece attending annual conference of the ACM Machinery Special Interest Group on Management of Data. SIGMOD is one of the top two database events held each year attracting academic researchers and leading practitioners from industry.
I kicked off the conference with the Plenary keynote. In this talk I started with a short retrospection on the industry over the last 20 years. In my early days as a database developer, things were moving incredibly quickly. Customers were loving our products, the industry was growing fast and yet the products really weren’t all that good. You know you are working on important technology when customers are buying like crazy and the products aren’t anywhere close to where they should be.
In my first release as lead architect on DB2 20 years ago, we completely rewrote the DB2 database engine process model moving from a process-per-connected-user model to a single process where each connection only consumes a single thread supporting many more concurrent connections. It was a fairly fundamental architectural change completed in a single release. And in that same release, we improved TPC-A performance a booming factor of 10 and then did 4x more in the next release. It was a fun time and things were moving quickly.
From the mid-90s through to around 2005, the database world went through what I refer to as the dark ages. DBMS code bases had grown to the point where the smallest was more than 4 million lines of code, the commercial system engineering teams would no longer fit in a single building, and the number of database companies shrunk throughout the entire period down to only 3 major players. The pace of innovation was glacial and much of the research during the period was, in the words of Bruce Lindsay, “polishing the round ball”. The problem was that the products were actually passably good, customers didn’t have a lot of alternatives, and nothing slows innovation like large teams with huge code bases.
In the last 5 years, the database world has become exciting again. I’m seeing more opportunity in the database world now than any other time in the last 20 years. It’s now easy to get venture funding to do database products and the number of and diversity of viable products is exploding. My talk focused on what changed, why it happened, and some of the technical backdrop influencing.
A background thesis of the talk is that cloud computing solves two of the primary reasons why customers used to be stuck standardizing on a single database engine even though some of their workloads may have run poorly. The first is cost. Cloud computing reduces costs dramatically (some of the cloud economics argument: http://perspectives.mvdirona.com/2009/04/21/McKinseySpeculatesThatCloudComputingMayBeMoreExpensiveThanInternalIT.aspx) and charges by usage rather than via annual enterprise license. One of the favorite lock-ins of the enterprise software world is the enterprise license. Once you’ve signed one, you are completely owned and it’s hard to afford to run another product. My fundamental rule of enterprise software is that any company that can afford to give you 50% to 80% reduction from “list price” is pretty clearly not a low margin operator. That is the way much of the enterprise computing world continues to work: start with a crazy price, negotiate down to a ½ crazy price, and then feel like a hero while you contribute to incredibly high profit margins.
Cloud computing charges by the use in small increments and any of the major database or open source offerings can be used at low cost. That is certainly a relevant reason but the really significant factor is the offloading of administrative complexity to the cloud provider. One of the primary reasons to standardize on a single database is that each is so complex to administer, that it’s hard to have sufficient skill on staff to manage more than one. Cloud offerings like AWS Relational Database Service transfer much of the administrative work to the cloud provider making it easy to chose the database that best fits the application and to have many specialized engines in use across a given company.
As costs fall, more workloads become practical and existing workloads get larger. For example, If analyzing three months of customer usage data has value to the business and it becomes affordable to analyze two years instead, customers correctly want to do it. The plunging cost of computing is fueling database size growth at a super-Moore pace requiring either partitioned (sharded) or parallel DB engines.
Customers now have larger and more complex data problems, they need the products always online, and they are now willing to use a wide variety of specialized solutions if needed. Data intensive workloads are growing quickly and never have there been so many opportunities and so many unsolved or incompletely solved problems. It’s a great time to be working on database systems.
· The slides from the talk: http://mvdirona.com/jrh/TalksAndPapers/JamesHamilton_Sigmod2011Keynote.pdf
· Proceedings extended abstract: http://www.sigmod2011.org/keynote_1.shtml
· Video of talk: https://services.choruscall.eu/links/sigmod1106.html# (select June 14th to get to the video)
The talk video is available but, unfortunately, only to ACM digital library subscribers (thanks to Simon Leinen for pointing out the availability of the video link above).