Many years ago I worked on IBM DB2 and so I occasionally get the question, “how the heck could you folks possibly have four relational database management system code bases?” Some go on to argue that a single code base would have been much more efficient, and that’s certainly true. They then argue that, had we moved to a single code base, the engineering resource efficiency improvement would have led to a very different outcome in the database wars. I’m skeptical of this extension of the argument, but the question is an interesting one and I wrote up a more detailed answer than is usually possible off the cuff.

——————–

IBM Relational Database Code Bases

Few server manufacturers have the inclination and the resources needed to develop a relational database management system and yet IBM has internally developed and continues to support four independent, full-featured relational database products. A production-quality RDBMS with a large customer base is typically well over a million lines of code and represents a multi-year effort of hundreds and, in some cases, thousands of engineers. These are massive undertakings requiring special skills, so the question sometimes comes up, how could IBM possibly end up with four different RDBMS systems that don’t share components?

At least while I was at IBM, there was frequent talk of developing a single RDBMS code base for all supported hardware and operating systems. The reasons why this didn’t happen are at least partly social and historical, but there are also many strong technical challenges that make it difficult to rewind the clock and use a single code base. The diversity of the IBM hardware and operating platforms would have made it difficult. The deep exploitation of unique underlying platform characteristics, like the single level store on the AS/400 or the Sysplex Data Sharing on System z, would make it truly challenging. The implementation languages used by many of the RDBMS code bases don’t exist on all platforms, which raises yet another roadblock. And differences in features and functionality across the four IBM database code bases make it even less feasible. After so many years of diverse evolution and unique optimizations, releasing a single code base to rule them all would almost certainly fail to be feature and performance compatible with prior releases. Consequently, IBM has four different relational database management system code lines maintained by four different engineering teams.

DB2/MVS, now called Db2 for z/OS, is a great product optimized for the z/OS operating system supporting unique System z features such as the Sysplex Coupling Facility. Many of IBM’s most important customers still depend on this database system and it would be truly challenging to port to another operating system such as Windows, System i, UNIX or Linux. It would be even more challenging to replace Db2 for z/OS with one of the other IBM relational code bases. Db2 for z/OS will live on for the life of the IBM mainframe and won’t likely be ported to any other platform or ever be replaced by another RDBMS code line from within IBM.

DB2/400, now called Db2 for i, was the IBM relational database for the AS/400. This hardware platform, originally called the System/38, was released way back in 1979 but continues to be an excellent example of many modern operating system features. Now called System i, this server hosts a very advanced operating system with a single level store where memory and disk addresses are indistinguishable and objects can transparently move between disk and memory. It’s a capability-based system where pointers, whether to disk or memory, include the security permissions needed to access the object referenced. The database on the System i exploits these system features, making Db2 for i another system-optimized and non-portable database. As with Db2 for z/OS, this code base will live on for the life of the platform and won’t likely be ported to any other platform or ever be replaced by another RDBMS code line.

There actually is a single DB2 code base for the VM/CMS and the DOS/VSE operating systems. Originally called SQL/Data System or, more commonly, SQL/DS (now officially Db2 for VSE & VM), it is the productization of the original System R research system. Some components such as the execution engine have changed fairly substantially from System R, but most parts of the system evolved directly from the original System R project developed at the IBM San Jose Research Center (later to become IBM Almaden Research Center). This code base is not written in a widely supported or portable programming language and recently it hasn’t had the deep engineering investment the other IBM RDBMS code bases have enjoyed. But it does remain in production use and continues to be fully supported. It wouldn’t be a good choice to port to other IBM platforms and it would be very difficult to replace while maintaining compatibility with the previous releases in production on VM/CMS and DOS/VSE.

For the OS/2 system, IBM wrote yet another relational database system but this time it was written in a portable language and with fewer operating system and hardware dependencies. When IBM needed a fifth RDBMS for the RS/6000, many saw porting the OS/2 DBM code base as the quickest option. As part of this plan, the development of OS/2 Database Manager (also called OS/2 DBM) was transferred from the OS/2 development team to the IBM Software Solutions development lab in Toronto. The mission was both to continue supporting and enhancing OS/2 DBM and to port the code base to AIX on the RS/6000. We also went on to deliver this code base on Linux, Windows, HP-UX, and Sun Solaris.

My involvement with this project started as we began the transfer of the OS/2 DBM code base to Toronto. It was an exciting time because not only were we going to have a portable RDBMS code base and be able to support multiple platforms but, in what was really unusual for IBM at the time, we would also support non-IBM operating systems. This really felt to me like “being in the database business” rather than being in the systems business with a great database.

One of the first things we discovered was OS/2 DBM was really struggling with our largest customers and they were complaining to the most senior levels at IBM. I remember having to fly into Chicago to meet with an important IBM customer who was very upset with OS/2 Database Manager stability. As I pulled up in front of their building, a helicopter landed on the lawn with the IBM executives who had flown in from headquarters for the meeting. I knew that this was going to be a long and difficult meeting and it was.

We knew we had to get this code stable fast, but we also had made commitments to the IBM Software Solutions leadership to quickly be in production on the RS/6000. The more we learned about the code base, the more difficult the challenge looked. The code base wasn’t stable, didn’t perform well, and didn’t scale well in any dimension. It became clear we either had to choose a different code base to build upon or make some big changes fast.

There was a lot to be done and very little time. The pressure was mounting and we were looking at other solutions from a variety of different sources when the IBM Almaden database research team jumped in. They offered to put the entire Almaden database research team on the project, with a goal to both replace the OS/2 DBM optimizer and execution engine with components from Starburst (a database research project) and to help solve the scaling and stability problems we were currently experiencing in the field. Taking a research code base is a dangerous step for any development team, but this proposal was different in that the authors would accompany the code base. Pat Selinger of IBM Almaden Research essentially convinced us that we would have a world-class optimizer and execution engine and we would have the full-time commitment from Pat, Bruce Lindsay, Guy Lohman, C. Mohan, Hamid Pirahesh, John McPherson and the rest of the IBM Almaden database research team working shoulder to shoulder with us in making this product successful.

The decision was made to take this path. At around the same time we were making that decision, we had just brought the database up on the RS/6000 and discovered that it was capable of only 6 transactions per second measured using TPC-B. The performance leader on that platform at the time, Informix, was able to deliver 69 tps. This was incredibly difficult news in that the new Starburst research optimizer, although vital for more complex relational workloads, would have virtually no impact on the simple transactional performance of TPC-B.

I remember feeling like quitting as I thought through where this miserable performance would put us as we made a late entrance to the UNIX database market. I dragged myself up out of my chair and walked down to Janet Perna’s office. Janet was the leader of IBM Database at the time and responsible for all code bases on all platforms. I remember walking into Janet’s office, more or less without noticing she was already meeting with someone, and blurting out “we have a massive problem.” She asked for the details and, typical of her usual “just get it done” approach to all problems, said “well, we’ll just have to get it fixed then. Bring together a team of the best from Toronto and Almaden and report weekly.” Janet is an incredible leader and, without her confidence and support, I’m not sure we would have even started the project. Things just looked too bleak.

Instead of being a punishing or unrewarding long march, the performance improvement project was one of the best experiences of my life. Over the course of six months, the joint Toronto/Almaden team transformed the worst-performing database management system into the best. When we published our audited TPC-B performance later that year, it was the best-performing database management system on the RISC System/6000 platform.

It was during this performance work that I really came to depend upon Bruce Lindsay. I used to joke that convincing Bruce to do anything was nearly impossible but, once he believed it was the right thing to do, he could achieve as much by himself as any mid-sized engineering team. I’ve never seen a problem too big for Bruce. He’s saved my butt multiple times over the years and, although I’ve bought him a good many beers, I probably still owe him a few more.

The ad hoc Toronto/Almaden DB2 performance team did amazing work and that early effort not only saved the product in the market but also cemented the trust between the two engineering teams. Over subsequent years, many great features were delivered and much was achieved.

Many of the OS/2 DBM quality and scaling problems were due to a process model where all connected users ran in the same database address space. We knew that needed to change. Matt Huras, Tim Vincent and the teams they led completely replaced the database process model with one where each database connection had its own process, each of which could access a large shared buffer pool. This gave the isolation needed to run reliably. They also kept the ability to run in operating system threads, and put in support for greater-than-4GB addressing even though all the operating systems we were using at the time were 32-bit systems. This work was a massive improvement in database performance and stability. It was a breath of fresh air to have the system stabilized at key customer sites so we could focus on moving the product forward and functionally improving it with a much lower customer support burden.

Another problem we faced with this young code base, originally written for OS/2, was that each database table was stored in its own file. There are some downsides to this model but, generally, it can be made to work fairly well. What was absolutely unworkable was that no table could be more than 2GB. Even back then, a database system where a table could not exceed 2GB was pretty close to doomed.

At this point, we were getting close to our committed delivery date and the collective Toronto and Almaden teams had fixed all the major problems with the original OS/2 DBM code base and had it ported and running well on AIX. We also could support other operating systems and platforms fairly easily. But, the one problem we just hadn’t found a way to address was the 2GB table size limit.

At the time I was lead architect for the product and felt super-strongly that we needed to address the table size limitation of 2GB before we shipped. I was making that argument vociferously but the excellent counter argument was we were simply out of time. Any reasonable redesign would have delayed us significantly from our committed product ship dates. Estimates ranged from 9 to 12 months and many felt bigger slips were likely if we made changes of this magnitude to the storage engine.

I still couldn’t live with the prospect of shipping a UNIX database product with this scaling limitation so I ended up taking a long weekend and writing support for a primitive approach to supporting greater-than-2GB tables. It wasn’t a beautiful solution but the beautiful solutions had been investigated extensively and just couldn’t be implemented quickly enough. What I did was implement a virtualization layer below the physical table manager that allowed a table to be implemented over multiple files. It wasn’t the most elegant of solutions but it certainly was the most expedient. It left most of the storage engine unchanged and, after the files were opened, it had close to no negative impact on performance. Having this code in our hands and it being able to pass our full regression test suite swung the argument the other way and we decided to remove the 2GB table size limit before shipping.
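To make the idea concrete, here is a minimal sketch of that kind of layer. It is not the original code (the product was not written in Python), and the class name `SegmentedFile`, the `segment_size` parameter, and the `.0000`-style file naming are all hypothetical choices made for illustration. The only thing it tries to show is the core trick: a logical byte offset is split into a segment file index and an offset within that file, so the code above the layer never sees the per-file size cap.

```python
import os


class SegmentedFile:
    """Presents several size-capped files as one logical byte-addressable file.

    Hypothetical sketch: the names and on-disk naming scheme are assumptions,
    not the layout used by any real database product.
    """

    def __init__(self, base_path, segment_size=2**31 - 1):
        self.base_path = base_path
        self.segment_size = segment_size  # assumed cap per underlying file
        self._handles = {}                # segment index -> open file object

    def _locate(self, offset):
        # Logical offset -> (segment index, offset inside that segment).
        return divmod(offset, self.segment_size)

    def _segment(self, index, create):
        # Open each segment file once and cache the handle, so the mapping
        # costs almost nothing after the files are open.
        fh = self._handles.get(index)
        if fh is None:
            path = f"{self.base_path}.{index:04d}"
            if os.path.exists(path):
                fh = open(path, "r+b")
            elif create:
                fh = open(path, "w+b")
            else:
                return None
            self._handles[index] = fh
        return fh

    def write(self, offset, data):
        view = memoryview(data)
        while len(view) > 0:
            index, local = self._locate(offset)
            chunk = view[: self.segment_size - local]  # stop at segment end
            fh = self._segment(index, create=True)
            fh.seek(local)
            fh.write(chunk)
            offset += len(chunk)
            view = view[len(chunk):]

    def read(self, offset, length):
        parts = []
        while length > 0:
            index, local = self._locate(offset)
            want = min(length, self.segment_size - local)
            fh = self._segment(index, create=False)
            if fh is None:
                break  # no such segment: reading past the end of the table
            fh.seek(local)
            got = fh.read(want)
            parts.append(got)
            if len(got) < want:
                break  # short read: hit the end of the stored data
            offset += want
            length -= want
        return b"".join(parts)

    def close(self):
        for fh in self._handles.values():
            fh.close()
        self._handles.clear()
```

With a deliberately tiny `segment_size` of 8, writing ten bytes at logical offset 5 lands three bytes in the first file and seven in the second, and a read of the same range stitches them back together. Everything above the layer keeps using a single logical offset, which is why an approach like this can leave most of a storage engine untouched.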

When we released the product, we had the world’s fastest database on AIX measured using TPC-B. We also had the basis for a very available system and the customers that were previously threatening legal action became happy reference customers. Soon after, we shipped the new Starburst optimizer and query engine.

This database system became quite successful and, after working on it for many releases, it remains one of the best engineering experiences of my life. The combined Toronto and Almaden teams are amongst the most selfless and talented groups of engineers with which I’ve ever worked. Janet Perna, who headed IBM database at the time, was a unique leader who made us all better, had incredibly high expectations, and yet never was that awful boss you sometimes hear about. Matt Huras, Tim Vincent, Al Comeau, Kathy McKnight, Richard Hedges, Dale Hagen, Bernie Schiefer and the rest of the excellent Toronto DB2 team weren’t afraid of a challenge and knew how to deliver systems that worked reliably for customers. Pat Selinger is an amazing leader who helped rally the world-class Almaden database research team and kept all of us on the product team believing. Bruce Lindsay, C. Mohan, Guy Lohman, John McPherson, Don Chamberlin (the co-inventor of the Structured Query Language), Hamid Pirahesh and the rest of the IBM Almaden database research team are all phenomenal database researchers but they are also willing to roll up their sleeves and do the sometimes monotonous work that seems to be about 90% of what it takes to ship high quality production systems. For example, Pat Selinger, an IBM Fellow and inventor of the relational database cost-based optimizer, spent vast amounts of her time writing the test plan and some of the tests used to get the system stable and gain confidence that it was ready for production.

IBM continues to earn billions annually from its database offerings so it’s hard to refer to these code bases as anything other than phenomenal successes. An argument might be made that getting to a single code base could have allowed the engineering resources to be applied more efficiently. I suppose that is true, but market share is even more important than engineering efficiency and what would have helped grow market share faster would have been to focus the database engineering, marketing, and sales resources on selling DB2 on non-IBM server platforms earlier and with more focus. It’s certainly true that Windows has long been on the DB2 supported platforms list, but IBM has always been most effective selling on its own platforms. That’s still true today. DB2 is available on the leading cloud computing platform but, again, most IBM sales and engineering resources are still invested in their own competitive cloud platform. On this model, IBM database success will always be tied to IBM server platform market share.


57 comments on “Four DB2 Code Bases?”

Uday Subbarayan says:

January 26, 2020 at 2:26 pm

An excellent article, James.

Do you have any insights into why HP failed & how Oracle succeeded in RDBMS back then?

- James Hamilton says:
  
  January 27, 2020 at 6:02 am
  
  HP has purchased many different database products over the years but I don’t know of one they started internally and grew organically. However, outside of HP, there certainly have been a great many database companies that have found success and then fairly quickly lost it. Sybase dominated the financial market segment and led in many others but ended up becoming irrelevant. Informix was, for a time, the fastest transaction processing database and they had a wonderful parallel offering but had some accounting issues, made some missteps in the market and ended up first becoming far less relevant and then later being purchased by IBM. Many excellent database products have achieved success and then rapidly become largely irrelevant in the market.
  
  Market conditions have changed over this period as well. 15 years ago, most customers would choose a single database offering and almost everything they did would be hosted on that single database product. Today, the market is much more mature and most customers want the best offering for each different application type. They want a relational database for some parts of their business. And, even where they want an RDBMS, they might use a row store for operational transaction processing systems and a column store for decision support workloads. They want a key-value store for other parts. They want a document store for some applications. Customers want workload-optimized databases. Open source is no longer a scary prospect for a high value production workload. Cloud computing takes away the complexity of managing 5 to even 10 different database management systems, further reducing the friction to choosing workload-optimized databases.
  
  The days of the one true database to rule them all are over.
  
Philip Gunning says:

February 11, 2018 at 7:41 pm

Wow, as a non-IBMer I came to know Db2/2 around 1995 and have been working with it ever since. Never met you James but know most of the people referenced in this article. Would have been awesome to know you back when all this was happening.

- James Hamilton says:
  
  February 12, 2018 at 7:36 am
  
  They were exciting times and things were changing fast. It’s amazing how broadly installed relational databases have become. The database world is again going through a massive transition with cloud computing opening up the market for different designs and another big breath of innovation.
  
Henrik Loeser says:

February 11, 2018 at 6:44 am

Great article, cool photo. Stumbled over this on Twitter.
As a PhD student I was a user of Informix and Illustra and wanted to get a job with them. IBM bought Informix and I ended up at the Silicon Valley Lab, working on DB2. And with many of those guys in the photo.

- James Hamilton says:
  
  February 11, 2018 at 8:00 am
  
  Good choice. There probably isn’t a better place to learn RDBMS internals. But, having done that, it’s now time to go work for AWS on cloud hosted databases.
  
Kelly Schlamb says:

January 15, 2018 at 2:17 pm

Great read, James. Hope all is well with you and Jennifer.

- James Hamilton says:
  
  January 15, 2018 at 2:19 pm
  
  Thanks Kelly!
  
Daniel Wood says:

January 9, 2018 at 7:31 pm

Having cut my teeth on database internals at Informix for 11 years from ’94 through ’04 I often wished IBM had gone a different direction. Having worked on DB internals at Informix, IBM, a startup db, Sybase, SAP, Oracle and now Postgres (Salesforce then Amazon), I still consider it the best combination of clean, easy-to-understand code and the enterprise characteristics that something like Postgres is so lacking in many ways. IMO, the lack of true MVCC and a 32-bit storage system, which ruled out non-partitioned tables larger than 2GB, were two of the issues limiting its future. Informix + IBM gave me my career in DB internals going from tech support to one of the senior dev architects. Fond memories indeed.

I wish IBM would open source it.

- James Hamilton says:
  
  January 10, 2018 at 5:48 am
  
  You have worked on a lot of different code bases. You almost have the leaders covered. I agree that Informix was a very nice system that was fairly easy to work with and broadly featured.
  
Felix Naumann says:

January 9, 2018 at 10:48 am

Hi James, a very interesting story, thanks! A while ago I had created a genealogy of many relational DBMS, including the four code lines you talk about. Maybe you could have a look to see if the representation is accurate and if you have anything to correct or add: https://hpi.de/naumann/projects/rdbms-genealogy.html
I am currently working on an update anyway, and can only rely on people like you with intimate knowledge of what went on at the time.
Thanks, Felix

- James Hamilton says:
  
  January 9, 2018 at 1:21 pm
  
  Wow, that’s an interesting chart. It’ll be challenging to get right but useful if you can. Some issues I saw:
  *DB2/MVS was sourced from System R (recommend confirming with Don Haderle or Pat Selinger).
  *System/38 did not share any code from System R (recommend confirming with Pat Selinger).
  *The DB2 UDB code line did not originate in DB2/MVS. It was originally written as OS/2 Database Manager (see my article).
  *Stonebraker can confirm if the Informix code base did branch from Postgres.
  *I’m not familiar with IBM IS1 and didn’t think it was a contributor to System R (recommend checking with Pat Selinger or Bruce Lindsay).
  
Jeff Goss says:

January 2, 2018 at 4:39 pm

Not only did we look younger, the baby on my T-shirt is now in first year at McMaster University in Life Sciences! The other amazing part about this photo is how we all stayed with IBM given how the DB2 team has spread over so many companies and projects over the years. Dale and Hershel retired from IBM. Glad Matt sent this my way although I didn’t see it until now.

- James Hamilton says:
  
  January 3, 2018 at 1:20 pm
  
  McMaster? Amazing — that kind of puts perspective on how long it’s been. Hershel dropped by to visit us on the boat in Seattle a few years back and we stay in touch. Jennifer and I had lunch with Janet when we were last in Florida 2 years back. That was super fun.
  
Les King says:

December 28, 2017 at 2:34 pm

Awesome article James. Cool picture. Brings back memories of being in our brand new support organization at this time and dealing with the customer expectations on OS/2 DBM. The team you outline laid the foundation for taking OS/2 DBM into the DB2 (now Db2) we have today. Being able to have data management front and center in my career since then to now has been a real blessing.

- James Hamilton says:
  
  December 29, 2017 at 6:56 am
  
  Hey, good hearing from you Les. You’re right. It definitely was a busy time for the support organization.
  
Leon Katsnelson says:

December 27, 2017 at 9:37 pm

The post brought back great memories and it all came back as if it was yesterday. I remember the process model debates and the 2GB table size limit. When you came back with the solution it was like having your team score the winning goal in the final second of the game. I think this is when your status was elevated to “Demi-god”. All the best on your travels on the high seas!

- James Hamilton says:
  
  December 28, 2017 at 6:39 am
  
  Yeah, it was a fun time. By far, the process model changes were absolutely vital to us getting the product quality under control and the needed changes were massive. Tim and Matt and their teams really aced that one. Rarely does big work come in on such a short glide path.
  
Lance Amundsen says:

December 26, 2017 at 1:03 am

What a walk down career memory lane. I owned the CAE client API in Austin and handed it off to Toronto in the early 90s move. Then I put a lot of the performance changes into the AS/400 ODBC driver up in Rochester. Lastly I supported all four code bases on the SWG HiPods team. Great people all along and many friends to this day. Heck, I almost forgot, I wrote the original programming manuals when Austin hired me in 89. I still remember demonstrating rollback in a sample program that “demoted all managers” as the SQL operation. I wonder how long that sample stayed in the pubs lol

- James Hamilton says:
  
  December 26, 2017 at 6:11 am
  
  The move from Austin to Rochester must have been a shock on the system once winter had rolled around :-). I’ll bet your programming examples lived for years and, who knows, some may still be there today.
  
Bernard Golden says:

December 25, 2017 at 5:07 pm

Heh. I ran part of the Informix database engineering group at that time. It was a heady period for relational databases, as they were in the steep growth curve of adoption. It’s easy to forget today, but there was a time that RDBMSs were like a magical technology that vastly improved application development.

Thanks for the nice memoir of the time and work.

P.S. As I’m sure you know, IBM now owns Informix, where it’s still maintained (I think) as a separate brand. In fact, I ran into someone from the Informix group not so long ago and he told me that Informix is very popular in IoT devices, as it has a fast and efficient logging/recovery capability.

- James Hamilton says:
  
  December 25, 2017 at 7:21 pm
  
  Yes, I do know Informix XPS reasonably well. In fact, when I said above “The pressure was mounting and we were looking at other solutions from a variety of different sources” one of those sources was Informix. Around that time, I went out to the Informix Portland Lab and ported part of Informix to OS/2. Not many people at IBM or Informix knew that work was going on at the Informix Portland lab (Gary Kelly’s group). But I did spend a couple of days in Menlo Park where I met with most team leads and CEO Phil White.
  
  Informix XPS had a really nice design where the core scheduling/dispatch engine is separate from the rest of the system and even has its own set of tests. It’s like an operating system on which the rest of the database system is hosted. Below this core component is the layer that needs to be tailored to each operating system. So, porting Informix starts with writing a new version of the O/S access layer, which is pretty small and actually fairly easy to do. Next comes porting the core scheduling/dispatch engine, which is a bit of work, but it will run on its own without the rest of the database and it has its own tests, so it’s a relatively easy component to get running. Once it’s running, getting the rest running is work but few issues are expected.
  
  I ported the Informix core scheduling/dispatch system to OS/2 and got all the tests passing. That was enough to convince us that an OS/2 port of Informix XPS was practical and not that much work would be required. We ended up not going that direction but it was an interesting couple of weeks working on the XPS code base. I thought it was a very nicely thought through system.
  
Walter Alvey says:

December 25, 2017 at 12:16 am

I’m curious who is in the picture. I recognize some of them but not everyone.

- Emad Boctor says:
  
  December 25, 2017 at 12:53 am
  
  Standing, Left to right:
  
  Jeff Goss, Mike Winer, Sam Lightstone, Tim Vincent, Matt Huras
  
  Sitting, Left to right:
  
  Dale Hagen, Bernie Schiefer, Ivan Lew, Herschel Harris, Kelly Schlamb.
  
  - Walter Alvey says:
    
    December 30, 2017 at 12:27 am
    
    Thanks for posting the names.
    
John Willsher says:

December 25, 2017 at 12:11 am

I used DB2 when it only supported a rules (i.e. not cost) based SQL optimiser. My idea is to go with DB3 version 1 to support a true 3-dimensional table in a 3-dimensional array in core. Imagine the Cartesian product of 1 or more of these tables joined together. Visualise as X, Y, Z Cartesian coordinates or a bunch of oxo cubes joined together. The 3rd dimension could be the imaginary part of a mathematical complex number. Whatever that may be :) Season’s greetings from john willsher

paul tormey says:

December 25, 2017 at 12:04 am

This needs editing… the statement(s) in question are: “DB2/400, now called Db2 for i, is the IBM relational database for the AS/400. This hardware platform, originally called the System/38, was released way back in 1979 but continues to be an excellent example of many modern operating system features. Now called System i,”
Should be “DB2/400 was the IBM relational database for the AS/400”… “Db2 for i is the IBM relational database for the IBM i system which currently runs on the IBM Power server systems.”

At least let’s keep current…thanks.

- James Hamilton says:
  
  December 25, 2017 at 5:55 am
  
  The team of professional editors on perspectives must have dropped the ball on this one Paul :-).
  
Stephen Li says:

December 24, 2017 at 10:37 pm

Great war story to read and learn from. What hit me was that at such a large company as IBM, you could still just walk into your R&D boss’ office to blurt out the issue rather than being required to schedule a meeting to report with ppt. :D

- James Hamilton says:
  
  December 25, 2017 at 5:41 am
  
  Janet didn’t believe in excess formality but she was always very busy so I would have to schedule something if I needed her time. In this case, I just didn’t. This news just felt too urgent.
  
Linda Igra says:

December 24, 2017 at 5:13 pm

James, this is Linda Fiszer Igra :-) Great article and great memories of pushing out DB2/2 V 1.5 as the first release ever from Toronto Lab!!! Love the picture, what a laugh! – brings back great memories of the brilliant people we had the pleasure of working with everyday.
And to top it off … one of the commentators Jack Orenstein was my Intro to Comp Sci prof – Jack – you taught me Fortran in 1980 at McGill!!! What a scream!

- James Hamilton says:
  
  December 25, 2017 at 5:36 am
  
  Hey Linda! Hope all is well. The picture is a real gem. I just love it.
  
- Jack Orenstein says:
  
  May 29, 2019 at 6:58 pm
  
  Hi Linda! Tim Merrett just brought my attention to this article.
  
Peter Shum says:

December 24, 2017 at 3:25 am

Hey James, this is a great article and love the photo! All those names bring back fond memories of working with you guys. This is a piece of “oral history” for DB2 LUW!

- James Hamilton says:
  
  December 24, 2017 at 6:17 am
  
  Hi ya Peter! I agree with you. We have all worked on a lot of successful products since DB2 but these were special times and I’m still proud of what the team delivered.
  
Thomas Hinders says:

December 23, 2017 at 6:10 pm

A great example of what made IBM great….I retired 2 years ago, and what I hear from those still working leads me to believe this is no longer the case….

Jack Orenstein says:

December 23, 2017 at 6:06 pm

Great article. I know many of the names you mentioned as System R researchers. It is very impressive to find out how hands-on they were.

John McPherson says:

December 23, 2017 at 5:06 pm

Fantastic posting James. It was a wonderful experience working with everyone on that team on a lot of fascinating engineering challenges. Your leadership was an important part of the project’s success. You might have had your doubts, but you never let it show. Your comment on testing reminded me that another one of our famous testers was Don Chamberlin, the co-inventor of SQL. Everyone from the veterans to the fresh hires worked together to accomplish something that many outsiders didn’t think was possible.

- James Hamilton says:
  
  December 24, 2017 at 5:59 am
  
  Good hearing from you John. Good point on Don Chamberlin. He was a joy to work with and a big part of the success we found. Happy holidays!
  
Julia Johnston says:

December 23, 2017 at 4:13 pm

Wonderful read on what went on behind the scenes! Many names mentioned helped my clients over the years as I technically sold each of these great DB2 systems. Thank you all!

- James Hamilton says:
  
  December 23, 2017 at 4:48 pm
  
  Thanks for helping to make DB2 successful.
  
Andy Pavlo says:

December 23, 2017 at 2:08 pm

Great post. I will reference this in my class next semester. I have two additional questions:

(1) Do you know whether IBM does high-level (i.e., SQL) compatibility tests between the different codebases? That is, take the same database, load into each system, then run a bunch of queries to see whether all four implementations produce the same output?

(2) How does IBM’s “Eagle” project (i.e., SQL on IMS) fit into this? I heard that’s where they decided to first borrow pieces from System R?

- James Hamilton says:
  
  December 23, 2017 at 3:53 pm
  
  Andy asked if IBM does high-level compatibility testing between the different variants of DB2? Yes, there was some and I suspect at least some of this testing is still done today. Years ago a common subset of SQL was defined across all of IBM called Systems Application Architecture SQL or, more commonly, SAA SQL. If a customer stayed within this common SQL subset there was a good chance their query would run everywhere. But, since the database systems don’t share code and are completely different implementations there will be differences even in the subset and, of course, all four code bases implement large numbers of features unique to that code base.
  
  Project Eagle was an effort to put a relational head end on IMS. Clearly, at a high level, it was a very compelling strategy to have both relational and the existing large install base of IMS customers all running on the same code base. But, like so many “high level ideas” it’s pretty close to impossible to actually deliver upon that vision and project Eagle was canceled.
  
  For more on Project Eagle, see “Readings in Database Systems”: https://www.amazon.com/Readings-Database-Systems-MIT-Press/dp/0262693143/ref=sr_1_1?ie=UTF8&qid=1514044333&sr=8-1&keywords=readings+in+database+systems.
  
  - Jack Orenstein says:
    
    December 23, 2017 at 7:07 pm
    
    Sadly, implementing one model on top of a radically different one tends not to go well. I say this having built two object/relational mappers, one at an OODBMS startup (cooperative work with Almaden!) and then a Java-based one. At the OODBMS company, we also tried putting a SQL layer on top of our system.
    
    These systems are too fragile. It’s too hard to get acceptable performance, and it seems like you always have to dive down to the lower layer. So instead of being able to use Java and forget about SQL (for example), you end up having to deal with both. For the very simplest queries, the mapping is fine. But the rest of the time, you are trying to get your mapping layer to produce the “right” SQL, when it would have been far easier to just write the SQL yourself.
    
    - James Hamilton says:
      
      December 24, 2017 at 6:02 am
      
      I think you are right Jack. It’s just about impossible for the fundamental characteristics of the underlying system to not show through. Native implementations almost always end up faster, less fragile, and easier to use.
      
Brent Ozar says:

December 23, 2017 at 12:19 pm

I really appreciate you sharing this. It’s eye-opening, and it’s interesting to think about in terms of Microsoft’s support for SQL Server on Linux, too.

- James Hamilton says:
  
  December 23, 2017 at 4:00 pm
  
  Excellent point Brent. It’ll be interesting to see if Microsoft puts real sales, marketing, and engineering behind SQL Server on Linux. I think they might. However, it has long been the case that platform owners have a real advantage with apps targeting their own platforms but they have a similar disadvantage when on competitive platforms. It’s super hard for platform owners to invest deeply in product for other platforms and the products are often weaker, sold less aggressively, and hardly marketed at all.
  
- Matt Olson says:
  
  December 27, 2017 at 5:55 pm
  
  Very interesting indeed. Microsoft was able to achieve the dream goal of a single code base. This article sums things up pretty well:
  
  https://techcrunch.com/2017/07/17/how-microsoft-brought-sql-server-to-linux/
  
  - James Hamilton says:
    
    December 28, 2017 at 6:34 am
    
    When I worked on Microsoft SQL Server, many of us were convinced Linux growth was real and wanted to put SQL Server on it. A decade of time was given up but it’s great that it is now on Linux. The next difficult test is will there be the sales and marketing push behind it and will the effort have the best engineers. Essentially, will it be “real” or just a tick box?
    
    So far it looks real to me but a decade earlier would have been much better — the open source offerings of PostgreSQL, MariaDB, and MySQL are all gaining share quickly and they are important enough that all cloud platform owners need to support them as well. MySQL actually has more share than Microsoft SQL Server and PostgreSQL will almost certainly pass it soon.
    
Walid Rjaibi says:

December 23, 2017 at 4:54 am

Great article James! We did not have a chance to work together but I did interview with you on Sep. 23, 1996 at the Toronto Lab and I still remember most of our conversation :-)

- James Hamilton says:
  
  December 23, 2017 at 6:49 am
  
  Thanks Walid.
  
Mike Winer says:

December 22, 2017 at 10:41 pm

What an awesome team we had, the early years were so fun, productive, and memorable! Great blog James – hope to see you again some time!

- James Hamilton says:
  
  December 23, 2017 at 6:46 am
  
  Totally agree Mike. Exciting times, great results, and an excellent team.
  
Peeter Joot says:

December 22, 2017 at 8:02 pm

I never met you James, but I worked with everybody in your picture, and many of the individuals you named in this article. Matt was my first and favourite manager in the 19 years I worked on DB2. He led by simply letting you loose on problems, and by the time you finished having fun solving them, he had the solution he wanted.

Thanks for sharing this bit of history.

- James Hamilton says:
  
  December 23, 2017 at 6:02 am
  
  That definitely sounds like Matt. Thanks for the comment Peeter.
  
Tim Vincent says:

December 22, 2017 at 4:10 pm

Great trip down memory lane James, was an incredibly exciting time. The photo brought a smile, we looked just a bit younger then!

- James Hamilton says:
  
  December 22, 2017 at 5:00 pm
  
  Hey Tim! I totally agree with you — It was an amazing time and that picture really brings back great memories.
  

