MySQL 5.6 to MariaDB 10.2.13

It’s hard to believe that a relational database in personal use at home will ever have much of a load when it comes to transaction processing but our home RDBMS is surprisingly busy, with more than a hundred database interactions per second.  It’s still not even within an order of magnitude as busy as many business databases but it is likely is as busy as half of them.

Our “home” is actually a boat and the boat is fairly heavily instrumented and automated, Just about everything is stored in the relational database. We have every data point from every 5 seconds going back many years stored in the on-board RDMS.  The data that is collected and stored includes:

  • Tank levels: Fresh water, black water, grey water, starboard fuel, port fuel, supply fuel, and day tank fuel
  • Main engine data: RPM, load percentage, hours, coolant temp, oil pressure, intake air temperature, transmission temperature, transmission pressure, boost pressure, fuel pressure, and fuel temperature
  • 24V DC electrical: house voltage, house amperage, house state of charge start alternator temperature, house alternator temperature, start voltage, start amperage, and start state of charge
  • 120V AC: voltage, amperage, frequency, inverter temperature, and kWhrs.
  • 240V AC: voltage, amperage, frequency, inverter A temperature, Inverter B temperature, and kWhrs.
  • Generator: rpm, oil pressure, coolant temperature, battery voltage, hours, voltage, amperage, frequency, and kWhrs.
  • Shore Power: volts, amps, frequency, and kWhrs
  • Wing Engine: rpm, oil pressure, coolant temperature, hours, and start battery voltage
  • Location: Latigude, longitude, speed, course, heading, rate of turn, number of satellites in view, horizontal dilution of precision, and position dilution of position

There hundreds of other data points ranging from network router traffic over its three wide area network connections (satellite, cellular, and WiFi) to tens of parameters on satellite connectivity status, furnace operational data, and hundreds of control system state parameters. The boat is heavily automated and the control systems all run off database data. Example applicationss include load shedding (shedding least important electrical loads when drawing more power than available), generator autostart (starting generator when the batteries need charging), operator indicator and warning lights (status lights showing health of boat systems), alarming, remote monitoring (show remote state of boat systems and email if there is a problem), and remote actuation (for example, the ability to turn on the furnace from a restaurant if on the way home on a cold day).

In the end, there is lot of database traffic. I wrote the first version of this system back in 2005 when I was working on the SQL Server engineering team and, no big surprise, I chose to use SQL Server as the database. SQL Server is a pretty good database management system and it did a respectable job in this application for nearly a decade.

When I left Microsoft to join Amazon Web Services back in January 2009, I moved all the boat applications to open source languages and databases. I did this partly to learn, but mostly because I think that open source services are the right place to be whether at home or in a business setting. Open source solutions are often available on more platforms, they are more flexible, more cost effective, and often have far richer informal support networks. For the database, I chose MySQL mostly because it was, at the time, the broadest used open source relational database management system.

Overall, it was great to be back on open source but MySQL was a little bit like a step backwards in time. I just kept bumping my head into missing features that I remember implementing in DB2 during the late 80s. However, MySQL is open source, the price is right and every problem I ran into had an easy to find work around or someone in the informal support system had already solved the problem. Despite all the limitations, I’ve come to like MySQL and use it all over the place these days.

However, MySQL has developed some issues over the years. In 2008, MySQL was acquired by Sun Microsystems. MySQL has a fairly tightly controlled code base and so the commercial interest that owns the company have considerable control over the MySQL base even though it is open source. But, Sun didn’t abuse its control of the MySQL code base and, generally, I’ve always really liked Sun. I first used their systems back in the mid-80s and they were excellent, good price/performers, and the founders were easy to respect, particularly the super innovative Andy Bechtolsheim. But, when Oracle bought Sun in 2010, it then really was a problem. MySQL is now owned by a company that is famous for high prices, aggressive sales tactics, and customer unfriendly practices.

Given my less than flattering summary of Oracle, why was I still using a product controlled by Oracle in early 2018?  Well, it’s sort of embarrassing but, in the end whether a company or an individual, we are all busy. It’s the same reason why some companies that have been through surprise price increases and license audits might still be using Oracle today. There are lots of reasons to move to better price/performing solutions but we’re all busy and don’t always have time to invest in a database platform change. These moves between databases require a massive amount of work and often other business priorities are more important. No company likes poor treatment but you can’t solve every problem they day it occurs. Unfortunately, the commercial database companies are well aware that customer inertia is one of the most powerful forces in the database world.

However, in the end, the action always follows long periods of problems or the existence of better alternatives. Companies and individuals eventually elect to make a change, if change is needed. It may not happen on day one but the history of our industry shows that change does happen and, once it starts, the transitions can move faster than one would expect. In my early days in the database world, IBM effectively owned most workloads at finance and insurance companies. Later, Sybase became the darling of the financial community and won many of these workloads. Yet today they are hard to find anywhere.  Toward the end of the Sybase supremacy, Informix had a fine RDBMS that topped the transaction processing benchmarks but they too are close to invisible today. Oracle still has a very large share of the relational database market but, there are alternatives and this share is shrinking.

In the open source database world, MySQL has the dominant share but, Oracle owns it and there are many excellent alternatives available. PostgreSQL is a technical strong alternative with growing market share. I’ve come close to moving my personal systems to PostgreSQL for years. The challenge with PostgreSQL is it is quite different from MySQL. I still recall the massive job of moving from SQL Server to MySQL. It was huge so I’ve not been excited about the prospect of moving from MySQL to PostgreSQL even though PostreSQL appears to be a better offering with a faster growing user base.

On the day Oracle bought Sun (and MySQL) back in 2010, the founders of MySQL started MariaDB. I’ve met with the Maria leadership and like them. I’ve always viewed it as an excellent alternative to MySQL but, you know the story, the work of moving to a different code base is pretty serious in the RDBMS world. I just never got around to it and was still running MySQL until yesterday.

I been on MySQL 5.6 for a long time and promised myself that the next upgrade was going to be to a different code base. I don’t want to feel owned by Oracle so it’ll be either PostgreSQL or MariaDB. Yesterday, I decided to make the move and, since MariaDB claims to be compatible with MySQL and is actually a fork of that code base, I decided to head in that direction.

“Claims to be compatible” is normally a fairly useless statement in the database world. When it comes to real applications, they are always deeply tied to the underlying features, implementation quirks, bugs, and performance characteristics of the underlying RDMBS.  As an example, years ago I was working on SQL Server 7.0 and many of our customers came from the previous release, version 6.5. This version only supported B+ trees whereas 7.0 has different storage options including heaps (unorder storage of rows). With 6.5 “select * from table” will produce exactly the same result as “select * from table order by primary key.” With 7.0, we implanted more storage options and an optimizer that would make cost based decision. It had the advantage of being slightly faster on transaction workloads and it was vastly faster on more complex decision support workloads. But, “select * from table” now produced results that were not necessarily ordered and the order of returned rows might change over time as the table size changed. Technically correct and completely compliant with the SQL standard but an amazing number of SQL Server 6.5 programs didn’t work on SQL Server 7.0. The lesson is a “compatible” database is similar but big applications very seldom “just run”.

I’ve seen this for decades so I fully expected that moving to MariaDB will be some considerable work even though it is a fork of the same base code and has some of the original team members working on it. So I allocated a half day and I felt like I had a good shot of getting the important features up and running by end of day.

Moving to Maria DB

Just after lunch, I installed the MariaDB install on a non-production laptop. It was installed in under a minute and it offered to either upgrade the MySQL 5.6 database this laptop or install a new instance. I went for the upgrade and, naturally, it blew up without a useful message. So, I was denied that easy way out. I installed the separate instance and played with it using the packaged console HeidiSQL. I was fairly impressed with performance and ease of use of HeidiSQL so I figured I would try to figure out why the MySQL database had failed to upgrade.

I installed the system on a different laptop, again tried to upgrade and it again failed. It fails in an ugly way where the previously running instance is left stopped and unable to run. I tried to start the service and it failed. Getting output from the service during start up, it’s easy to see it’s complaining about the use of a deprecated my.ini configuration option “Innodb_additional_mem_pool_size” likely left lying around from some ancient MySQL version. I’m not 100% in favor of “not running” as the right solution for the use of a deprecated option but I commented it out and MariaDB started up perfectly with access to all database. Pretty easy.

This hadn’t taken much time so I still have most of the afternoon to correct application problems from the upgrade if I do the production system so I decided to get on it. The production systems are all backed up nightly so I don’t really have much at risk if things fail unrecoverably.

I decided to take the lazy engineer path and just run the install, take the upgrade option, and see what fails (after comenting out “innodb_additional_mem_pool_size” from my.ini). I didn’t even both to bring down any of the production applications that were running against the MySQL system. I just ran the install, accepted the license, selected upgrade rather than new install, and it was done in under two minutes. Predictably all the applications were complaining about loss of connection and other errors. We had warning lights and error messages all over the place but, over the course of the next 60 seconds they all cleared and everything was working.

On the boat, there are three main and numerous minor applications that operate against this central database. The MariaDB install and upgrade of MySQL 5.6 to MariaDB 10.2.13 was complete in under two minutes without any application changes. I never even bothering to bring down the more than 10,000 lines of applications that were running against the instance during the upgrade and, 60 to 90 seconds after the instance was installed, the applications all reconnected, the error messages all cleared, and everything was back to normal. You can go from running in production on MySQL having never even played with MariaDB, to running in MariaDB in under 60 minutes.

There has not been a single error or issue in the day since the upgrade. Moving from MySQL to MariaDB is absolutely painless. Everything just runs in my multiple applications of more than 10,000 lines. Some queries that previously performed so poorly on MySQL that I couldn’t use them, now run fine. I’m already a huge fan of MariaDB. If you are on MySQL today and want to get away from Oracle or just want to run the latest code from the MariaDB contributors, get going. If you have a spare 30 minutes you may still have time for a coffee once you are done.

19 comments on “MySQL 5.6 to MariaDB 10.2.13
  1. Han Wang says:

    Hi James,

    Great post! When you have time, I wonder whether you can take a look at TiDB which is the next-gen of Maria/MySQL.

    TiDB is a cloud-native open-source MySQL compatible database, very cool product. 2-3 top banks (Beijing Bank, ICBC etc) have used it to replace all MySQL within the core-banking systems. Also, more than 60% top Internet companies (revenue side) in China have adopted TiDB in production environment. Very happy to share more info if you are interested :P

  2. bret says:

    So, seeing that you’re running on a small mariaDB now and were on mysql before and had to go through a quite manual process, are you eventually considering moving the database (and potentially some applications as well) to a containerized setup, like docker?

    I know that’s not the context of the article, but was curious on your thoughts on docker as I’ve moved several of my systems to docker and I’m finding it to be quite painless.

    • It’s hard to argue with less time managing servers and more time just working on the app itself so, yes, containers make a lot of sense. They are a win on most dimensions and it’s amazing how fast they are being adopted.

      Dirona’s primary server also hosts the navigation software showing charts, ship location, RADAR, depth etc. This software isn’t currently offered in containerized form so I’m not likely to move this on-boat stack to containers in the near future. But, the web site you are reading this upon could easily be moved to containers. Today its an EC2 T2.small virtual server running WordPress for the blog and Apache for other apps. It certainly would be practical to move it to containers.

      • I got super interested in phased arrays when the US Aegis missile tracking system for Cruisers and Destroyers was first put into use. This system uses a massive antenna that is a big part of the side of the ship that can be electronically steered. The size of the antenna is impressive and, since it’s electrically, steered, it is very fast. Since I’m interested in the technology, I did ask KVH about it. They are up to speed on both the military and emerging commercial applications of the technology and actually use it in a low profile antenna that combines a steerable antenna that is also a phased array. To keep the profile down, they are using a phased array but to achieve a very wide area scanned, they still use a steerable antenna.

        Where phased arrays excel is in antennas so large that steering is impractical (e.g. military Aegis system). They also excel where you need super-fast beam focusing. And, as discussed earlier, they are great when you can’t afford the space of a steerable antenna. RV applications have vertical clearance issues and commercial aircraft want a low profile to reduce drag. For a boating application, it’s hard to beat the range of a steerable antenna. My KVH V7hts can can from -25 degrees (25 degrees below horizontal) all the way up to 90 degrees (straight up) and through a full 360 degrees. It’s hard to hit that range with a stationary phased array. The phased array is far faster and this can be important when tracking a super-sonic incoming missile but, boat motion, even in really unpleasant seas, is not THAT fast.

        The steerable antenna is still a better solution than 100% non-steerable phased array. Systems that combine both technologies have some profile advantages and, over time, I suspect that as volumes grow, phased arrays will become cheaper eventually all sat systems will be at least partially phased arrays. I suspect that a phased array will allow more precise steering and may allow slightly better tracking as well.

        Generally, I think what we will see here is what we see with all new technologies. Military and niche solutions will drive use. The systems will get better and drop in cost and, over time, be used more and more and eventually will be the solution. But, this always happens slower than most experts predict. It takes time for a new technology to beat an existing, high volume tech.

        • Chasm says:

          The Inmarsat FB500 uses a flat panel antenna within the dome. Or at least the Cobham Sailor version does, it has 16 elements. Not sure how much or indeed any beam steering takes place.
          On land Kymeta and Phasor are not interested in mass applications, Alcan OTOH seems to be. If Alcan (& Sharp) really can use existing LCD lines with minor process changes to build antennas that should drive cost down very nicely.
          Since I said Inmarsat. They’ve lost the GMDSS monopoly. Iridium is now also certified by the IMO. That should increase competition nicely.

          • The Cobham Sailor 500 looks like a standard mechanically positioned antenna but I’ve not investigated it carefully. Inmarsat Fleet Broadband is a pretty dated service and super expensive. It’s been around for a long time but, at this point, it’s not a very cost effective solution.

            Love the use of LCD fabs to build phased arrays. As you said, that should bring the costs down quickly.

            Iridium manages an old Low Earth Orbit satellite fabric which has the advantage of covering the globe including the poles which isn’t possible with geosynchronous solutions. But the current Iridium network is quite slow. They have started the long process of putting up another LEO network that will be much faster but the launches for that fabric have just begun so it’ll be a many months before it’s complete.

            I’m looking forward to someone doing a dense LEO network with reliable sat hand-off offering high bandwidth connectivity anywhere in the world. Geosynchronous systems really struggle with high latitudes and especially with high latitudes in narrow fjords (we’re currently in Norway). Iridium covers the globe but it’s slow and the frequent sat hand-offs required with a LEO system frequently fail. Given that cell systems stopped dropping calls on base station hand-off decades back, you would think this would be a solvable problem.

  3. Richard Bensley says:

    Which database driver(s) are you using to interact with MariaDB?

  4. Eric says:

    Hi James I really enjoy your talks and discussions on this blog. Its been tremendously helpful. I have a basic question that i would hope you could provide an answer for. in prior writings, you had suggested that at least 50% of workload could be perhaps deep learning and/or machine learning. Say it is deep learning, and it all requires GPUs to be powered, how should one think about the associated capex given all the needs you have around redundancy, availability, and associated bandwidth and storage cost (would be disappointed to do deep learning on HDD and lesser speed optics). In other words, to unleash the full potential of these AI GPU powered servers, how should one think about the associated cost? should’t they be much more than current infra or less ? any opinion or metrics you care to share, even qualitatively would be very helpful. extremely grateful and appreciative of your time.

    • It’s an interesting question Eric but not necessarily one restricted to the changes driven by deep learning training and other often GPU-hosted workloads. This trend has been underway for a long time and is more obvious when looking at networking which was your example as well. When configuring systems, it’s very important that the most expensive components are the limiting resource. Servers are, by far, the largest cost in a data center. Networking costs tend to run down around 15%. It would be nuts to allow an expensive server to be underutilized because the network was the bottleneck. You can’t allow a ~15% cost to block utilization on a ~60% cost.

      This make perfect sense and actually the basic rule goes back to manufacturing assembly line design dating way before data centers. If your manufacturing process uses one particularly expensive machine during the process, then you want that machine fully utilized. If you can’t fully utilize that machine due to some less expensive resource being the bottleneck, it’s a bad design.

      The same is true in data centers design. You want the most expensive resource to be fully utilized. Servers cost far more than networking so any design that allows the workloads to bottleneck on networking is a poor design. As easy as this is to understand, for the decades prior to cloud computing, networking almost always was the bottleneck. This is partly because Cisco, and to a lesser extent Juniper, were very expensive, vertically integrated suppliers. Consequently, networking margins were far higher than servers and so it was closer to reasonable to bottleneck on networking resources and just about every data center during the pre-cloud era did exactly that.

      However, if you look closely at that historical example, even with crazy expensive networking equipment, bottlenecking on it was still didn’t make economic sense. It actually wasn’t the right decision at the time and three big changes have happened since:
      1) Cloud providers operate at scale and understand the economics so quickly add networking resources to avoid bottlenecking on networking and not fully utilizing the most expensive component (servers),
      2) Cloud providers have the scale and ability to do custom networking hardware designs that drop networking costs dramatically. As networking costs drop, the argument against being bottlenecked on these resources continues to be more obvious. The need to ensure that the network is not the bottleneck becomes increasingly clear as the relative cost of networking is reduce through internal development,
      3) Modern workloads are, in many cases, more networking intensive. For the most part, this is just an fact unrelated to the economic argument above. It just means that the ratio of networking resource to servers need to increase to meet the needs of these workloads without bottlenecking on networking resources which we argued above isn’t a good economic decision.

      Machine learning is the poster child workload for being network intensive and, since all the rules above continue to apply, the networking resource to server ratio will continue to escalate. This won’t change but I should make a quick note on ML training. The reason it is so network intensive is a single server is way too slow for many training jobs. If a single server can’t train fast enough, then multiple servers have to be used. Because training is a tightly connected workload, there is a lot of networking traffic in this model.

      In perhaps an obvious prediction since the process is already well underway: custom hardware or GPUs will be added to servers in large numbers with specialized, inside-the-server interconnects. For these workloads, the general purpose CPUs will become just schedulers and coordinators for large numbers of specialized ASICS and the overall training workload that a single server can handle will go up. Clearly these training workloads are growing very fast but I suspect that specialized hardware with libraries and frameworks that exploit them will allow a greater percentage of these workloads to run single server.

      This will slightly relieve the pressure to accelerate the growth of networking resources so I mention it here. However, the rule above that networking shouldn’t be the bottleneck, stays true and the network resources will continue to grow fast. In summary, the need to recover from old designs from the Cisco-era means that networking resource need to “catch up” means more networking growth. The massive growth of machine learning workloads will partly be satisfied by custom ASICs investments but not fully and, consequently, networking requirements continue to grow fast. Networking resources cost less than server resources so networking should never be the bottleneck resource.

      When all these factors are considered together, the need to increase the ratio of networking resources to server resources will continue for years to come. The move from 10G to 25G/40G happened in a fraction of the time needed for the industry to move from 1G to 10G. Network resources need to continue to accelerate partly because of new workloads like machine learning training but more conventional workloads are drivers of this change as well.

      • Eric says:

        Thanks James for the thoughtful response. when you think about what google discussed with respect to the savings they think they accomplished using TPUs vs GPU (saved them from doubling their data center traffic) and think about your forecast for doubling of AI workloads within the next several years, do you see an issue relating to the cost deploying GPU equipped data center that structurally delays some of assumptions for AI penetration? or will deep learning ASIC become very big and help lower the cost burden of building AI data centers? thank you

        • Infrastructure has in the past slowed down some specific customers for brief periods because they can’t budget the needed capital investment. But, it never seems to slow down the over all industry. For example, bitcoin mining at scale uses very power intensive systems. Inventive people find a provider with the needed power and ability cool these specialized servers. It’s still a relatively small market but at least one colo provider has decided to focus on coin mining customers.

          When new server technology makes more economic sense or solves problems not economically solved otherwise, solutions are found and deployment is never blocked by infrastructure. The machine learning workloads will be run. They usually run more efficiently on GPUs and custom ASICs so the infrastructure will have to support GPUs and custom ASIC. Power, cooling, and even networking deployments won’t slow down the spread of machine learning. I’ve seen an awfully large number of networking generations over the years. I’ve see water cooled servers deployed, I’ve seen a massive rise in server power density. Successful companies find a way to evolve their infrastructure fast enough or they depend upon partners that can offer the infrastructure they need. The same will happen here.

          This has always been the case but cloud computing allows it to happen faster and without each customer having to make the investment in upgrading their infrastructure.

  5. Ryan Jackson says:

    Having just moved away from MySQL myself for some database-backed projects, I really enjoyed reading your post. 2009-2010 were odd times for Sun (at least they seemed so to me at the time – my dad worked for them for several years, up until around when they were bought by Oracle). I hadn’t considered the impact that acquisitions of that nature could have on open-source software. Fascinating to ponder the thought, but I’m a little more glad to have made the switch away from MySQL now.

    Out of curiosity, if migrating existing data and applications were of no concern (i.e., a new project), would you still have chosen MariaDB? or would PostgreSQL have held more appeal in that case?

    • You are right that not having to change massive amounts of code was a key part of my decision to move to MariaDB. Given that PostgresSQL has now achieved critical mass in the open source community, it appears to have more flexibility, and a better optimizer, I might head in that direction where I to make a decision for a new project without any history influencing the decision.

      • Jamie W says:

        In my work, Postgres is both the only solution for some problems and a non-solution for others. We make extensive use of Puppet for both servers and desktops and it does not support MySQL/MariaDB and it seems unlikely that it ever will (not having an array type is one of the sticking points). On the other hand, when you want replicated databases none of the add-on options for Postgres seem to compare well to Galera which is integrated in the MariaDB codebase these days.

        • Thanks Jamie. My application is a fairly simple embedded database application but, even this small deployment does more than 100 requests per second and, since I store all data points on board every 5 seconds going back years, it’s now more than 70GB. Small by database standard but surprisingly big given what it is.

  6. Juan Escalante says:

    I wonder how much of the painless migration was really due to thorough compatibility and how much was due to the quality of your applications. (e.g. relying on unspecified ordering is clearly the application’s fault)

    • The biggest factor in this case is MariaDB is a fork of the MySQL code line so it started off identical. It appears to be evolving more quickly at this point and the engineers contributing clearly could break compatibility but they are wisely being careful. I agree that some database compatibility issues are really just broken apps. Apps that depend upon broken or not guaranteed to be maintained results is a big problem in the database world. Hopefully, I’m less guilty than the worst app writers out there but, still, I’m pretty impressed with how easy it was to move more than 10,000 lines of code written over several years from MySQL to MariaDB.

Leave a Reply

Your email address will not be published. Required fields are marked *