Xen-on-Nitro: AWS Nitro for Legacy Instances

On August 25, 2006, we started the public beta of our first ever EC2 instance. Back then, it didn’t even have a name yet, but we later dubbed it “m1.small.” Our first customers were able to use the equivalent of a 1.7 GHz Xeon processor, 1.75 GB of RAM, 160 GB of local disk, and 250 Mb/second of network bandwidth for just 10 cents an hour, whenever they needed it and for as long as they needed it. Under the hood, we used the Xen hypervisor to provide these virtual machines, or instances as we call them. Customer adoption exceeded our wildest expectations, and over the next 10 years EC2 launched 27 more instance types based on the Xen technology.

Over the years, many successful businesses were built on top of EC2, and we got a lot of valuable feedback that helped us evolve and improve our virtualization technology and our instance types. However, there were a few things that we just could not address with only incremental software improvements.

These early instances used virtualized or emulated devices that abstracted the instance from the underlying hardware. That worked very well, but the overhead kept growing as networking and storage speeds increased. We had to reserve multiple CPU cores on each physical server to emulate these storage and networking devices. That was computing power we weren’t able to offer to our customers, which particularly impacted our largest instance types. If you are only using a couple of cores it’s not much of a problem, but if you are running an HPC or data-intensive workload that requires as many cores as we can offer, the cores lost to network and storage emulation really make a difference. Our customers wanted a solution that would securely support them using every core fully, and an important subset of customers needed bare metal support (no hypervisor), which we would only offer if it were fully secured.

On November 6, 2017, we announced the AWS Nitro System, the backbone of our new virtualization technology. Nitro is the EC2 hardware offload technology we developed to support high performance networking with hardware offload and optional OS bypass, low latency storage with hardware offload, NVMe local storage, and more advanced security features. See AWS Nitro System for more detail.

C5 instances were the first EC2 instance types fully supporting Nitro and, since then, we have launched 45 more instance types based on the Nitro technology. Besides the vastly improved storage and networking performance made possible through hardware offload, the Nitro technology also allowed us to further increase the security of our virtualization technology.

Older AMIs do not include the NVMe drivers and rely on Xen-specific device naming that is incompatible with the newer Nitro-based instance types. Many customers prefer not to invest in upgrading their operating system as they are fine with the performance of their older Xen-based instance types. Today, we still have over 1.2 million unique customers using Xen-based instances. We’re proud to still be offering and investing in our original instance types rather than forcing customers to move to newer instance types that are easier for us to service.
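
To make the naming gap concrete, here is a minimal sketch of how a Linux guest could report which block device convention it sees. This is not AWS tooling; the function name and detection logic are invented for illustration. Older Xen-based instances typically expose paravirtual disks as /dev/xvd* or /dev/sd*, while Nitro-based instances expose volumes as NVMe namespaces such as /dev/nvme0n1, so an AMI that hard-codes the old names, or whose kernel lacks an NVMe driver, will not come up unchanged on Nitro hardware.

```python
#!/usr/bin/env python3
"""Report whether a Linux guest sees Xen-style or NVMe block device names.

Illustrative sketch only: the prefixes below are the common conventions,
not an exhaustive list, and the helper name is invented for this example.
"""
from pathlib import Path


def block_device_style(sys_block: Path = Path("/sys/block")) -> str:
    """Classify the block device naming convention visible to this guest."""
    names = [p.name for p in sys_block.iterdir()] if sys_block.exists() else []
    if any(n.startswith("nvme") for n in names):
        return "nvme (Nitro-style naming, needs an NVMe driver in the AMI)"
    if any(n.startswith(("xvd", "sd")) for n in names):
        return "xen-pv (legacy Xen-style naming)"
    return "unknown"


if __name__ == "__main__":
    print(f"Block device naming: {block_device_style()}")
```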

However, the underlying hardware is old and it’s getting increasingly difficult to maintain support for these older hypervisor systems. To innovate on behalf of our customers, the EC2 engineering team looked for a solution to continue providing Xen-based instance types by extending the AWS Nitro System. In particular, they had to overcome the following challenges:

  • Para-virtual Devices – Xen instances leverage para-virtual (PV) devices to obtain network and block storage access. These devices were not implemented in the Nitro environment, and support had to be added so that legacy instance storage and network interfaces work without changes to customer workloads or AMIs.
  • Hypervisor Interfaces – Para-virtual, or PV, devices are software constructs that rely on special hypervisor interfaces. Those interfaces are accessed either through hypercalls or through memory pages that are shared between the guest OS and the hypervisor. To support these PV devices in Nitro it was necessary to provide these Xen interfaces, e.g. event channels, grant tables, xenstore, etc. (a simplified sketch of this split-driver pattern follows this list). We had to ensure that any interaction between the instance and the Nitro hypervisor worked exactly as it did with the Xen hypervisor. The guest OS must see no difference in the way it interacts with the underlying virtualized hardware in order to make the transition to Nitro completely transparent to customers. Fortunately, related work in this area had already been done by the Linux community, and we were able to leverage that in the engineering we did to fully support Nitro.
  • Virtual Hardware Environment – Over the years, hardware technology has evolved and new servers differ in many areas. CPUs have additional instructions and much larger caches. To ensure that older AMIs work well, we also had to faithfully emulate the old hardware. Emulation of hardware is one of the key features of our Nitro chips. Together with modifications of the Nitro hypervisor, we were able to provide a hardware environment to instances that looks exactly like older generation hardware. CPU instruction sets, registers, network and I/O devices, and message passing interfaces show no differences if an older instance type is launched on Nitro hardware (the CPUID masking sketch after this list gives a flavor of what this involves).
  • Hardware Accelerators – Some instance types like G2, G3, P2, P3 and F1 use GPUs or FPGAs. These accelerators expose the full architectural details to the instance and are not virtualized. Emulating these complex architectures without hardware-supported virtualization doesn’t perform well, so we don’t currently do it. Consequently, these legacy instance types that depend on GPUs and FPGAs will not be supported by the Nitro system.
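
The split-driver pattern behind the first two bullets can be sketched in a few lines. The toy model below is purely illustrative: all class and field names are invented, and the real interfaces are C structures, shared memory pages, and hypercalls between the guest kernel and the hypervisor. It shows the three Xen mechanisms named above working together: a grant table that lets the backend map guest memory, a shared request ring, and an event channel used to signal the other side. These are the interfaces that had to behave identically when a legacy instance lands on Nitro.

```python
"""Toy model of the Xen split-driver pattern: grant table + shared ring + event channel.

All names here are illustrative, not the real Xen ABI.
"""
from dataclasses import dataclass, field


@dataclass
class GrantTable:
    """Guest-managed table telling the hypervisor which pages another domain may map."""
    entries: dict = field(default_factory=dict)
    next_ref: int = 0

    def grant_access(self, backend_domid: int, frame: int, readonly: bool) -> int:
        ref = self.next_ref
        self.entries[ref] = {"domid": backend_domid, "frame": frame, "readonly": readonly}
        self.next_ref += 1
        return ref  # grant reference handed to the backend inside a ring request


@dataclass
class EventChannel:
    """Virtual interrupt line between frontend and backend."""
    pending: bool = False

    def notify(self) -> None:
        self.pending = True  # the frontend "kicks" the backend

    def poll(self) -> bool:
        was_pending, self.pending = self.pending, False
        return was_pending


@dataclass
class SharedRing:
    """Single-producer/single-consumer request ring living in a shared page."""
    size: int = 32
    requests: list = field(default_factory=list)
    req_prod: int = 0  # producer index, advanced by the frontend
    req_cons: int = 0  # consumer index, advanced by the backend

    def push_request(self, req: dict) -> None:
        assert self.req_prod - self.req_cons < self.size, "ring full"
        self.requests.append(req)
        self.req_prod += 1

    def pop_request(self) -> dict:
        req = self.requests[self.req_cons]
        self.req_cons += 1
        return req


# Frontend (a blkfront-like guest driver) issuing a read of one disk sector:
gnttab, evtchn, ring = GrantTable(), EventChannel(), SharedRing()
ref = gnttab.grant_access(backend_domid=0, frame=0x1234, readonly=False)
ring.push_request({"op": "read", "sector": 2048, "gref": ref})
evtchn.notify()

# Backend (dom0 on Xen; for legacy instances this role is played by the Nitro system):
if evtchn.poll():
    request = ring.pop_request()
    page = gnttab.entries[request["gref"]]  # map the granted guest page and do the I/O
    print(f"backend read sector {request['sector']} into guest frame {page['frame']:#x}")
```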
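
The virtual hardware point can be illustrated with CPUID feature masking, a standard technique for presenting a newer host CPU as an older one. The sketch below is again only illustrative: the leaf and bit positions follow the public x86 CPUID conventions (leaf 1 ECX bit 28 for AVX, leaf 7 EBX bit 5 for AVX2), but the masking policy and function names are invented here and do not describe the actual Nitro implementation.

```python
"""Toy illustration of CPUID feature masking: hide newer CPU features from the guest
so older AMIs see the hardware environment they were built for."""

# Feature bits the legacy virtual hardware should NOT advertise, per CPUID leaf.
LEGACY_MASKS = {
    (0x1, "ecx"): ~(1 << 28) & 0xFFFFFFFF,  # hide AVX (leaf 1, ECX bit 28)
    (0x7, "ebx"): ~(1 << 5) & 0xFFFFFFFF,   # hide AVX2 (leaf 7, EBX bit 5)
}


def filter_cpuid(leaf: int, regs: dict) -> dict:
    """Apply the legacy masks to the register values the host CPU reported."""
    out = dict(regs)
    for (masked_leaf, reg), mask in LEGACY_MASKS.items():
        if masked_leaf == leaf and reg in out:
            out[reg] &= mask
    return out


# Example: the host reports AVX and SSE4.1 in leaf 1; the guest never sees AVX.
host_leaf1 = {"eax": 0x000806EA, "ecx": (1 << 28) | (1 << 19)}
print(filter_cpuid(0x1, host_leaf1))  # AVX bit cleared, SSE4.1 bit kept
```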

All of these innovations enable us to continue to offer many of our older instance types well past the lifetime of the original hardware. Starting in 2022, customers launching M1, M2, M3, C1, C3, R3, I2 and T1 instances will land on Nitro-supported hardware, and existing running instances will also be migrated. Whether an instance runs on the original hardware or on newer Nitro hardware will be fully transparent to customers. Workloads will continue to run just as they have run before. Later we will also support the C4, M4, R4 and T2 instance types.

We did this work because we wanted to give customers the ability to continue to run their legacy workloads unchanged, allowing them to focus their valuable engineering resources on moving their respective businesses forward rather than on migrating between instance types. It would certainly have been easier for us to just send out a retirement notice for these legacy instance types but, at AWS, we work hard to avoid unnecessarily taxing customers. Xen-on-Nitro is us going the extra mile to ensure customers can completely focus on innovation and making their businesses successful rather than on migrating to newer instance types.

6 comments on “Xen-on-Nitro: AWS Nitro for Legacy Instances”
  1. Andrew B Cencini says:

    Really neat solution; I also like the notion of ‘innovating on behalf of customers’, that’s a good way of putting it!

    • Hey Andrew. Yeah, I agree. The team came up with a pretty sweet solution to a long-standing issue where companies’ IT investments end up driven by enterprise IT suppliers rather than by the needs of their respective businesses.

  2. Ronan Kelehan says:

    Interesting approach. Newer instances are much cheaper as well as faster. The customers who haven’t upgraded already probably really don’t want to or can’t. I’ve had a fair few of those conversations myself so I know how unhappy they can get.

    If there are 1.2m of them, it makes a lot of sense to invest in not breaking things.

    In theory this should also be more secure, right?

  3. Stu Miniman says:

    James,
    This is really interesting. I wonder if as an industry we make it too easy for companies to avoid change in their applications. 20 years ago VMware proliferated x86 virtualization, which decoupled the application from hardware or OS EOL. Creaky old applications that should have been sent to the wood chipper when the metal they were installed on died, or all of those Windows NT apps that were already Jurassic-era, continued to run for years. The ultimate end users of the applications suffer; it’s a gap between the application and the infrastructure. It is a slow journey to modernize, and options are good, but with so many other paths to SaaS, containers, serverless, and the cloud ecosystem helping, how do we make sure apps that are past their expiration actually get retired?

    • Hey Stu. Good to hear from you and, yes, I agree there are some very old apps out there. It’s up to us as technology providers to make it really easy to move apps forward. We need to provide the tools to support cost effective application modernization. But it’s not our role to decide when customers move their application stacks forward. A company severely impacted by Covid, with their business in trouble due to supply chain shortages or the massive negative impacts that Covid has brought to some industries, should never get a note from us saying they have to rewrite their application stack during a crisis. Other companies could have a unique opportunity where they have just released a new product that is seeing surprising success. They’re scrambling to learn from their early customers and capitalize on the success they are seeing in the market. The last thing they would want to see is a note from their cloud supplier saying we’re retiring the instance types they are using.

      Our job is to make it easy for customers to do the right thing and never force them to do something that doesn’t work for the unique conditions they are operating under.
