At Tuesday Night Live with James Hamilton at the 2016 AWS re:Invent conference, I introduced the first Amazon Web Services custom silicon. The ASIC I showed formed the foundational core of our second generation custom network interface controllers and, even back in 2016, there was at least one of these ASICs going into every new server in the AWS fleet. This work has continued for many years now, and this part and subsequent generations form the hardware basis of the AWS Nitro System. The Nitro System delivers these features for AWS Elastic Compute Cloud (EC2) instance types:
- High speed networking with hardware offload
- High speed EBS storage with hardware offload
- NVMe local storage
- Remote Direct Memory Access (RDMA) for MPI and Libfabric
- Hardware protection/firmware verification for bare metal instances
- All business logic needed to control EC2 instances
We continue to consume millions of Nitro ASICs every year so, even though it’s only used by AWS, it’s actually a fairly high volume server component. This and follow-on technologies have been supporting much of the innovation going on in EC2, but we haven’t had a chance to get into much detail on how Nitro actually works.
At re:Invent 2018, Anthony Liguori, one of the lead engineers on the AWS Nitro System project, gave what was, at least for me, one of the best talks at re:Invent outside of the keynotes. It’s worth watching the video (URL below) but I’ll cover some of what Anthony went through in his talk here.
The Nitro System powers all EC2 instance types launched over the last couple of years. There are three major components:
- Nitro Card I/O Acceleration
- Nitro Security Chip
- Nitro Hypervisor
Different EC2 server instance types include different Nitro System features and some server types have many Nitro System cards that implement the five main features of the AWS Nitro System:
- Nitro Card for VPC (Virtual Private Cloud)
- Nitro Card for EBS (Elastic Block Store)
- Nitro Card for Instance Storage
- Nitro Card Controller
- Nitro Security Chip
These features formed the backbone for Anthony Liguori’s 2018 re:Invent talk and he went through some of the characteristics of each.
Nitro Card for VPC
The Nitro Card for VPC is essentially a PCIe attached Network Interface Card (NIC), often called a network adapter or, in some parts of the industry, a network controller. This is the card that implements the hardware interface between EC2 servers and the network connection or connections implemented on that server type. And, like all NICs, interfacing with it requires a specific device driver to support communicating with the network adapter. In the case of AWS NICs, the Elastic Network Adapter (ENA) provides the device driver support for our NICs. This driver is now included in all major operating systems and distributions.
The Nitro Card for VPC supports network packet encapsulation/decapsulation, implements EC2 security groups, enforces limits, and is responsible for routing. Implementing these features on the card rather than in the hypervisor allows customers to fully use the underlying server hardware without impacting network performance or other users, and without leaving some server cores unavailable to customers to handle networking tasks. It also allows secure networking support without requiring server resources to be reserved for AWS use. The largest instance types get access to all server cores.
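As a toy illustration of the kind of stateless rule matching a security group performs, here is a minimal sketch in Python. This is not AWS’s implementation; the rule format, field names, and addresses are all invented for the example:

```python
import ipaddress

# A hypothetical inbound rule set: (protocol, port, allowed source CIDR).
RULES = [
    ("tcp", 22, "10.0.0.0/8"),    # SSH from the private network only
    ("tcp", 443, "0.0.0.0/0"),    # HTTPS from anywhere
]

def allowed(protocol: str, port: int, source_ip: str) -> bool:
    """Return True if any rule permits this inbound packet."""
    src = ipaddress.ip_address(source_ip)
    return any(
        protocol == r_proto and port == r_port
        and src in ipaddress.ip_network(r_cidr)
        for r_proto, r_port, r_cidr in RULES
    )

assert allowed("tcp", 443, "198.51.100.7")      # HTTPS from the internet: allowed
assert not allowed("tcp", 22, "198.51.100.7")   # SSH from the internet: dropped
```

On Nitro, checks like these run on the card at line rate, so no host CPU cycles are spent evaluating them.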
It wasn’t covered in the talk but the Nitro Card for VPC also supports a number of network acceleration features. The Elastic Fabric Adapter (EFA) uses the Nitro Card network acceleration features to provide user space networking capabilities similar to those found on many supercomputers. Customers wishing to take advantage of EFA can use the OpenFabrics Alliance Libfabric package or use a higher level programming interface like the popular Message Passing Interface (MPI) or NVIDIA Collective Communications Library (NCCL). Whether using Libfabric, MPI, or NCCL, applications bypass the operating system when talking to EFA, and are able to achieve more consistent performance with lower CPU utilization. MPI and NCCL are commonly used packages in science, engineering, and machine learning applications and, to a lesser extent, distributed databases.
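For context on what MPI and NCCL do over that low-latency transport, here is a toy single-process sketch of the recursive-doubling allreduce collective, one common pattern behind MPI_Allreduce. Nothing here is AWS or EFA code; real implementations exchange these partial sums as network messages between ranks, which is exactly the traffic EFA accelerates:

```python
def allreduce_sum(values: list) -> list:
    """Each 'rank' ends with the sum of all ranks' values.

    Simulates recursive doubling: in round k, rank i exchanges its
    partial sum with rank i XOR 2**k, so log2(n) rounds suffice.
    """
    n = len(values)
    assert n & (n - 1) == 0, "sketch assumes a power-of-two rank count"
    vals = list(values)
    step = 1
    while step < n:
        # Rank i pairs with rank i XOR step; both keep the pair's sum.
        vals = [vals[i] + vals[i ^ step] for i in range(n)]
        step *= 2
    return vals

# Four 'ranks' each contribute one value; all end with the total.
assert allreduce_sum([1, 2, 3, 4]) == [10, 10, 10, 10]
```

With n ranks, each rank sends and receives only log2(n) messages, which is why consistent low latency (rather than raw bandwidth) dominates collective performance.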
Nitro Card for EBS
The Nitro Card for EBS supports storage acceleration for EBS. All instance local storage is implemented as NVMe devices, and the Nitro Card for EBS supports transparent encryption, limits to protect the performance characteristics of the system for other users, and drive monitoring to track SSD wear. It also supports bare metal instance types.
Remote storage is again presented as NVMe devices, but in this case as NVMe over Fabrics, supporting access to EBS volumes, again with encryption, again without impacting other EC2 users, and with security even in a bare metal environment.
The Nitro card for EBS was first launched in the EC2 C4 instance family.
Nitro Card for Instance Storage
The Nitro Card for Instance Storage also implements NVMe (Non-Volatile Memory Express) for local EC2 instance storage.
Nitro Card Controller
The Nitro Card Controller coordinates all other Nitro cards, the server hypervisor, and the Nitro Security Chip. It implements the hardware root of trust using the Nitro Security Chip and supports instance monitoring functions. It also implements the NVMe controller functionality for one or more Nitro Cards for EBS.
Nitro Security Chip
The Nitro Security Chip traps all I/O to non-volatile storage, including the BIOS, all I/O device firmware, and any other controller firmware on the server. This is a simple approach to security where the general purpose processor is simply unable to change any firmware or device configuration. Rather than accept the error prone and complex task of ensuring each access is approved and correct, no access is allowed. EC2 servers can’t update their own firmware. This is GREAT from a security perspective, but the obvious question is how the firmware gets updated. It’s updated by AWS, and only by AWS, through the Nitro System.
The Nitro Security Chip also implements the hardware root of trust. This system replaces tens of millions of lines of code that make up the Unified Extensible Firmware Interface (UEFI) and supports secure boot. It starts the server up untrusted, then measures every firmware system on the server to ensure that none has been modified or changed in any unauthorized way. Each checksum (device measurement) is checked against the known-good checksum stored in the Nitro Security Chip.
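The measure-and-compare flow can be sketched in a few lines. This is a toy illustration of the idea only, not AWS code; the component names, hash choice, and firmware images are all invented for the example:

```python
import hashlib

# Hypothetical known-good measurements, keyed by firmware component.
# (Illustrative values only; real measurements live in the security chip.)
KNOWN_GOOD = {
    "bios": hashlib.sha256(b"bios-image-v12").hexdigest(),
    "nic": hashlib.sha256(b"nic-firmware-v7").hexdigest(),
}

def measure(image: bytes) -> str:
    """Measure a firmware image by hashing its contents."""
    return hashlib.sha256(image).hexdigest()

def verify_boot(images: dict) -> bool:
    """Start untrusted: every component must match its stored measurement."""
    return all(
        measure(image) == KNOWN_GOOD.get(name)
        for name, image in images.items()
    )

# A server whose firmware matches the stored measurements passes...
assert verify_boot({"bios": b"bios-image-v12", "nic": b"nic-firmware-v7"})
# ...while any tampered component fails verification.
assert not verify_boot({"bios": b"bios-image-v12", "nic": b"evil-firmware"})
```

The key property is that the comparison and the stored measurements live outside the general purpose processor, so even a fully compromised host OS can’t forge a passing result.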
The Nitro System supports key network, server, security, firmware patching, and monitoring functions, freeing up the entire underlying server for customer use. This allows EC2 instances to have access to all cores – none need to be reserved for storage or network I/O. This gives more resources over to our largest instance types for customer use – we don’t need to reserve resources for housekeeping, monitoring, security, network I/O, or storage. The Nitro System also makes possible a very simple, lightweight hypervisor that is just about always quiescent, and it allows us to securely support bare metal instance types.
More data on the AWS Nitro System from Anthony Liguori, one of the lead engineers behind the software systems that make up the AWS Nitro System:
- Anthony’s video: Powering Next-Gen EC2 Instances: Deep-Dive into the Nitro System
Three keynotes for a fast-paced view of what’s new across all of AWS:
Does the Nitro Card for VPC support PTP sync from an external source like a BC (Boundary Clock)? It should support it if it’s like a conventional PCIe NIC card, unless AWS has disabled it.
Good question. We do support clock sync (https://aws.amazon.com/blogs/aws/keeping-time-with-amazon-time-sync-service/) but not yet at IEEE 1588 precision. High precision clock sync is super useful for many distributed algorithms and we expect to offer customers increased precision support.
Thanks James. It will be in demand for real-time workload processing in RAN domains (I assume metal instances should be the same because it is a Nitro card capability). And I don’t see any AWS technical restriction to unleashing it, but it’s up to how you respond to market demand, while Azure HCI has offered it just like other bare metal infrastructure features.
Again, hoping to see its capability soon.
We’re doing everything but RAN with Dish Networks (https://aws.amazon.com/blogs/industries/telco-meets-aws-cloud-deploying-dishs-5g-network-in-aws-cloud/) and, as you said, have the capability to support RAN components as well.
It sets a great example and shows initiative by AWS. I read it months ago, and it’s one of the best practice textbooks of the 5G era. BTW, it’s good to know you have the T-BC capability. But is it enabled conditionally upon request, or is it in-progress development? If ready, it will expand to wider application types.
Does the encryption in Nitro System use any of the FIPS validated modules?
The AWS FIPS 140-2 is described here: https://aws.amazon.com/compliance/fips/.
Thank you James! That list has EC2 listed as a whole. I assume that includes Nitro system.
Hi James, I am wondering how the Nitro controller communicates with the ENA and EBS cards?
It’s not visible externally and we generally only publish data that helps customers. That data point isn’t out there.
Hello James, with the announcement of AWS Outposts I dived a little into the Nitro System and found “some” similarities with IBM “mainframe” internal architecture.
It seems AWS is re-inventing the “mainframe” but with a distributed vision.
But I may be wrong
That hasn’t been the way we look at it but, yes, you are correct that there is some similarity. Having a control processor separate from customer workloads to monitor, configure, and control has long been a key component of IBM mainframes, and Nitro does fill those functions in addition to other unique Nitro functions.
Just a reminder: for example, on the z14 IBM enterprise system (not the very latest one), customers can order up to 170 processors for application processing, but the design of the z14 can include up to 332 POWER cores for I/O and co-processors, and up to 322 RAS cores. IBM has also been using SmartNICs for networking. IBM is now going distributed even with its “mainframe” systems. FYI, I was an IBMer but I quit a long, long time ago, for a more distributed IT world. I continue to survey and study IT trends and offerings, including the AWS ones, which have become very rich now. My last focus was on database offerings and technologies for a large customer and, as you may guess, I did read insights from Mr. Jim Gray.
Thank you also for your very appreciated insights too.
Yes, if you don’t focus on price/performance, IBM mainframes are impressive from an engineering perspective. I’m also a big fan of the EMC DMX-3000 storage array. Fully configured, that storage system has 100 PowerPCs. Rather than sampling, 100% of the system boards are tested in environmental chambers under high vibration with widely varying voltage levels. Boards that pass are integrated into the customer-ordered rack configuration, and then the final system of up to three racks is tested in an environmental chamber before loading onto a truck.
If you have to have a single hardware component and it must be as reliable as possible, these mainframe class systems are impressive. Ironically, far less reliable servers running redundantly in the cloud across three different but nearby data centers are more reliable than a single data center mainframe. More reliable, far less expensive, and for most of us, easier to program.
Any idea if the Nitro hypervisor is capable of memory overcommitment when VMs have directly assigned I/O devices?
It’s hard to oversubscribe memory without negatively impacting latency. It won’t hurt average latency all that much, but it will drive up jitter and the customer experience suffers so, on leading AWS instance types like C, M, R, etc., we don’t over-subscribe memory. But, yes, you could certainly do this. Any memory that was registered as a receive buffer would have to be pinned, and the handlers have to stay memory resident, but, sure, it could be done.
Is there a host OS running with the Nitro hypervisor? Or just the Nitro hypervisor with a small userspace, but without a kernel?
The Nitro hypervisor is built on a minimized and modified Linux kernel, including the KVM subsystem that is responsible for programming hardware virtualization features of the processor. It is not a general purpose operating system kernel in our architecture. The Linux kernel portion of the Nitro hypervisor has no support for networking of any kind, or anything else that is not needed for the Nitro hypervisor’s sole purposes: partitioning CPU and memory, assigning Nitro card virtual functions to instances, monitoring, metrics, and hardware error handling.
The Nitro hypervisor is a simplified kernel that accomplishes only partitioning and card assignment. Do you think it would be possible and meaningful to merge these functions into firmware, such as LinuxBoot or UEFI?
As is well known, some mission-critical servers provide “logical partition firmware”, such as PowerVM and KunLun 9008v5.
The word firmware is usually applied to software that is installed directly on a device in persistent memory, and it’s typically “close to the hardware”, fairly small, and without an operating system. But it’s still just software. What you are asking is whether it makes sense to have hardware assist for the support of virtual machines. Yes, absolutely. Modern processors are full of it, and our Nitro program is a good example of using extensive hardware support. Good software can get the latency pretty low, but it takes hardware to get the overheads down to the minimum and, more importantly, to reduce the jitter. Hardware support is the easiest way to get small and consistent virtualization overhead.
Can the Nitro card help with encryption and decryption at the hardware level? e.g., sending and receiving IPsec?
Nitro does hardware acceleration for network and storage encryption, but IPsec terminated at the instance is encrypted/decrypted on the instance without Nitro support.
Hi James, as always a great blog. Curious whether you can discuss some of the efficiencies you are able to extract on compute and networking (or from storage that alleviate networking/compute bottlenecks)? Is there anything specific you are doing differently vs. your peers around software or AI or machine learning that allows for significant improvement in capex intensity on the aforementioned? Would these measures help run things hotter or more efficiently even as traffic growth remains robust inside your data centers? Any specific examples on compute (CPU utilization, kind of CPU like Annapurna) and networking (around data center switching/routing)? Your input would be greatly appreciated, and thank you as always.
Lots we don’t talk about but a few examples that are pretty useful and related to Nitro. 1) Our overall cost of networking has plummeted over the last 10 years and, as a consequence, we can make more network resources available. We now offer several instance types with 100Gbps networking and plan to continue to expand this set of offerings. Annapurna and Nitro make it possible. 2) With Nitro hardware offload, we can run very high networking and storage bandwidth (and high requests per second) without reserving CPU cores for internal processing or running over-subscribed, where the customer might get the resources or might not based upon how busy networking and storage are. The hardware offload makes all cores available to the customer and they don’t have to compete with infrastructure for their resources. They don’t suffer TLB resource contention or only have resources available when the network or storage isn’t busy. And 3) storage offload. Hardware assist for storage offloads the customer’s general purpose processor, allows encryption without impact, and reduces the high-9s variability of storage request times.
Hi James! If the overall cost of networking has plummeted in the last decade, why are AWS’ networking charges still so high?
Yeah, yeah, I have heard that before :-) On your question of what’s happened in networking in the last decade, there has been quite a bit. We have improved the bandwidth by a couple of orders of magnitude, which isn’t bad. Jitter has improved by an even larger margin. Availability is up dramatically over this decade. The richness of networking support is wildly higher with Direct Connect, VPC, load balancing, support for bare metal, O/S bypass, and a host of other improvements. But, I do hear you and most of our focus on and around networking has been on providing “more” and “better” rather than dropping costs. The reasoning is that networking is a small part of most customers’ bills and most customers want the network out of the way and want more networking capability, rather than having us treat it like a commodity and focus all of our attention on cost reduction. But I do know there are workloads where networking costs are big or even dominate, and costs do matter.
We generally have a pretty good track record of finding cost reductions and passing them along. We have good results with S3 price reductions over the last decade, amazing results in database overall costs, good results in compute, and I’m proud of some of what we have done in networking but I hear you on requesting more of our networking investment show up as cost reductions.
Any plans to support peer to peer DMA between NVME / Network devices via Nitro?
There is nothing on the short term horizon but thanks for the suggestion.
You mentioned that the Nitro Card Controller also implements the NVMe controller functionality for one or more Nitro Cards for EBS. I am curious why not make the Nitro Card for EBS itself an NVMe controller instead of going through the Nitro Card Controller. Or I guess the Nitro Card Controller is just on the control plane and all NVMe traffic still goes through the Nitro Card for EBS. Thanks a lot.
I’m pretty sure you are correct that it’s just the control plane.
Good to see Nitro in action. Question re: EBS impact, is there any document / presentation on what to expect for EBS performance with the new Nitro cards / hypervisor.
No, not to my knowledge at this point.
For the Nitro network adapter, whether for VPC or EBS, the network traffic will be 25Gb based, which PCIe Gen3 x8/x16 can handle without any problem. But for local instance storage, a single NVMe SSD will be Gen3 x4, so one Nitro card can only support up to 4 NVMe SSDs, while a standard server can support more than 4 NVMe SSDs. How does Nitro deal with this? Add one more Nitro chip for another 4 NVMe SSDs?
If more resources are needed, additional Nitro cards can be used and several AWS instance types use more than a single card.
Thanks for your clarification.
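As an aside, the lane arithmetic in the exchange above can be checked quickly. The figures below are nominal PCIe Gen3 spec numbers (~8 GT/s per lane, roughly 0.985 GB/s after 128b/130b encoding), not AWS measurements:

```python
# Nominal PCIe Gen3 per-lane bandwidth after 128b/130b encoding (GB/s).
GEN3_GBYTES_PER_LANE = 0.985

def max_x4_devices(card_lanes: int) -> int:
    """How many x4 NVMe SSDs fit behind a single x(card_lanes) card."""
    return card_lanes // 4

def link_gbytes_per_s(lanes: int) -> float:
    """Aggregate one-direction bandwidth of a Gen3 link, in GB/s."""
    return lanes * GEN3_GBYTES_PER_LANE

assert max_x4_devices(16) == 4   # an x16 card hosts at most four x4 SSDs
assert max_x4_devices(8) == 2
# A 25 Gb/s (~3.1 GB/s) network link fits comfortably within an x8 slot:
assert link_gbytes_per_s(8) > 3.1
```

This is why, as the answer above notes, servers with more than four local NVMe SSDs simply carry additional Nitro cards rather than stretching one card’s lanes further.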
Fascinating piece of technology and one of the best examples of HW positioned right and making a huge difference!
The great James Hamilton: Hi!
Re: Nitro for VPC: EFA was mentioned in conjunction with SRD at re:Invent by another talk in the HPC track; is that transport protocol hardware accelerated as well?
What’s next for Nitro in the context of not just EC2 and its networking infrastructure? I’m wondering about application-layer-specific things (like those required for CDNs for media, especially images and video, where not just latency and throughput but encoding is paramount, or even deep packet inspection for DDoS protection)?
You might also want to link to your excellent blog post on hw-acceleration for nw from way back when: https://perspectives.mvdirona.com/2009/12/networking-the-last-bastion-of-mainframe-computing/
Also, I hope you resume doing TNLs at AWS re:invent.
Yes, you are correct. The SRD protocol is also supported by Nitro for VPC. Thanks for suggesting the link.
This was one of my favorite talks at re:Invent last year, here’s a link to my visual notes on Anthony’s CMP303 session: https://www.awsgeek.com/posts/AWS-reInvent-2018-CMP303-Powering-Next-Gen-EC2-Instances-Deep-Dive-into-the-Nitro-System/
I totally agree. This was some of the most detailed and most interesting material we presented at last year’s conference. Thanks for the pointer to the notes.
Hi James, minor typo in the “Nitro Card for VPC”: “Elastic Network Adapter (EFA)” should read “Elastic Network Adapter (ENA)”