High-performance computing (HPC) uses supercomputers and computer clusters to solve advanced computation problems. Today, computer systems approaching the teraflops-region are counted as HPC-computers. The term is most commonly associated with computing used for scientific research or computational science. A related term, high-performance technical computing (HPTC), generally refers to the engineering applications of cluster-based computing (such as computational fluid dynamics and the building and testing of virtual prototypes). Recently, HPC has come to be applied to business uses of cluster-based supercomputers, such as data warehouses, line-of-business (LOB) applications, and transaction processing.
Predictably, I use the broadest definition of HPC including data intensive computing and all forms of computational science. It still includes the old stalwart applications of weather modeling and weapons research but the broader definition takes HPC from a niche market to being a big part of the future of server-side computing. Multi-thousand node clusters, operating at teraflop rates, running simulations over massive data sets is how petroleum exploration is done, it’s how advanced financial instruments are (partly) understood, it’s how brick and mortar retailers do shelf space layout and optimize their logistics chains, it’s how automobile manufacturers design safer cars through crash simulation, it’s how semiconductor designs are simulated, it’s how aircraft engines are engineered to be more fuel efficient, and it’s how credit card companies measure fraud risk. Today, at the core of any well run business, is a massive data store –they all have that. The measure of a truly advanced company is the depth of analysis, simulation, and modeling run against this data store. HPC workloads are incredibly important today and the market segment is growing very quickly driven by the plunging cost of computing and the business value understanding large data sets deeply.
High Performance Computing is one of those important workloads that many argue can’t move to the cloud. Interestingly, HPC has had a long history of supposedly not being able to make a transition and then, subsequently, making that transition faster than even the most optimistic would have guessed possible. In the early days of HPC, most of the workloads were run on supercomputers. These are purpose built, scale-up servers made famous by Control Data Corporation and later by Cray Research with the Cray 1 broadly covered in the popular press. At that time, many argued that slow processors and poor performing interconnects would prevent computational clusters from ever being relevant for these workloads. Today more than ¾ of the fastest HPC systems in the world are based upon commodity compute clusters.
The HPC community uses the Top-500 list as the tracking mechanism for the fastest systems in the world. The goal of the Top-500 is to provide a scale and performance metric for a given HPC system. Like all benchmarks, it is a good thing in that it removes some of the manufacturer hype but benchmarks always fail to fully characterize all workloads. They abstract performance to a single or small set of metrics which is useful but this summary data can’t faithfully represent all possible workloads. Nonetheless, in many communities including HPC and Relational Database Management Systems, benchmarks have become quite important. The HPC world uses the Top-500 list which depends upon LINPACK as the benchmark.
Looking at the most recent Top-500 list published in June 2010, we see that Intel processors now dominate the list with 81.6% of the entries. It is very clear that the HPC move to commodity clusters has happened. The move that “couldn’t happen” is near complete and the vast majority of very high scale HPC systems are now based upon commodity processors.
What about HPC in the cloud, the next “it can’t happen” for HPC? In many respects, HPC workloads are a natural for the cloud in that they are incredibly high scale and consume vast machine resources. Some HPC workloads are incredibly spiky with mammoth clusters being needed for only short periods of time. For example semiconductor design simulation workloads are incredibly computationally intensive and need to be run at high-scale but only during some phases of the design cycle. Having more resources to throw at the problem can get a design completed more quickly and possibly allow just one more verification run to potentially save millions by avoiding a design flaw. Using cloud resources, this massive fleet of servers can change size over the course of the project or be freed up when they are no longer productively needed. Cloud computing is ideal for these workloads.
Other HPC uses tend to be more steady state and yet these workloads still gain real economic advantage from the economies of extreme scale available in the cloud. See Cloud Computing Economies of Scale (talk, video) for more detail.
When I dig deeper into “steady state HPC workloads”, I often learn they are steady state as an existing constraint rather than by the fundamental nature of the work. Is there ever value in running one more simulation or one more modeling run a day? If someone on the team got a good idea or had a new approach to the problem, would it be worth being able to test that theory on real data without interrupting the production runs? More resources, if not accompanied by additional capital expense or long term utilization commitment, are often valuable even for what we typically call steady state workloads. For example, I’m guessing BP, as it battles the Gulf of Mexico oil spill, is running more oil well simulations and tidal flow analysis jobs than originally called for in their 2010 server capacity plan.
No workload is flat and unchanging. It’s just a product of a highly constrained model that can’t adapt quickly to changing workload quantities. It’s a model from the past.
There is no question there is value to being able to run HPC workloads in the cloud. What makes many folks view HPC as non-cloud hostable is these workloads need high performance, direct access to underlying server hardware without the overhead of the virtualization common in most cloud computing offerings and many of these applications need very high bandwidth, low latency networking. A big step towards this goal was made earlier today when Amazon Web Services announced the EC2 Cluster Compute Instance type.
The cc1.4xlarge instance specification:
· 23GB of 1333MHz DDR3 Registered ECC
· 64GB/s main memory bandwidth
· 2 x Intel Xeon X5570 (quad-core Nehalem)
· 2 x 845GB 7200RPM HDDs
· 10Gbps Ethernet Network Interface
It’s this last point that I’m particularly excited about. The difference between just a bunch of servers in the cloud and a high performance cluster is the network. Bringing 10GigE direct to the host isn’t that common in the cloud but it’s not particularly remarkable. What is more noteworthy is it is a full bisection bandwidth network within the cluster. It is common industry practice to statistically multiplex network traffic over an expensive network core with far less than full bisection bandwidth. Essentially, a gamble is made that not all servers in the cluster will transmit at full interface speed at the same time. For many workloads this actually is a good bet and one that can be safely made. For HPC workloads and other data intensive applications like Hadoop, it’s a poor assumption and leads to vast wasted compute resources waiting on a poor performing network.
Why provide less than full bisection bandwidth? Basically, it’s a cost problem. Because networking gear is still building on a mainframe design point, it’s incredibly expensive. As a consequence, these precious resources need to be very carefully managed and over-subscription levels of 60 to 1 or even over 100 to 1 are common. See Datacenter Networks are in my Way for more on this theme.
For me, the most interesting aspect of the newly announced Cluster Compute instance type is not the instance at all. It’s the network. These servers are on a full bisection bandwidth cluster network. All hosts in a cluster can communicate with other nodes in the cluster at the full capacity of the 10Gbps fabric at the same time without blocking. Clearly not all can communicate with a single member of the fleet at the same time but the network can support all members of the cluster communicating at full bandwidth in unison. It’s a sweet network and it’s the network that makes this a truly interesting HPC solution.
Each Cluster Compute Instance is $1.60 per instance hour. It’s now possible to access millions of dollars of servers connected by a high performance, full bisection bandwidth network inexpensively. An hour with a 1,000 node high performance cluster for $1,600. Amazing.
As a test of the instance type and network prior to going into beta Matt Klein, one of the HPC team engineers, cranked up LINPACK using an 880 server sub-cluster. It’s a good test in that it stresses the network and yields a comparative performance metric. I’m not sure what Matt expected when he started the run but the result he got just about knocked me off my chair when he sent it to me last Sunday. Matt’s experiment yielded a booming 41.82 TFlop Top-500 run.
For those of you as excited as I am interested in the details from the Top-500 LINPACK run:
- In: Amazon_EC2_Cluster_Compute_Instances_Top500_hpccinf.txt
- Out: Amazon_EC2_Cluster_Compute_Instances_Top500_hpccoutf.txt
- The announcement: Announcing Cluster Compute Instances for EC2
Welcome to the cloud, HPC!