In Microslice Servers and the Case for Low-Cost, Low-Power Servers, I observed that CPU bandwidth is outstripping memory bandwidth. Server designers can address this by: 1) designing better memory subsystems or 2) reducing the CPU per-server. Optimizing for work done per dollar and work done per joule argues strongly for the second approach for many workloads.
In Low Power Amdahl Blades for Data Intensive Computing (Amdahl Blades-V3.pdf (84.25 KB)), Alex Szalay makes a related observation and arrives at a similar point. He argues that server I/O requirements for data intensive computing clusters grow in proportion to CPU performance. As per-server CPU performance continues to increase, we need to add additional I/O capability to each server. We can add more disks but this drives up both power and cost as more disk require more I/O channels. Another approach is use a generation 2 flash SSDs such as the Intel X25-E or the OCZ Vertex (I’m told the Samsung 2.06Gb/s (SLC) is also excellent but I’ve not yet seen their random write IOPS rates). Both the OCZ and the Intel components are excellent performers nearing FusionIO but at a far better price point making them considerably superior in work done per dollar.
The Szalay paper looks first at the conventional approach of adding flash SSDs to a high-end server. To get the required I/O rates, three high-performance SSDs would be needed. But, to get full I/O rates from the three devices, three I/O channels would be needed which drives up power and cost. What if we head the other way and, rather than scaling up the I/O sub-system, we scale down the CPU per server? Alex shows that a low-power, low-cost commodity board coupled with a single, high-performance flash SSDs would form an excellent building block for a data intensive cluster. It’s a very similar direction to CEMS servers but applied to data intensive workloads.
One of the challenges of low-power, high-density servers along the lines proposed by Alex and I is network cabling. With CEMS there are 240 servers/rack and a single top-of-rack switch is inadequate so we go with a mini-switch per six-server tray and each of 40 trays connected to a top-of-rack switch. The Low Power Amdahl Blades are yet again more dense. Alex makes a more radical approach proposal to interconnect the rack using very short-range radio. From the paper,
Considering their compact size and low heat dissipation, one can imagine building clusters of thousands of low-power Amdahl blades. In turn, this high density will create challenges related to interconnecting these blades using existing communication technologies (i.e., Ethernet, complex wiring if we have 10,000 nodes). On the other hand, current and upcoming high-speed wireless communications offer an intriguing alternative to wired networks. Specifically, current wireless USB radios (and their WLP IP-based variants) offer point-to-point speeds of up to 480 Mbps over small distances (~3-10 meters). Further into the future, 60 GHz-based radios promise to offer Gbps of wireless bandwidth.
I’m still a bit skeptical that we can get rack-level radio networking to be win in work done per dollar and work done per joule but it is intriguing and I’m looking forward to getting into more detail on this approach with Alex.
Remember it’s work done per dollar and work done per joule that we should be chasing. And, in optimizing for these metrics, we increasingly face challenges of insufficient I/O and memory bandwidth per core. Both CEMS and Low-Power Amdahl Blades address the system balance issue by applying more low-power servers rather than adding more I/O and memory bandwidth to each server.
It’s the performance of the aggregate cluster we care about and work done dollar and work done per joule is the final arbiter.
James Hamilton, Amazon Web Services
1200, 12th Ave. S., Seattle, WA, 98144
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 | firstname.lastname@example.org
H:mvdirona.com | W:mvdirona.com/jrh/work | blog:http://perspectives.mvdirona.com