The server tax is what I call the mark-up applied to servers, enterprise storage, and high-scale networking gear. Client equipment is sold in much higher volumes with more competition and, as a consequence, is priced far more competitively. Server gear, even when built from many of the same components as client systems, comes at a significantly higher price. Volumes are lower, there is less competition, and there are often lock-in features that help maintain the server tax. For example, server memory subsystems support Error Correcting Code (ECC) memory whereas most client systems do not. Ironically, both are subject to many of the same memory faults, and the cost of data being corrupted on a client before it is sent to a server isn’t obviously lower than the cost of that same data being corrupted on the server. Nonetheless, ECC remains standard on server components and rare on commodity client systems.

Back in 1987, Garth Gibson, Dave Patterson, and Randy Katz invented the Redundant Array of Inexpensive Disks (RAID). Their key observation was that commodity disks in aggregate could be more reliable than very large, enterprise-class proprietary disks. Essentially, they showed that you didn’t have to pay the server tax to achieve very reliable storage. Over the years, the “inexpensive” component of RAID was rewritten by creative marketing teams as “independent,” and high-scale RAID arrays are back to being incredibly expensive. Large Storage Area Networks (SANs) are essentially RAID arrays of “enterprise”-class disks, lots of CPU, and huge amounts of cache memory with a Fibre Channel attach. The enterprise tax is back with a vengeance, and an EMC NS-960 prices in at $2,800 a terabyte.

BackBlaze, a client backup company, just took another very innovative swipe at destroying the server tax on storage. Their work shows how to bring the “inexpensive” back to RAID storage arrays and delivers storage at $81/TB. Many services have built secret storage subsystems that deliver super-reliable storage at very low cost. What makes the BackBlaze work unique is that they have published the details of how they built the equipment. It’s really very nice engineering.

In Petabytes on a budget: How to Build Cheap Cloud Storage, they outline the details of the storage pod:

· 1 storage pod per 4U of standard rack space

· 1 $365 motherboard and 4GB of RAM per storage pod

· 2 non-redundant power supplies

· 4 SATA cards

· Case with 6 fans

· Boot drive

· 9 SATA port multiplier backplanes

· 45 1.5 TB commodity hard drives at $120 each.

Each storage pod runs Apache Tomcat 5.5 on Debian Linux and implements 3 RAID6 volumes of 15 drives each. They provide a full hardware bill of materials in Appendix A of Petabytes on a budget: How to Build Cheap Cloud Storage.
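As a quick sanity check on those numbers, here’s a back-of-the-envelope sketch. The drive count, drive size, and drive price come from the list above; the RAID6 overhead of two parity drives per volume is standard. Note the result is drive cost per raw TB, so it will differ slightly from the quoted $81/TB figure, which covers the whole pod.

```python
# Back-of-the-envelope check of the storage pod capacity and drive cost.
# Drive count, size, and price are from the post; RAID6 overhead is the
# standard two parity drives per volume.

DRIVES_PER_POD = 45
DRIVE_SIZE_TB = 1.5          # advertised (decimal) terabytes
DRIVE_PRICE = 120            # USD per drive
VOLUMES = 3                  # RAID6 volumes per pod
DRIVES_PER_VOLUME = 15

raw_tb = DRIVES_PER_POD * DRIVE_SIZE_TB                        # 67.5 TB raw
usable_tb = VOLUMES * (DRIVES_PER_VOLUME - 2) * DRIVE_SIZE_TB  # 58.5 TB after RAID6 parity
drive_cost = DRIVES_PER_POD * DRIVE_PRICE                      # $5,400 in drives alone

print(f"raw capacity:      {raw_tb:.1f} TB")
print(f"usable (RAID6):    {usable_tb:.1f} TB")
print(f"drive cost/raw TB: ${drive_cost / raw_tb:.0f}")
```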

Predictably, some have criticized the design as inappropriate for many workloads, and they are right. The I/O bandwidth is low, so this storage pod would be a poor choice for data-intensive applications like OLTP databases. But it’s amazingly good for cold storage like the BackBlaze backup application. Some folks have pointed out that the power supplies are inefficient, at around 80% peak efficiency, and that the chosen configuration will have them running far below peak. True again, but it wouldn’t be hard to replace these two PSUs with a single, 90+% efficient, commodity unit. Many are concerned with cooling and vibration. I doubt cooling is an issue, and, in the blog posting, they addressed vibration and talked briefly about how they isolated the drives. The technique they chose might not be adequate for high-IOPS arrays, but it seems to be working for their workload.

Some are concerned by the lack of serviceability in that the drives are not hot-swappable and the entire 67TB storage pod has to be brought offline to do drive replacements. Again, this concern is legitimate, but I’m actually not a big fan of hot-swapping drives: I always recommend bringing down a storage server before service (I hate risk and complexity). And I hate paying for hot-swap gear, and there isn’t space for hot-swap hardware in very high-density designs. Personally, I’m fine with a “shut down to service” model, but others will disagree.
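The power supply point above is easy to quantify. Here is a minimal sketch, assuming a steady-state DC load of 300W per pod; that load figure is an illustrative placeholder rather than a measured number, and only the rough efficiency points (about 80% versus a 90+% commodity unit) come from the discussion above.

```python
# Rough illustration of the power supply efficiency criticism.
# The DC load is an assumed placeholder, not a measured pod number.

dc_load_watts = 300          # assumed steady-state DC load of one pod
low_eff, high_eff = 0.80, 0.92

wall_low = dc_load_watts / low_eff    # wall power at ~80% efficiency
wall_high = dc_load_watts / high_eff  # wall power with a 90+% unit
saved = wall_low - wall_high

print(f"wall draw at 80%: {wall_low:.0f} W")
print(f"wall draw at 92%: {wall_high:.0f} W")
print(f"saved per pod:    {saved:.0f} W "
      f"({saved * 24 * 365 / 1000:.0f} kWh/year)")
```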

The authors compared their hardware storage costs to a wide array of storage subsystems from EMC through Sun and NetApp. They also compared against Amazon S3 and made what is a fairly unusual mistake for a service provider: they compared on-premises storage equipment purchase cost (just the hardware) with a general storage service. The storage pod costs include only hardware, while the S3 costs include data center rack space, power for the array, cooling, administration, inside-the-data-center networking gear, multi-data-center redundancy, a general I/O path rather than one only appropriate for cold storage, and all the software to support a highly reliable, geo-redundant storage service. So I’ll quibble with their benchmarking: the comparison is of no value as currently written. But, on the hardware front, it’s very nice work.
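To make the apples-to-oranges point concrete, here is a rough sketch of what a fully-burdened comparison would have to account for. Only the $81/TB hardware figure comes from the post; the amortization period, operating adders, and redundancy factor are illustrative assumptions, not real cost data for either BackBlaze or S3.

```python
# Sketch of why a hardware-only $/TB number can't be compared directly with
# a storage service price. Every figure except the $81/TB hardware number
# is an illustrative assumption, not a real cost model.

hardware_per_tb = 81.0            # pod hardware only (from the post)
service_life_months = 36          # assumed amortization period

# Hypothetical operating adders, $/TB/month, covering the items a service
# price must include: space, power, cooling, networking, administration.
assumed_opex_per_tb_month = 3.0

# A geo-redundant service stores at least a second copy elsewhere, roughly
# doubling both hardware and operating cost per user-visible TB.
redundancy_factor = 2

burdened_per_tb = redundancy_factor * (
    hardware_per_tb + assumed_opex_per_tb_month * service_life_months)

print(f"hardware only:         ${hardware_per_tb:.0f}/TB")
print(f"assumed burdened cost: ${burdened_per_tb:.0f}/TB over {service_life_months} months")
```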

Good engineering, and publishing the design is a very cool contribution to the industry. One more powerful tool to challenge the server tax. Well done, BackBlaze.

VentureBeat article: http://venturebeat.com/2009/09/01/backblaze-sets-its-cheap-storage-designs-free/.


James Hamilton, Amazon Web Services

1200, 12th Ave. S., Seattle, WA, 98144
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
james@amazon.com

H:mvdirona.com | W:mvdirona.com/jrh/work | blog:http://perspectives.mvdirona.com