SSD versus Enterprise SATA and SAS disks

In Where SSDs Don’t Make Sense in Server Applications, we looked at the results of an HDD-to-SSD comparison done by the Microsoft Research Cambridge team. Vijay Rao of AMD recently sent me a pointer to an excellent comparison test done by AnandTech. In SSD versus Enterprise SAS and SATA disks, AnandTech compares one of my favorite SSDs, the Intel X25-E SLC 64GB, with two good enterprise HDDs. The Intel SSD can deliver 7,000 random IOPS, and the 64GB part is priced in the $800 range.

The full AnandTech comparison is worth reading, but I found the pricing combined with the sequential and random I/O performance data particularly interesting. I’ve brought this data together into the table below:

Drive | Capacity | Price (USD) | $/GB | $/Seq Read ($/MB/s) | $/Seq Write ($/MB/s) | Seq I/O Density (MB/s per GB) | $/Rdm Read ($/MB/s) | $/Rdm Write ($/MB/s) | Rdm I/O Density (MB/s per GB)
Intel X25-E SLC | 64GB | $795-$900 | $13.24 | $3.28 | $4.28 | 3.563 | $17.66 | $9.02 | 1.109
Cheetah 15k | 300GB | $270-$300 | $0.95 | $2.28 | $2.24 | 0.420 | $142.50 | $57.00 | 0.012
WD 1000FYPS | 1TB | $190-$200 | $0.20 | $2.71 | $2.50 | 0.075 | $195.00 | $65.00 | 0.002

Notes:

All I/O measurements obtained using SQLIO

Random I/O measurements using 8k pages

Sequential measurements using 64kB I/Os

I/O density is the average of read and write throughput (MB/s) divided by capacity (GB)

Price calculations are based on the average of the listed selling-price range.

Source: Anandtech (http://it.anandtech.com/IT/showdoc.aspx?i=3532&p=1)
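The derived columns follow from the notes. As a minimal sketch (not AnandTech’s code), the following recomputes $/GB and the I/O densities from the price range and the $/MB/s columns, backing throughput out as average price ÷ ($/MB/s). The same two functions can be reused to add other drives to the comparison.

```python
# Rows from the table above:
# (drive, price low, price high, capacity GB,
#  $/seq read, $/seq write, $/random read, $/random write), all $ per MB/s
ROWS = [
    ("Intel X25-E SLC", 795, 900, 64, 3.28, 4.28, 17.66, 9.02),
    ("Cheetah 15k", 270, 300, 300, 2.28, 2.24, 142.50, 57.00),
    ("WD 1000FYPS", 190, 200, 1000, 2.71, 2.50, 195.00, 65.00),
]


def avg_price(low: float, high: float) -> float:
    """Average of the listed selling-price range, per the table notes."""
    return (low + high) / 2


def io_density(price: float, usd_per_read: float, usd_per_write: float,
               capacity_gb: float) -> float:
    """Average of read and write MB/s divided by capacity (MB/s per GB).

    Throughput is backed out of the $/MB/s columns: MB/s = price / ($/MB/s).
    """
    read_mbps = price / usd_per_read
    write_mbps = price / usd_per_write
    return (read_mbps + write_mbps) / 2 / capacity_gb


for name, low, high, cap, seq_r, seq_w, rnd_r, rnd_w in ROWS:
    price = avg_price(low, high)
    print(f"{name}: ${price / cap:.2f}/GB, "
          f"seq density {io_density(price, seq_r, seq_w, cap):.3f}, "
          f"random density {io_density(price, rnd_r, rnd_w, cap):.3f}")
```

Small differences in the last digit (for example 3.566 versus 3.563 for the Intel sequential density) come from rounding in the published $/MB/s figures.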

Looking at this data in detail, we see that the Intel SSD delivers extremely good random I/O rates, but raw performance is the wrong measure. We should be looking at dollars per unit of performance. By this more useful metric the Intel SSD still looks very good: $17.66 per MB/s on 8K random reads, versus $142.50 and $195.00 per MB/s for the two HDDs. For hot random workloads, SSDs are a clear win.

What do I mean by “hot random workloads”? By hot, I mean a high number of random IOPS per GB. But for a given storage technology, what constitutes hot? I like to look at I/O density, which marks the cutoff between a given disk with a given workload being capacity bound or I/O-rate bound. For example, the table above shows that the random I/O density of the 64GB Intel disk is 1.109 MB/s per GB. If your data needs more than 1.109 MB/s of 8K random I/O per GB stored, the Intel device will be I/O bound and you won’t be able to use all of its capacity. If the workload needs less than this, it is capacity bound and you won’t be able to use all of the IOPS the device can deliver. For data with a very low access rate, HDDs are a win; for data with a very high access rate, SSDs are the better price performer.
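The rule above amounts to comparing two densities. A minimal sketch, using the random I/O densities from the table and a hypothetical workload that needs 0.05 MB/s of 8K random I/O per GB stored (the workload figure is an illustrative assumption, not from the source):

```python
# Random 8K I/O density (MB/s per GB) from the table above.
RANDOM_DENSITY = {
    "Intel X25-E SLC": 1.109,
    "Cheetah 15k": 0.012,
    "WD 1000FYPS": 0.002,
}


def bound_type(workload_density: float, device_density: float) -> str:
    """Classify a workload on a device as I/O bound or capacity bound.

    workload_density: MB/s of 8K random I/O the workload needs per GB stored.
    device_density: the device's random I/O density (MB/s per GB).
    """
    if workload_density > device_density:
        return "I/O bound"       # I/O runs out before the capacity is filled
    if workload_density < device_density:
        return "capacity bound"  # capacity fills before the I/O is used up
    return "balanced"


# Hypothetical workload: 0.05 MB/s of 8K random I/O per GB stored.
for name, density in RANDOM_DENSITY.items():
    print(f"{name}: {bound_type(0.05, density)}")
```

For this workload the SSD is capacity bound while both HDDs are I/O bound.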

As it turns out, when looking at random I/O workloads, SSDs are almost always capacity bound and HDDs are almost always IOPS bound. Given that, we can use a simple computation to compare HDD cost with SSD cost for your workload. First compute the HDD farm cost: the number of disks needed to support the I/O rate multiplied by the price per disk. This is the storage budget needed to support your workload on HDDs. Next, divide the size of the database by the SSD capacity to get the number of SSDs required, and multiply that by the price of the SSD. This is the budget required to support your workload on SSDs. If the SSD budget is lower (and it will be for hot, random workloads), SSDs are the better choice. Otherwise, keep using HDDs for that workload.
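The budget comparison above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the author’s exact procedure: random-read throughput is backed out of the table (average price ÷ $/Rdm Read, giving roughly 48, 2 and 1 MB/s), the workload is a hypothetical 500 GB database needing 100 MB/s of 8K random reads, and throughput is assumed to scale linearly as data is spread across drives. Rather than assuming which limit applies, it takes the larger of the capacity-driven and rate-driven device counts, which reduces to the rule in the text whenever SSDs are capacity bound and HDDs are I/O bound.

```python
import math
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    price: float        # average of the listed price range, USD
    capacity_gb: float
    rate_mbps: float    # 8K random-read MB/s, backed out of the table


# Random-read MB/s derived as price / ($/Rdm Read): 847.5/17.66, 285/142.50, 195/195.00.
DEVICES = [
    Device("Intel X25-E SLC", 847.5, 64, 48.0),
    Device("Cheetah 15k", 285.0, 300, 2.0),
    Device("WD 1000FYPS", 195.0, 1000, 1.0),
]


def budget(dev: Device, db_size_gb: float, required_mbps: float) -> float:
    """Cost of enough devices to hold the data AND deliver the I/O rate.

    Assumes aggregate throughput scales linearly with device count.
    """
    by_capacity = db_size_gb / dev.capacity_gb
    by_rate = required_mbps / dev.rate_mbps
    return math.ceil(max(by_capacity, by_rate)) * dev.price


# Hypothetical workload: 500 GB database, 100 MB/s of 8K random reads.
for dev in DEVICES:
    print(f"{dev.name}: ${budget(dev, 500, 100):,.0f}")
```

With these inputs the SSD needs 8 drives for capacity ($6,780), while the Cheetah and WD farms need 50 and 100 drives for IOPS ($14,250 and $19,500).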

The same technique works for sequential I/O. Again, we look at the sequential I/O density to find the cutoff between bandwidth bound and capacity bound for a given workload. Very hot workloads over small data sizes will be a win on SSD, but as soon as the data sizes get interesting, HDDs are the more economical solution for sequential workloads. The detailed calculation is the same. Figure out how many HDDs are required to support your workload on the basis of capacity or sequential I/O rate (whichever is in shorter supply for your workload on that storage technology), and from that the HDD budget. Then do the same for SSDs and compare the numbers. What you’ll find is that, for sequential workloads, SSDs are the best value only for very high I/O rates over relatively small data sizes.
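A minimal sketch of the sequential calculation, assuming sequential-read throughput backed out of the table (average price ÷ $/Seq Read, rounded to 258, 125 and 72 MB/s), bandwidth that scales linearly as data is striped across drives, and a hypothetical workload that scans 2,000 GB at 500 MB/s. The helper `cheapest` is illustrative, not from the source.

```python
import math

# Sequential-read MB/s backed out of the table as average price / ($/Seq Read):
# 847.5/3.28, 285/2.28, 195/2.71, rounded.
SEQ_DEVICES = [
    # (name, average price USD, capacity GB, sequential read MB/s)
    ("Intel X25-E SLC", 847.5, 64, 258.0),
    ("Cheetah 15k", 285.0, 300, 125.0),
    ("WD 1000FYPS", 195.0, 1000, 72.0),
]


def budget(price: float, capacity_gb: float, rate_mbps: float,
           data_gb: float, required_mbps: float) -> float:
    """Devices needed for whichever of capacity or bandwidth is scarcer, times price.

    Assumes bandwidth scales linearly as the data is striped across drives.
    """
    count = math.ceil(max(data_gb / capacity_gb, required_mbps / rate_mbps))
    return count * price


def cheapest(devices, data_gb: float, required_mbps: float):
    """Return (name, budget) for the lowest-cost device type."""
    return min(((name, budget(price, cap, rate, data_gb, required_mbps))
                for name, price, cap, rate in devices),
               key=lambda pair: pair[1])


# Hypothetical workload: scan 2,000 GB at 500 MB/s.
for name, price, cap, rate in SEQ_DEVICES:
    print(f"{name}: ${budget(price, cap, rate, 2000, 500):,.0f}")
print("cheapest:", cheapest(SEQ_DEVICES, 2000, 500))
```

Here each HDD type needs 7 drives while the SSD needs 32 just to hold the data. With these particular parts, both HDDs are also cheaper than the Intel SSD per MB/s of sequential bandwidth, so the HDD budget comes out lower.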

Using these techniques and data, we can see when SSDs are a win for workloads with a given access pattern. I’ve tested this line of thinking against many workloads and find that hot, random workloads can make sense on SSDs. Pure sequential workloads almost never do, unless the access pattern is very hot or the capacity required is relatively small.

For workloads that are neither purely random nor purely sequential, we can compute the storage budget to support the workload on HDDs and on SSDs as described above and compare the two. Using these techniques, we can step beyond the hype and let economics drive the decision.
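One way to sketch a mixed workload, under an explicit assumption that is not from the source: each device’s time is shared linearly between random and sequential I/O, so the fractions of a device consumed by each component simply add up. Throughput figures are backed out of the table as in the earlier calculations, and the workload (500 GB, 50 MB/s of 8K random reads plus 200 MB/s of sequential scans) is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    price: float        # average of the listed price range, USD
    capacity_gb: float
    random_mbps: float  # 8K random-read MB/s: price / ($/Rdm Read)
    seq_mbps: float     # 64KB sequential-read MB/s: price / ($/Seq Read), rounded


DEVICES = [
    Device("Intel X25-E SLC", 847.5, 64, 48.0, 258.0),
    Device("Cheetah 15k", 285.0, 300, 2.0, 125.0),
    Device("WD 1000FYPS", 195.0, 1000, 1.0, 72.0),
]


def mixed_budget(dev: Device, data_gb: float,
                 random_mbps: float, seq_mbps: float) -> float:
    """Budget for a workload with both random and sequential components.

    Assumption: device time is shared linearly between the two patterns,
    so device-equivalents of work add up; interference between random
    and sequential I/O on the same drive is ignored.
    """
    by_capacity = data_gb / dev.capacity_gb
    by_work = random_mbps / dev.random_mbps + seq_mbps / dev.seq_mbps
    return math.ceil(max(by_capacity, by_work)) * dev.price


# Hypothetical: 500 GB, 50 MB/s of 8K random reads plus 200 MB/s of scans.
for dev in DEVICES:
    print(f"{dev.name}: ${mixed_budget(dev, 500, 50, 200):,.0f}")
```

Under these inputs the SSD stays capacity bound at 8 drives ($6,780), while the Cheetah needs 27 drives ($7,695), mostly to cover the random component. Because interleaving random I/O erodes an HDD’s sequential bandwidth, this linear model is, if anything, generous to the HDDs.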

James Hamilton, Amazon Web Services

1200, 12th Ave. S., Seattle, WA, 98144
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
james@amazon.com

H:mvdirona.com | W:mvdirona.com/jrh/work | blog:http://perspectives.mvdirona.com

5 comments on “SSD versus Enterprise SATA and SAS disks”
  1. Rob Bergin says:

    You should try to get a Fusion IO card; it’s a NAND-based PCI card rather than an SSD drive.

    I’d be interested to see how it fits in with the other products.

  2. James Hamilton says:

    Mehul, I agree that some SSDs have been showing serious degradation during use. I posted on the client-side version of this phenomenon: //perspectives.mvdirona.com/2008/04/25/LaptopSSDPerformanceDegradationProblems.aspx. The same issues can be seen on some server-side components.

    However, this phenomenon is component-specific, and the Intel units have actually been pretty well behaved in longer-term testing. I’ll ping Intel and see if I can get some solid data for you on this. I’ll post it if I do.

    –jrh

    James Hamilton, jrh@mvdirona.com

  3. Mehul Shah says:

    Please see my post on your April 12 blog writeup. I believe it’s relevant here; I just read this one. I generally agree, but please don’t rule out hybrid configurations.

    I’d like to point out that your numbers for the Intel SSD are a little optimistic because of the "wear-in" effect that happens for SSDs. Once enough data is written to fill the capacity of the drive, the random write performance deteriorates. I believe this is because the drive’s firmware is performing compaction online to create new free blocks. The relevant numbers for the Intel drive are "probably" about half what you quote: http://csl.cse.psu.edu/wish2009_papers/Polte.pdf.

    It doesn’t change the comparison qualitatively, however.

    – Mehul

  4. James Hamilton says:

    I 100% agree, Wes. I’ve seen experiments that did this at the device level, block level (logical volume manager), operating system level, and application level. All show good promise.

    James Hamilton
    jrh@mvdirona.com

  5. Wes Felter says:

    In the future we will be able to use caching to split a workload into a high-density part and a low-density part, mapping these onto the appropriate hardware.
