100,000 IOPS

FusionIO has released specs and pricing data on their new NAND flash SSD: http://www.fusionio.com/products.html (Lintao Zhang of Microsoft Research sent it my way). The headline numbers are 100,000 IOPS, 700 MB/s sustained read, and 600 MB/s sustained write. Impressive numbers, but let’s dig deeper. In what follows, I compare the specs of the FusionIO part with a “typical” SATA disk. For comparison with the 80GB FusionIO part (roughly $2,400, the price implied by the $30/GB figure below), I’m using the following SATA disk specs: $200, 750GB, 70 MB/s sustained transfer rate, and 70 random I/O operations per second (IOPS). Specs change daily, but these are a good approximation of what we get from commodity SATA disks these days.

Obviously, the sustained read and write rates that FusionIO advertises are substantial. But to be truly interesting, they have to produce higher sustained I/O rates per dollar than magnetic disks, the high-volume commodity competitor. A SATA disk runs around $200 and produces roughly 70 MB/s of sustained I/O. Looking at reads and normalizing for price by comparing MB/s per dollar, we see that the FusionIO part delivers roughly 0.29 MB/s/$ whereas the SATA disk delivers 0.35 MB/s/$ (70 MB/s ÷ $200). The disk produces slightly better sequential transfer rates per dollar. This isn’t surprising: disks are actually respectable at sequential access, and this workload pattern is not where flash really excels. For sequential workloads, or workloads that can be made sequential, I wouldn’t recommend flash in general or the FusionIO SSD in particular at the current price point. Where flash SSDs look really interesting is in workloads with highly random I/Os.
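A minimal sketch of the MB/s-per-dollar comparison above. The FusionIO price is not quoted directly in this post; the $2,400 used here is an assumption back-computed from the post’s own $30/GB figure for the 80GB part (80 × $30).

```python
# Sequential read throughput per dollar, using the figures quoted in the post.
# FUSIONIO_PRICE is an assumption: 80 GB x $30/GB, back-computed from the post.
FUSIONIO_PRICE = 80 * 30      # dollars (assumed)
FUSIONIO_READ_MBPS = 700      # sustained read, MB/s
SATA_PRICE = 200              # dollars
SATA_MBPS = 70                # sustained transfer rate, MB/s


def mbps_per_dollar(mbps: float, price: float) -> float:
    """Sustained sequential throughput normalized by device price."""
    return mbps / price


fio = mbps_per_dollar(FUSIONIO_READ_MBPS, FUSIONIO_PRICE)
sata = mbps_per_dollar(SATA_MBPS, SATA_PRICE)
print(f"FusionIO: {fio:.2f} MB/s/$  SATA: {sata:.2f} MB/s/$")
```

With these inputs the FusionIO part comes out at about 0.29 MB/s/$ and the SATA disk at 0.35 MB/s/$, so the disk wins on sequential throughput per dollar.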

Looking at capacity, there are no surprises. Normalizing to dollars per GB, we see the FusionIO part at $30/GB and the SATA disk at roughly $0.27/GB ($200/750GB), a difference of more than 100x. Where capacity is the deciding factor, magnetic media is considerably cheaper and will remain so for many years. Capacity per dollar is not where flash SSDs look best.
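The same normalization for capacity, as a small sketch. The FusionIO price is an assumption back-computed from the post’s $30/GB figure (80 GB × $30), so the FusionIO result only restates it; the interesting number is the ratio between the two devices.

```python
# Dollars per GB of capacity for each device.
# FUSIONIO_PRICE is an assumption: 80 GB x $30/GB, back-computed from the post.
FUSIONIO_PRICE, FUSIONIO_GB = 80 * 30, 80
SATA_PRICE, SATA_GB = 200, 750


def dollars_per_gb(price: float, gb: float) -> float:
    """Device price normalized by capacity."""
    return price / gb


fio = dollars_per_gb(FUSIONIO_PRICE, FUSIONIO_GB)   # 30.0
sata = dollars_per_gb(SATA_PRICE, SATA_GB)          # ~0.267
print(f"FusionIO: ${fio:.2f}/GB  SATA: ${sata:.2f}/GB  ratio: {fio / sata:.1f}x")
```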

Where flash SSDs really excel, and where the FusionIO part is particularly good, is in random I/Os per second. FusionIO advertises over 100,000 random 4k IOPS (87,500 8k IOPS), whereas our SATA disk can deliver about 70. Again normalizing for cost and looking at IOPS per dollar, we see the FusionIO SSD at 41 IOPS/$ whereas the SATA disk delivers only 0.35 IOPS/$ (70/$200). Flash SSDs win, and win big, on random I/O workloads like OLTP systems, which are usually bound by random I/O operations. These workloads typically run on the smallest and fastest disks they can buy, and even then can’t use each disk’s full capacity because the workload I/O rates are so high. To support these extremely hot workloads on magnetic disk, you must spread the data over a large number of disks, effectively diluting the workload I/O rate to a level the disks can support.
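A sketch of the IOPS-per-dollar comparison. The FusionIO price is an assumption back-computed from the post’s $30/GB figure (80 GB × $30); the IOPS figures are the ones quoted above.

```python
# Random IOPS per dollar for each device.
# FUSIONIO_PRICE is an assumption: 80 GB x $30/GB, back-computed from the post.
FUSIONIO_PRICE = 80 * 30
FUSIONIO_IOPS_4K = 100_000    # advertised random 4k IOPS
FUSIONIO_IOPS_8K = 87_500     # advertised random 8k IOPS
SATA_PRICE, SATA_IOPS = 200, 70


def iops_per_dollar(iops: float, price: float) -> float:
    """Sustained random IOPS normalized by device price."""
    return iops / price


fio_4k = iops_per_dollar(FUSIONIO_IOPS_4K, FUSIONIO_PRICE)  # ~41.7
fio_8k = iops_per_dollar(FUSIONIO_IOPS_8K, FUSIONIO_PRICE)  # ~36.5
sata = iops_per_dollar(SATA_IOPS, SATA_PRICE)               # 0.35
print(f"FusionIO 4k: {fio_4k:.1f}  8k: {fio_8k:.1f}  SATA: {sata:.2f} IOPS/$")
print(f"SSD advantage at 4k: {fio_4k / sata:.0f}x")
```

On random I/O the per-dollar advantage flips decisively: over 100x in favor of the SSD under these assumptions.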

For workloads where the random I/O rates are prodigious and the overall database sizes fairly small, flash SSDs are an excellent choice. How do we define “fairly small”? I look at it as a question of I/O density, which I define as random IOPS per GB. The SATA disk we are using as an example can support 0.09 IOPS/GB (70/750). If the workload requires less than 0.09 IOPS/GB, it will be disk-capacity bound; if it needs more than 0.09 IOPS/GB, it’s I/O bound.

Assuming the workload is I/O bound, how do you decide whether SSDs are the right choice? Start by figuring out how many disks would be required to support the workload and what they would cost: divide the sustained random IOPS the application requires by the number of IOPS each disk can sustain (70 for our example SATA drive, or 180 to 200 for enterprise disks). That gives the cost of supporting this application on magnetic disk. Now figure the same number for flash SSDs: divide the aggregate random I/O rate the workload requires by the sustained random IOPS the SSD under consideration can deliver. This determines how many SSDs are needed to support the I/O rate. Because flash SSDs deliver very high I/O densities, you also need to ensure you have enough SSDs to store the entire database. The number of SSDs needed is the maximum of the count required to store the database and the count required to support the I/O rate. Compare the cost of those SSDs with the cost of the disks required to support the same workload, and see which is cheaper. For VERY hot workloads, flash SSDs will be cheaper than hard disk drives.
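The I/O density test and sizing procedure described above can be sketched in a few lines. The device figures are the post’s, except the FusionIO price, which is an assumption back-computed as 80 GB × $30/GB. The two workloads are hypothetical, chosen only to show one I/O-bound and one capacity-bound case.

```python
import math

# (price in $, capacity in GB, sustained random IOPS) for each device.
# The FusionIO price is an assumption: 80 GB x $30/GB.
SATA = (200, 750, 70)
FUSIONIO = (80 * 30, 80, 100_000)


def io_density(iops: float, gb: float) -> float:
    """I/O density: random IOPS per GB."""
    return iops / gb


def devices_needed(req_iops: float, db_gb: float, device) -> int:
    """Enough devices to sustain the I/O rate AND to hold the whole database."""
    _, dev_gb, dev_iops = device
    for_iops = math.ceil(req_iops / dev_iops)
    for_capacity = math.ceil(db_gb / dev_gb)
    return max(for_iops, for_capacity)


def cost(req_iops: float, db_gb: float, device) -> int:
    """Total device cost in dollars for the workload."""
    return devices_needed(req_iops, db_gb, device) * device[0]


# Hypothetical workloads: (required random IOPS, database size in GB).
workloads = {"hot OLTP": (20_000, 200), "cold archive": (500, 10_000)}
sata_density = io_density(SATA[2], SATA[1])  # ~0.09 IOPS/GB

for name, (iops, gb) in workloads.items():
    bound = "I/O bound" if io_density(iops, gb) > sata_density else "capacity bound"
    print(f"{name}: {bound}; disk ${cost(iops, gb, SATA):,} "
          f"vs SSD ${cost(iops, gb, FUSIONIO):,}")
```

With these inputs the hot workload needs 286 disks ($57,200) but only 3 SSDs ($7,200, where the SSD count is set by capacity, not IOPS), while the cold workload needs 14 disks ($2,800) versus 125 SSDs ($300,000). Taking the maximum of the two counts is what keeps the SSD side honest for large, cool databases.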

I should point out there are many other factors potentially worth considering when deciding whether a flash SSD is the right choice for your workload, including the power consumption of the disk farm, the failure rate and cost of service, and the wear-out rate and exactly how it was computed for the SSDs. The random I/O rate is the biggest differentiator and the most important for many workloads, so I haven’t considered these other factors here.

Looking more closely at the FusionIO specs, we see they give random IOPS figures but don’t specify the read-to-write ratio behind them. We really need to see the number of random read and random write IOPS that can be sustained. This is particularly important for flash SSDs since, with these devices, getting extremely high random write I/O rates is VERY difficult, and this is typically where current-generation devices fall down. In the case of the FusionIO SSD, we don’t yet have the data we need and would need third-party benchmark data before making a buying decision.
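Why the missing read/write split matters can be illustrated with a deliberately simple model. This is an assumption, not FusionIO’s methodology or a measured property of any device: treat reads and writes as serially sharing device time, so a mix with write fraction f achieves 1 / ((1 − f)/R + f/W) IOPS, where R and W are the pure random read and write rates. Real SSDs overlap operations and suffer garbage-collection effects, so the output is illustrative only. The write rate of 1/100th the read rate is the divergent case mentioned at the end of this post, not a FusionIO figure.

```python
def blended_iops(read_iops: float, write_iops: float, write_fraction: float) -> float:
    """Effective IOPS for a read/write mix under a simple time-sharing model.

    Assumes each read costs 1/read_iops seconds and each write costs
    1/write_iops seconds of device time, with no overlap (an assumption).
    """
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    seconds_per_op = (1 - write_fraction) / read_iops + write_fraction / write_iops
    return 1 / seconds_per_op


READ_IOPS = 100_000            # hypothetical pure-read rate, matching the headline figure
WRITE_IOPS = READ_IOPS / 100   # hypothetical: writes at 1/100th of the read rate

for f in (0.0, 0.1, 0.3, 0.5):
    print(f"{f:.0%} writes -> {blended_iops(READ_IOPS, WRITE_IOPS, f):,.0f} IOPS")
```

Under these assumptions, even a 10% write mix drops the effective rate from 100,000 to roughly 9,200 IOPS, which is why a single headline IOPS number is not enough to size a write-heavy workload.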

Another option to consider for very hot workloads, where I/O densities are extremely high, is to move the workload into memory. Memory prices continue to fall, and several memory appliance start-ups have recently emerged. I suspect hybrid devices that combine very large DRAM caches with 10 to 100x larger flash stores will emerge as great choices over the next year or so. Of the memory appliance vendors I’ve looked at, I find Violin Memory the most interesting (http://www.violin-memory.com/).

I do love the early performance numbers that FusionIO is now reporting—these are exciting results. But remember, when looking at flash SSDs, including the FusionIO part, you need to get the random write IOPS rate before making a decision. It’s the hardest spec to get right in a flash SSD and I’ve seen results as divergent as random write IOPS at 1/100th the (typically very impressive) read rate. Ask for random write IOPS.

–jrh

James Hamilton, Windows Live Platform Services
Bldg RedW-D/2072, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
JamesRH@microsoft.com

H:mvdirona.com | W:research.microsoft.com/~jamesrh
