When SSDs Make Sense in Client Applications

In When SSDs Make Sense in Server Applications, we looked at where Solid State Drives (SSDs) were practical in servers and services. On the client side, there are even more reasons to use SSDs and I expect that within three years, more than half of enterprise laptops will have NAND Flash as at least part of their storage subsystems. This estimate has SSDs in 38% of all laptops by 2011: Flash SSD in 38% of Laptops by 2011.

What follows is a quick summary of SSD advantages on the client side, followed by the disadvantages, and then a closer look at the write endurance (wear-out) problem that has been the topic of much discussion recently.

Client SSD Advantages:

· Random IOPS: Laptop I/O patterns are dominated by random workloads and, as argued in When SSDs Make Sense in Server Applications, these workloads run cost effectively on SSDs

· Low Power: SSD power consumption is typically under 2W and often under 1W. Enterprise disks can run 15 to 18W and desktop parts are typically in the 10W range, but laptop drives usually run a more modest 2.5W when active. So, on one hand, this represents an exciting reduction in storage power of a factor of 2 but, on the other, it’s only about a 1W saving when the HDD is active and even less when idle: a saving, but a small one overall. If you are interested in more data on laptop power consumption, see Client-Side Power Consumption. Some very efficient HDDs actually have lower idle power consumption than some SSDs, so SSDs are not better under all conditions from a power consumption perspective.

· Quiet: HDDs can be noisy. They are mechanical parts with precision bearings spinning at high speeds and they make noise. Semiconductor-based SSDs avoid this.

· Small Form Factors: SSDs can be small and lightweight.

· Scale Down Floor: Disks have a price floor where further lowering the capacity of the device doesn’t save money. This price floor changes over time but, at this point, it’s hard to get much below $30 for a disk regardless of how small. The fixed costs of the mechanical parts dominate the media and the cost of the disk doesn’t scale down. SSD costs scale down well and for applications with modest storage requirements, they can be less expensive. This makes them interesting for very low-end laptops, netPCs, ultra-mobile PCs, and, of course, NAND Flash is the storage of choice in cell phones, music players, cameras, and other related applications.

· Shock and Vibration: HDDs usually spec max shock in the range of 50G to as high as 100G and vibration in the ¼G to ½G range. SSD specs run well over 1,000G shock and around 20G vibration. They are much more durable against this common threat in the laptop world.

· Latency: I/O latency is far lower on an SSD than on an HDD, and this is particularly noticeable when I/O queues get deep, as they often do on single-disk laptops.

· Reliability: HDDs are the number one failing component on clients (and servers). This is particularly a problem on laptops as they are (usually) single-drive devices and often not well backed up. HDD failures represent a substantial service cost in most enterprises, so eliminating them is appealing. Our operational history with SSDs is fairly short so far, but we expect they will exhibit less frequent failures than hard disks. However, like all new components, they bring additional failure modes as well as eliminating a few. The biggest concern around SSDs is write endurance, with SLC part lifetimes typically in the range of 10^5 write cycles and MLC parts down around 10^4 write cycles (some even lower). We’ll look at that in more depth below.

· Temperature: SSDs have a much wider temperature and humidity operating range than HDDs.

Client SSD Disadvantages:

· Capacity/$: Flash devices can deliver excellent random I/O performance, and laptops, with only a single disk, are frequently random I/O bound rather than capacity limited. In fact, many enterprise customers actually want LESS storage on their laptop fleet. For them, having less capacity is often either not a problem or even a potential advantage. For my uses and for many consumer usage patterns, capacity remains important, with pictures, audio, and other media files driving space requirements up to the point where SSDs can be tough to afford. As a direct consequence, I expect that we’ll see more enterprise than consumer use of SSDs in clients.

· Performance Degradation: There have been many reports of SSDs initially performing well and then degrading over time. See Laptop SSD Performance Degradation Problems for more detail.

· Endurance: This is the most common concern I’ve heard of late with MLC write endurance only around 10,000 writes.

Write Endurance

I keep hearing anecdotal reports that SSDs in laptops are going to fail in the first year due to the poor write endurance of MLC SSDs. The typical MLC write endurance is usually quoted at around 10,000 cycles which I agree does sound quite low.

Let’s do a quick back-of-the-envelope calculation on MLC SSD write endurance (SLC parts are typically more expensive but have longer write endurance specifications). Assume a client system is used four hours a day and that it spends ¼ of that time at the max I/O rate of 100 IOPS. My gut feel says this number very likely errs high. Let’s include write amplification. Write amplification is a side effect of Flash memory designs having larger blocks as the unit of erase and smaller pages as the unit of read and programming (write). This, combined with wear leveling, leads to the device having to do some overhead housekeeping writes when servicing writes from the host system. Assume an average write amplification of 3x over three years of life, which again seems high. To make it really aggressive, we’ll assume a write-to-read ratio of 1:1 (50% writes), which is very high. Finally, let’s assume it’s a 64GB MLC device, that my writes are all to 4k pages, and that the overheads are all accounted for by my 3x write amplification number.

4*60*60*365*3*.25*.5*3*100 => 591m

Reading left to right, that’s 4 hours a day * 60 to get minutes * 60 to get seconds * 365 to get seconds of use per year * 3 to get seconds of use in three years, * 25% of time at max I/O, * 50% of I/Os are writes, * 3 write amplification, * 100 I/Os per second.

In aggregate, each laptop needs a bit over ½ billion write I/Os across a three-year life. But a 64GB device has 16m 4k pages. If you spread those 591m writes over 16m pages with perfect wear leveling, you would have about 36 write I/Os per page. Very low. With terrible wear leveling, it could go up an order of magnitude, but it’s still a very low number. Move write amplification up to 5x and the wear per page still looks tiny. Move the usage pattern up from 4 hours a day to an aggressive 16 hours a day and it’s still only 147 writes per page. Perhaps we’ll use more lifetime I/Os with an SSD than my magnetic disk model above assumes, since we’ll spend less time waiting, but it’s still not looking like a big number of lifetime writes.
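The arithmetic above can be reproduced with a short Python sketch. The function and parameter names are hypothetical, chosen only for illustration, and the page count uses the rounded 16m figure from the text rather than an exact binary count. It is an estimate under the stated assumptions, not a model of any particular SSD.

```python
SECONDS_PER_HOUR = 60 * 60
PAGES_64GB = 16_000_000  # rounded count of 4k pages on a 64GB device, as in the text


def lifetime_writes(hours_per_day=4, years=3, busy_fraction=0.25,
                    write_fraction=0.5, write_amplification=3, max_iops=100):
    """Estimated page writes the device performs over its service life."""
    powered_on_seconds = hours_per_day * SECONDS_PER_HOUR * 365 * years
    return (powered_on_seconds * busy_fraction * write_fraction
            * write_amplification * max_iops)


def writes_per_page(total_writes, pages=PAGES_64GB):
    """Average writes per page, assuming perfect wear leveling."""
    return total_writes / pages


base = lifetime_writes()                   # 4 hours a day
heavy = lifetime_writes(hours_per_day=16)  # aggressive 16 hours a day

print(f"lifetime writes at 4 h/day:   {base:,.0f}")
print(f"writes per page at 4 h/day:   {writes_per_page(base):.1f}")
print(f"writes per page at 16 h/day:  {writes_per_page(heavy):.1f}")
```

The per-page figures quoted in the text (36 and 147) are these results truncated to whole numbers. Changing `write_amplification` to 5 shows how little the per-page wear moves relative to a 10,000-cycle specification.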

If we use very low endurance MLC, where write endurance is specified down around 1,000 cycles rather than the more common 10,000, it’s still not a problem. But it is within an order of magnitude, so it is arguably a concern over a three-year life. And it would definitely be a concern over a five-year life.

Because client systems spend such a small percentage of their working lifetimes at 100% I/O rates, it’s hard to see a credible usage model that has MLC write endurance as a serious problem if using parts specified at 10,000 write cycles.
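To put the margin in terms of time, here is a rough Python sketch that turns the same assumptions into years until the most-worn page reaches its rated cycle count. The names are hypothetical, and `wear_skew` is an assumed multiplier (not something from a device specification) expressing how much more wear the worst page sees than the average page; 1.0 means perfect wear leveling and 10 stands in for the "terrible wear leveling" case.

```python
def page_writes_per_year(hours_per_day, busy_fraction=0.25, write_fraction=0.5,
                         write_amplification=3, max_iops=100, pages=16_000_000):
    """Average writes per page per year under the assumptions in the text."""
    writes = (hours_per_day * 3600 * 365 * busy_fraction * write_fraction
              * write_amplification * max_iops)
    return writes / pages


def years_to_wear_out(endurance_cycles, hours_per_day=4, wear_skew=1.0):
    """Years until the most-worn page reaches its rated write cycles.

    wear_skew is an assumed ratio of worst-page wear to average wear.
    """
    return endurance_cycles / (page_writes_per_year(hours_per_day) * wear_skew)


scenarios = [
    ("10,000 cycles, 4 h/day, perfect leveling", 10_000, 4, 1.0),
    ("1,000 cycles, 4 h/day, 10x skew", 1_000, 4, 10.0),
    ("1,000 cycles, 16 h/day, 10x skew", 1_000, 16, 10.0),
]
for label, cycles, hours, skew in scenarios:
    print(f"{label}: {years_to_wear_out(cycles, hours, skew):.1f} years")
```

Under these assumptions, parts rated at 10,000 cycles have centuries of headroom, while 1,000-cycle parts combined with poor wear leveling and heavy daily use land within a few years, which is the case flagged above as a possible concern.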

In a subsequent post, we’ll look back at server applications and, in contrast with When SSDs Make Sense in Server Applications, we’ll look at where SSDs don’t make sense on the server side.

James Hamilton, Data Center Futures
Bldg 99/2428, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
JamesRH@microsoft.com

H:mvdirona.com | W:research.microsoft.com/~jamesrh | blog:http://perspectives.mvdirona.com

4 comments on “When SSDs Make Sense in Client Applications”
  1. Brad, you are right that any filesystem will have hot spots and all data has skew: some data is cold and some is red hot. SSDs do logical to physical mapping. The logical pages are a linear array of pages that the O/S thinks are the physical pages. Each of these pages is mapped to some location on the SSD. SSDs never overwrite the same location. If you update the page, it’ll get written somewhere else. The SSD does wear leveling (moving pages around to balance wear) and logical to physical page mapping. Often these two operations are collectively referred to as wear leveling.

    So yes, there are hot logical pages, but the SSD does wear leveling to avoid the problem you bring up.

    –jrh
    jrh@mvdirona.com

  2. Brad Dodson says:

    What about areas of some file systems that are very frequently updated? Couldn’t these become a single point of failure? Or would the disk/OS recognize the write failures and move the location to a different block?
    Basically, I agree with your analysis of the average number of writes to blocks, but you disregard the "birthday problem" where it’s pretty likely that at least one block will get a ton of writes.
    Just a thought
    -brad

  3. There may be parts that move entire blocks (erase units) as a result of a random write to a page (write unit) within that block as you describe. I agree, that behavior would lead to truly stratospheric write amplification numbers. Current generation SSDs (e.g. http://www.intel.com/design/flash/nand/index.htm) are fortunately not this bad.

    –jrh
    jrh@mvdirona.com

  4. Aviad says:

    In my humble opinion, I think you might be underestimating the write amplification factor. Most devices use FTLs rather than a flash-specific file system. And many (if not most) FTLs relocate (and actually re-write) an entire erase unit as a result of a random page write. This, of course, results in tremendous overhead, much more than the 3x or 5x write amplification factor you used.

    This, of course, assumes a predominantly random workload.
