Last week, Sudipta Sengupta of Microsoft Research dropped by the Amazon Lake Union campus to give a talk on the flash memory work that he and his team have been doing over the past year. It's super interesting work. You may recall Sudipta as one of the co-authors of the VL2 paper (VL2: A Scalable and Flexible Data Center Network) I mentioned last October.

Sudipta’s slides for the flash memory talk are posted at Speeding Up Cloud/Server Applications With Flash Memory and my rough notes follow:

· Flash technology has been used in client devices for more than a decade

· Server-side usage is more recent, and the differences between hard disk drive and flash characteristics bring some challenges that need to be managed in the on-device Flash Translation Layer (FTL) or in the operating system or application layers.

· Server requirements are more aggressive across several dimensions, including required random I/O rates, reliability, and durability (data life).

· Key flash characteristics:

· 10x more expensive than HDD

· 10x cheaper than RAM

· Multi Level Cell (MLC): ~$1/GB

· Single Level Cell (SLC): ~$3/GB

· Laid out as a linear array of flash blocks, where a block is often 128KB and a page is 2KB

· Unfortunately, the unit of erasure is a full block while the unit of read or write is a 2KB page, which makes the write-in-place technique used in disk drives unworkable.

· Block erase is a fairly slow operation at ~1,500 usec, whereas a read or write takes 10 to 100 usec.

· Wear is an issue with SLC supporting O(100k) erases and MLC O(10k)

· The FTL is responsible for managing the mapping between logical pages and physical pages such that logical pages can be overwritten and hot page wear is spread relatively evenly over the device.

· Roughly 1/3 the power consumption of a commodity disk and 1/6 the power of an enterprise disk

· 100x the ruggedness over disk drives when active
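
To make the FTL's job concrete, here is a minimal sketch of out-of-place page writes with a logical-to-physical mapping and per-block erase counts. All names, sizes, and the wear-leveling policy are illustrative assumptions, not details from the talk or any real device:

```python
# Toy Flash Translation Layer (FTL) sketch: writes never update a page in
# place; they go to the next free page and the mapping is redirected.
# PAGES_PER_BLOCK, ToyFTL, and the least-worn-block policy are all
# hypothetical simplifications for illustration.
PAGES_PER_BLOCK = 64   # e.g., a 128KB block of 2KB pages

class ToyFTL:
    def __init__(self, num_blocks):
        self.mapping = {}                      # logical page -> (block, page)
        self.erase_counts = [0] * num_blocks   # per-block wear tracking
        self.free_blocks = list(range(num_blocks))
        self.active = self.free_blocks.pop(0)  # block currently being filled
        self.next_page = 0

    def write(self, logical_page):
        # Log-style write: the old physical copy, if any, becomes stale
        # garbage to be reclaimed by a later block erase.
        if self.next_page == PAGES_PER_BLOCK:
            # Active block full: pick the least-worn free block next,
            # spreading hot-page wear across the device.
            self.free_blocks.sort(key=lambda b: self.erase_counts[b])
            self.active = self.free_blocks.pop(0)
            self.next_page = 0
        self.mapping[logical_page] = (self.active, self.next_page)
        self.next_page += 1

    def erase(self, block):
        # Erase is per whole block and is what wears flash out.
        self.erase_counts[block] += 1
        self.free_blocks.append(block)

ftl = ToyFTL(num_blocks=4)
ftl.write(7)
ftl.write(7)          # overwrite: mapping moves, no in-place update
print(ftl.mapping[7]) # -> (0, 1): second physical page of block 0
```

The key point the sketch shows is that an "overwrite" of logical page 7 lands on a fresh physical page, which is why erase granularity and wear tracking dominate FTL design.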

· Research Project: FlashStore

· Use flash memory as a cache between RAM and HDD

· Essentially a flash-aware store where they implement a log-structured block store (this is what the FTL does in the device, here implemented at the application layer)

· Changed pages are written through to flash sequentially and an in-memory index of pages is maintained so that pages can be found quickly on the flash device.

· On failure the index structure can be recovered by reading the flash device

· Recently unused pages are destaged asynchronously to disk

· A key contribution of this work is a very compact form for the index into the flash cache

· Performance results excellent and you can find them in the slides and the papers referenced below
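
The write-through-plus-index design above can be sketched in a few lines. This is my own toy rendering of the idea, not FlashStore's actual data structures (in particular, FlashStore's real contribution is a far more compact index than a plain dictionary):

```python
# Sketch of a FlashStore-style cache: changed pages are appended
# sequentially to a flash-resident log, and an in-memory index maps each
# key to the offset of its latest copy. FlashLogCache and its fields are
# illustrative names only.
class FlashLogCache:
    def __init__(self):
        self.log = []      # stands in for the append-only log on flash
        self.index = {}    # in-memory: key -> offset of newest version

    def put(self, key, value):
        self.index[key] = len(self.log)   # point at the newest copy
        self.log.append((key, value))     # sequential append, never in place

    def get(self, key):
        offset = self.index.get(key)
        return None if offset is None else self.log[offset][1]

    def rebuild_index(self):
        # Crash recovery: replay the log in order; later entries win, so
        # the index converges on the latest version of every key.
        self.index = {key: off for off, (key, _) in enumerate(self.log)}

cache = FlashLogCache()
cache.put("a", 1)
cache.put("a", 2)          # overwrite appends; the old copy is garbage
cache.index = {}           # simulate losing the RAM index in a crash
cache.rebuild_index()
print(cache.get("a"))      # -> 2
```

Destaging cold pages to disk would drop log entries and index slots once a page is safely on HDD; that and garbage collection of stale log entries are omitted here.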

· Research Project: ChunkStash

· A very high performance, high throughput key-value store

· Tested on two production workloads:

· Xbox Live Primetime online gaming

· Storage deduplication

· The storage deduplication test is a good one in that dedupe is most effective with a large universe of objects to run deduplication over. But a large universe requires a large index, so the most interesting challenge of deduplication is keeping the index size small through aggressive compaction

· The slides include a summary of how dedupe works and show the performance and compression ratios they have achieved with ChunkStash
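
For readers unfamiliar with dedupe mechanics, here is a minimal sketch of the core loop: chunk the data, hash each chunk, and store a chunk only if its hash is new. I use fixed-size chunking for brevity; real systems typically use content-defined chunk boundaries, and the index lookup on the hot path is exactly what ChunkStash moves to a compact flash-backed structure:

```python
# Illustrative deduplication sketch: identical chunks are stored once and
# later occurrences are resolved through a chunk hash index. CHUNK_SIZE
# and dedupe_store are hypothetical names for illustration.
import hashlib

CHUNK_SIZE = 8  # tiny for illustration; real chunks are typically ~4-8KB

def dedupe_store(data, chunk_store, index):
    """Split data into chunks, storing each unique chunk only once.
    Returns the recipe (list of chunk hashes) needed to rebuild data."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha1(chunk).hexdigest()
        if digest not in index:          # the index lookup is the hot path;
            index[digest] = True         # keeping the index small is the
            chunk_store[digest] = chunk  # challenge discussed in the talk
        recipe.append(digest)
    return recipe

store, index = {}, {}
recipe = dedupe_store(b"ABCDEFGHABCDEFGH", store, index)
print(len(recipe), len(store))  # -> 2 1: two chunk references, one stored copy
```

The win grows with the universe size: every additional stream checked against the same index can deduplicate against everything seen so far, which is why a compact, fast index matters so much.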

For those interested in digging deeper, the VLDB and USENIX papers are the right next stops:

· http://research.microsoft.com/apps/pubs/default.aspx?id=141508 (FlashStore paper, VLDB 2010)

· http://research.microsoft.com/apps/pubs/default.aspx?id=131571 (ChunkStash paper, USENIX ATC 2010)

· Slides: http://mvdirona.com/jrh/talksandpapers/flash-amazon-sudipta-sengupta.pdf

James Hamilton

e: jrh@mvdirona.com

w: http://www.mvdirona.com

b: http://blog.mvdirona.com / http://perspectives.mvdirona.com