Long tailed workloads and the return of hierarchical storage management

Hierarchical storage management (HSM), also called tiered storage management, is back but in a different form. HSM exploits the skew in access patterns across data sets by placing cold, seldom-accessed data on slow, cheap media and frequently accessed data on fast, near media. In the old days, HSM typically referred to systems mixing robotically managed tape libraries with hard disk drive staging areas. HSM actually never went away; it's just a very old technique that exploits data access pattern skew to reduce storage costs. Here's an old unit from FermiLab.

Hot data, or data currently being accessed, is stored on disk, and old data that has not been recently accessed is stored on tape. It's a great way to drive costs down below disk-only storage while avoiding the people costs of manual tape management and (slightly) reducing the latency of accessing data on tape.

The basic concept shows up anywhere there is data access locality or skew in the access patterns, where some data is rarely accessed and some data is frequently accessed. Since evenly distributed, non-skewed access patterns only show up in benchmarks, this concept works on a wide variety of workloads. Processors have cache hierarchies where the top of the hierarchy is a very expensive register file, with multiple layers of increasingly large caches between the register file and memory. Database management systems have large in-memory buffer pools insulating access to slow disk. Many very high-scale services like Facebook have mammoth in-memory caches insulating access to slow database systems. In the Facebook example, they have 2TB of Memcached in front of their vast MySQL fleet: Velocity 2010.
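
The pattern is the same at every level: keep a small, fast tier in front of a large, slow one and serve hot items from the fast tier. Below is a minimal cache-aside sketch of that idea; the class name, capacity, and eviction policy are illustrative assumptions, not how Memcached or any particular product actually behaves.

    # Minimal cache-aside sketch: a small fast tier in front of a large slow tier.
    # Names, capacity, and the eviction policy are illustrative assumptions only.

    class CacheAsideStore:
        def __init__(self, backing_store, capacity=1024):
            self.backing_store = backing_store  # slow tier (e.g., a database)
            self.capacity = capacity            # entries the fast tier can hold
            self.cache = {}                     # fast tier (in memory)

        def get(self, key):
            # Hot keys are served from the fast tier.
            if key in self.cache:
                return self.cache[key]
            # Miss: fall through to the slow tier, then populate the fast tier.
            value = self.backing_store[key]
            if len(self.cache) >= self.capacity:
                self.cache.pop(next(iter(self.cache)))  # crude FIFO-ish eviction
            self.cache[key] = value
            return value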

Flash memory again opens up the opportunity to apply HSM concepts to storage. Rather than using slow tape and fast disk, we use (relatively) slow disk and fast NAND flash. There are many approaches to implementing HSM over flash memory and hard disk drives. Facebook implemented Flashcache, which tracks access patterns at the logical volume layer (below the filesystem) in Linux, with hot pages written to flash and cold pages to disk. LSI's CacheCade product is a good example of an implementation done at the disk controller level. Others have done it in application-specific logic where hot indexing structures are put on flash and cold data pages are written to disk. Yet another approach, which showed up around three years ago, is the hybrid disk drive.
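
To make the idea concrete, here is a toy sketch of frequency-based tiering between a small fast tier (standing in for flash) and a large slow tier (standing in for disk). The class name, threshold, and promotion policy are assumptions made for illustration; real implementations like Flashcache or CacheCade work at the block or controller level with far more sophisticated policies.

    # Toy sketch of access-frequency-based tiering: hot blocks are promoted to a
    # small fast tier ("flash"), cold blocks stay on the large slow tier ("disk").
    # Thresholds and names are assumptions for illustration only.

    from collections import defaultdict

    class TieredBlockStore:
        def __init__(self, flash_capacity=4, promote_threshold=3):
            self.flash = {}                      # small, fast tier
            self.disk = {}                       # large, slow tier
            self.flash_capacity = flash_capacity
            self.promote_threshold = promote_threshold
            self.access_counts = defaultdict(int)

        def write(self, block_id, data):
            # Update in place if the block is already hot; new/cold data lands on disk.
            if block_id in self.flash:
                self.flash[block_id] = data
            else:
                self.disk[block_id] = data

        def read(self, block_id):
            self.access_counts[block_id] += 1
            if block_id in self.flash:
                return self.flash[block_id]
            data = self.disk[block_id]
            # Promote blocks that have become hot, demoting the coldest flash block.
            if self.access_counts[block_id] >= self.promote_threshold:
                if len(self.flash) >= self.flash_capacity:
                    coldest = min(self.flash, key=lambda b: self.access_counts[b])
                    self.disk[coldest] = self.flash.pop(coldest)
                self.flash[block_id] = data
                del self.disk[block_id]
            return data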

Hybrid drives combine large, persistent flash memory caches with disk drives in a single package. When they were first announced, I was excited by them, but I got over the excitement once I started benchmarking. It looked like a good idea, but the performance really was unimpressive.

Hybrid drives still look like a good idea, and now we actually have respectable performance with the Seagate Momentus XT. This part is actually aimed at the laptop market, but I'm always watching client progress to understand what can be applied to servers. This finally looks like it's heading in the right direction. See the AnandTech article on this drive for more performance data: Seagate's Momentus XT Reviewed, Finally a Good Hybrid HDD. I still slightly prefer the Facebook Flashcache approach, but these hybrid drives are worth watching.

Thanks to Greg Linden for sending this one my way.

–jrh

James Hamilton

e: jrh@mvdirona.com

w: http://www.mvdirona.com

b: http://blog.mvdirona.com / http://perspectives.mvdirona.com

One comment on “Long tailed workloads and the return of hierarchical storage management”
  1. Chris says:

    Hi,

    I saw this disk as well. It seems it still lacks TLER (Time-Limited Error Recovery), which it needs to be usable in RAID arrays… Seagate, please, hear me!

    Chris
