Optical Archival Storage Technology

It’s an unusual time in our industry: many of the most interesting server, storage, and networking advancements aren’t advertised, don’t have a sales team, don’t have price lists, and often are never even mentioned in public. The largest cloud providers build their own hardware designs and, since the equipment is not for sale, it’s typically not discussed publicly.

A notable exception is Facebook. They are big enough that they do some custom gear but they don’t view their hardware investments as differentiating. That may sound a bit strange: why spend on something if it is not differentiating? Their argument is that if some other social networking site adopts the same infrastructure that Facebook uses, it’s very unlikely to change social networking market share in any measurable way. Customers simply don’t care about infrastructure. I generally agree with that point. It’s pretty clear that if MySpace had adopted the same hardware platform as Facebook years ago, it really wouldn’t have changed the outcome. Facebook also correctly argues that OEM hardware just isn’t cost effective at the scale they operate. The core of this argument is that custom hardware is needed, and that social networking customers go where their friends go whether or not the provider does a nice job on the hardware.

I love the fact that part of our industry is able to be open about the hardware they are building, but I don’t fully agree that hardware is not differentiating in the social networking world. For example, maintaining a deep social graph is actually a fairly complex problem. In fact, I remember when tracking friend-of-a-friend over tens of millions of users, a required capability of any social networking site today, was still just a dream. Nobody had found the means to do it without massive costs and/or long latencies. Lower cost hardware and software innovation made it possible, and the social network user experience and engagement have improved as a consequence.

Looking at a more modern version of the same argument, it has not been cost effective to store full resolution photos and videos using today’s OEM storage systems. Consequently, most social networks haven’t done this at scale. It’s clear that storing full resolution images would be a better user experience, and it’s another example where hardware innovation could be differentiating.

Of course the data storage problem is not restricted to social networks, nor just to photos and video. The world is realizing the incredible value of data at the same time the costs of storage are plummeting. Most companies’ storage assets are growing quickly. Companies are hiring data scientists because even the most mundane bits of operational data can end up being hugely valuable. I’ve always believed in the value of data, but more and more companies are starting to realize that data can be game changing to their businesses. The perceived value is going up fast while, at the same time, the industry is realizing that if you have weekly data, it is good, but daily is better, hourly is a lot better, 5 min is awesome, and you really prefer 1 min granularity. This interval keeps falling. The perceived value of data is climbing, the resolution of measures is becoming finer and, as a consequence, the amount of data being stored is skyrocketing. Most estimates have data volumes doubling every 12 to 18 months, somewhat faster than Moore’s law. Since all operational data backs up to cold storage, cold storage is always going to be larger than any other data storage category.
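
To put rough numbers on the growth described above, here is a small Python sketch. It only restates the arithmetic: a doubling period of 12 to 18 months, and the sample counts implied by moving from weekly to 1 min granularity. The function and variable names are mine, chosen for illustration, and the three-year horizon is an arbitrary assumption.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Multiplier on stored data after `years`, given a doubling period in months."""
    return 2 ** (years * 12 / doubling_months)


# Samples per week for one measured value at each granularity mentioned above.
SAMPLES_PER_WEEK = {
    "weekly": 1,
    "daily": 7,
    "hourly": 7 * 24,
    "5 min": 7 * 24 * 12,
    "1 min": 7 * 24 * 60,
}

if __name__ == "__main__":
    # Three years is an arbitrary horizon picked for illustration.
    for months in (12, 18):
        factor = growth_factor(3, months)
        print(f"doubling every {months} months: {factor:.0f}x after 3 years")
    for name, samples in SAMPLES_PER_WEEK.items():
        print(f"{name:>7}: {samples:>6} samples per week")
```

Moving a single measure from weekly to 1 min granularity multiplies the number of stored samples by 10,080, before any growth in the number of things being measured.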

Next week, Facebook will show work they have been doing in cold storage mostly driven by their massive image storage problem. At OCP Summit V an innovative low-cost archival storage hardware platform will be shown. Archival projects always catch my interest because the vast majority of the world’s data is cold, the percentage that is cold is growing quickly, and I find the purity of a nearly single dimensional engineering problem to be super interesting. Almost the only dimension of relevance in cold storage is cost. See Glacier: Engineering for Cold Data Storage in the Cloud for more on this market segment and how Amazon Glacier is addressing it in the cloud.

This Facebook hardware project is particularly interesting in that it’s based upon optical media rather than tape. Tape economics come from a combination of very low cost media and only a small number of fairly expensive drives. Robots move the tape back and forth between storage slots and the drives when needed. Facebook is taking the same basic approach of using robotic systems to allow a small number of drives to support a large media pool. But, rather than using tape, they are leveraging the high volume Blu-ray disc market, with the volume economics driven by consumer media applications. Expect to see over a petabyte of Blu-ray discs supplied by a Japanese media manufacturer housed in a rack built by a robotic systems supplier.
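
The economics above can be sketched with a little arithmetic. The Python below is illustrative only: the per-disc capacity of the Facebook system is not stated here, so the capacities are the standard Blu-ray sizes (25, 50, and 100 GB), and every price in the example is a made-up placeholder rather than a real quote. The function names are hypothetical.

```python
# Decimal storage units, as media vendors quote them.
GB_PER_PB = 1_000_000


def discs_needed(disc_capacity_gb: int, pool_pb: int = 1) -> int:
    """Number of discs needed to hold `pool_pb` petabytes, rounded up."""
    total_gb = pool_pb * GB_PER_PB
    return -(-total_gb // disc_capacity_gb)  # ceiling division on integers


def hardware_cost_per_tb(n_media: int, media_capacity_gb: int, media_cost: float,
                         n_drives: int, drive_cost: float, robot_cost: float) -> float:
    """Hardware cost per TB when a few shared drives and one robot serve many media."""
    total_tb = n_media * media_capacity_gb / 1000
    total_cost = n_media * media_cost + n_drives * drive_cost + robot_cost
    return total_cost / total_tb


if __name__ == "__main__":
    for capacity_gb in (25, 50, 100):  # assumed capacities, not from the post
        print(f"{capacity_gb} GB discs: {discs_needed(capacity_gb)} per petabyte")

    # Placeholder prices: the point is the trend, not the absolute values.
    for n_media in (1_000, 10_000):
        cost = hardware_cost_per_tb(n_media, 100, media_cost=5.0,
                                    n_drives=10, drive_cost=100.0, robot_cost=4000.0)
        print(f"{n_media} discs: {cost:.2f} per TB")
```

As the media pool grows behind a fixed set of drives and robotics, the shared hardware is spread over more capacity and the cost per TB approaches the cost of the media alone. This is the same property that makes tape libraries economical.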

I’m a huge believer in leveraging consumer component volumes to produce innovative, low-cost server-side solutions. Optical is particularly interesting in this application and I’m looking forward to seeing more of the details behind the new storage platform. It looks like very interesting work.

James Hamilton
e: jrh@mvdirona.com
http://blog.mvdirona.com / http://perspectives.mvdirona.com

4 comments on “Optical Archival Storage Technology”
  1. Nancylin06 says:

    Tape seems to require more electricity to maintain and operate than optical storage. As data centers’ energy consumption is mounting tremendously, which one source says is 10% of total electricity production globally, and global warming is getting more serious this year, I wonder why decision makers are not considering energy efficient solutions such as optics to reduce energy cost.

    • You will find there are some absolutely silly data points out there on data center power consumption. 10% of all energy used in the world is a couple of orders of magnitude wrong. But I still agree with your core point that power really does matter. Most large scale operators feel the same way and are very focused on reducing power consumption because it has a material impact on their overall costs, and reducing power consumption is also good for the environment.

      Looking at archival storage media, tape is actually pretty similar to optics in power consumption. Both optical drives and tape drives consume power. But most tape and most optical media are not in a drive, so they are not consuming any power. Even some disk based archival systems depower the disks when they are not being searched, to get disk archival storage close to tape and optical in power consumption. All three media approaches to archival consume no power when not being written or read.

      Focusing on the tape to optical power differences, they are very small, but tape has slightly tighter environmental constraints, so tape not currently in a drive might consume fractionally more power than optical, which has essentially no environmental constraints. And tape drives are slightly more power intensive than optical media drives so again, you are right, there are slight advantages to optical media, but the differences are very small. Generally, in most high-scale general purpose data centers, the percentage of power consumed by archival storage is very small, so the media choice will be more strongly influenced by other factors.

  2. Jim said "Blu-Ray jukeboxes with front-end disk cache were a good fit for our application"

    It looks like there are now more interesting solutions coming.

  3. Jim Browne says:

    This is interesting news given that PowerFile went under; though I don’t know if that was due to market forces or mismanagement. We found that their Blu-Ray jukeboxes with front-end disk cache were a good fit for our application (10+ years of video archive) and footprint (a few older data centers with low W/sqft ratios.)
