Glacier: Engineering for Cold Data Storage in the Cloud

Earlier today Amazon Web Services announced Glacier, a low-cost, cloud-hosted, cold storage solution. Cold storage is a class of storage that is discussed infrequently and yet it is by far the largest storage class of them all. Ironically, the storage we usually talk about, and the storage I’ve worked on for most of my life, is the high-IOPS storage supporting mission-critical databases. These systems today are best hosted on NAND flash and I’ve been talking recently about two AWS solutions to address this storage class:

Cold storage is different. It’s the only product I’ve ever worked on where the customer requirements are single-dimensional. With most products, the solution space is complex and, even when some customers like a competitive product better for some applications, your product may still win for others. Cold storage is pure and unidimensional. There is really only one metric of interest: cost per capacity. It’s an undifferentiated requirement that the data be secure and very highly durable. These are essentially table stakes in that no solution is worth considering if it’s not rock solid on durability and security. But the only dimension of differentiation is price/GB.

Cold storage is unusual because the focus needs to be singular. How can we deliver the best price per capacity now and continue to reduce it over time? The focus on price over performance, price over latency, and price over bandwidth actually made the problem more interesting. With most products and services, it’s usually possible to be the best on at least some dimensions even if not on all. On cold storage, to be successful, the price-per-capacity target needs to be hit. On Glacier, the entire project was focused on delivering $0.01/GB/Month with high redundancy and security, on a technology base where the price can keep coming down over time. Cold storage is elegant in its simplicity and, although the margins will be slim, the volume of cold storage data in the world is stupendous. It’s a very large market segment. All storage in all tiers backs up to the cold storage tier, so it’s provably bigger than all the rest. Audit logs end up in cold storage, as do web logs, security logs, seldom-accessed compliance data, and all the other data I jokingly refer to as Write-Only Storage. It turns out that most files in active storage tiers are actually never accessed (Measurement and Analysis of Large Scale Network File System Workloads). In cold storage, this trend is even more extreme: reading a storage object is the exception. But the objects absolutely have to be there when needed. Backups aren’t needed often and compliance logs are infrequently accessed but, when they are needed, they need to be there, they absolutely have to be readable, and they must have been stored securely.

But when cold objects are called for, they don’t need to be there instantly. The cold storage tier customer requirement for latency ranges from minutes to hours and, in some cases, even days. Customers are willing to give up access speed to get very low cost. Database backups that might be needed quickly don’t get pushed down to cold storage until they are unlikely to be accessed. But, once pushed, it’s very inexpensive to store them indefinitely. Tape has long been the media of choice for very cold workloads and tape remains an excellent choice at scale. What’s unfortunate is that the scale point where tape starts to win has been going up over the years. High-scale tape robots are incredibly large and expensive. The good news is that very high-scale storage customers like the Large Hadron Collider (LHC) are very well served by tape. But, over the years, the volume economics of tape have been moving up the scale curve and fewer and fewer customers are cost-effectively served by tape.

In the 80s, I had a tape storage backup system for my Usenet server and other home computers. At the time, I used tape personally and any small company could afford tape. But this scale point where tape makes economic sense has been moving up. Small companies are really better off using disk since they don’t have the scale to hit the volume economics of tape. The same has happened at mid-sized companies. Tape usage continues to grow but more and more of the market ends up on disk.

What’s wrong with the bulk of the market using disk for cold storage? The problem with disk storage systems is that they are optimized for performance and they are expensive to purchase, to administer, and even to power. Disk storage systems don’t currently target the cold storage workload with the necessary fanatical focus on cost per capacity. What’s broken is that customers end up not keeping data they need to keep, or paying too much to keep it, because the conventional solution to cold storage isn’t available at small and even medium scales.

Cold storage is a natural cloud solution in that the cloud can provide the volume economics and allow even small-scale users access to low-cost, off-site, multi-datacenter cold storage at a cost previously only possible at very high scale. Implementing cold storage centrally in the cloud makes excellent economic sense in that all customers gain from the volume economics of the aggregate usage. Amazon Glacier now offers cloud storage where each object is stored redundantly in multiple, independent data centers at $0.01/GB/Month. I love the direction and velocity at which our industry continues to move.

More on Glacier:

· Detail Page: http://aws.amazon.com/glacier

· Frequently Asked Questions: http://aws.amazon.com/glacier/faqs

· Console access: https://console.aws.amazon.com/glacier

· Developers Guide: http://docs.amazonwebservices.com/amazonglacier/latest/dev/introduction.html

· Getting Started Video: http://www.youtube.com/watch?v=TKz3-PoSL2U&feature=youtu.be

By the way, if Glacier has caught your interest and you are an engineer or engineering leader with an interest in massive scale distributed storage systems, we have big plans for Glacier and are hiring. Send your resume to glacier-jobs@amazon.com.

–jrh

James Hamilton
e: jrh@mvdirona.com
w: http://www.mvdirona.com
b: http://blog.mvdirona.com / http://perspectives.mvdirona.com

12 comments on “Glacier: Engineering for Cold Data Storage in the Cloud”
  1. Larry liang says:

    Hi James,
    It’s almost the end of 2019 now. It would be really good if you could write about the Glacier storage technology.

    I’d also like to understand how durability is calculated.

    Many thanks,
    Larry

    • Glacier is an interesting example in that it’s a solution where different competitors have chosen to compete with quite different technologies: some use off-the-shelf tape libraries, some use the largest and slowest disks available, potentially with denser formats that come with update restrictions (e.g. SMR), and some have used optical drives. If all three had the same volume economics and the same investment levels, I suspect any of the three could be made quite cost effective. But that’s never the case, and choosing an economic solution that has the right cost trajectory over time is fairly complex. This is an unusual case where we don’t publish the details behind the technology being used for Glacier, nor do we commit to it being a single technology for all data access patterns.

      For durability calculations, these are complex spreadsheet models that get updated frequently as software systems are enhanced and hardware components are changed. Mercifully, maintaining these is not one of my core responsibilities, but I’m glad the work gets the focus it does.
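
      To make the shape of those models concrete, here is a toy durability sketch. It is illustrative only: it assumes independent device failures, simple N-way replication, and a fixed repair window, and it is not a description of how Glacier is built or modeled internally.

      # Toy annual-durability estimate (illustrative assumptions only, not Glacier's model).
      def annual_durability(afr: float, replicas: int, repair_hours: float) -> float:
          """afr: annual failure rate of one device (0.02 = 2%);
          replicas: independent copies of each object;
          repair_hours: time to re-replicate after losing a copy."""
          hourly_failure = afr / (365 * 24)
          # Probability the remaining copies all fail before the repair completes.
          p_lose_rest = (hourly_failure * repair_hours) ** (replicas - 1)
          # Any of the copies can be the initiating failure (~afr per copy per year).
          annual_loss = replicas * afr * p_lose_rest
          return 1.0 - annual_loss

      # Example: 2% AFR devices, 3 copies, 24-hour repair window -> roughly nine 9s.
      print(annual_durability(0.02, 3, 24))

      Real models also have to account for correlated failures (racks, data centers, software bugs), erasure coding rather than plain replication, and repair-time distributions, which is part of why the spreadsheets get complex.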

  2. Shubham Mishra says:

    What is the storage technology behind the low cost and high durability of Glacier? Is it deduplication, compression, and then erasure coding to store the data in the form of objects on disk storage?

    • Shubham, it’s a great question and I’ve read a lot of super interesting speculation about what the Glacier team is doing. Most of the speculation is just that, speculation, and much of it is in conflict. Some guess disk, some conclude tape, and we’ve not published what it is. Your thinking is far clearer than most I’ve seen in that it focuses on known mechanisms that reduce cost at a negative impact in some other dimension. For example, dedupe takes on the complexity of managing a very large index of blocks and looking up existing block signatures before storing a block a second time. It gives up sequential access, forces an index to be maintained, and requires the computation of a cryptographic hash, but we know it’s a good trade-off for less-than-red-hot workloads (a sketch of that trade-off follows at the end of this reply).

      In many ways, you are on the right track in the way you are looking at the problem. But, I can’t confirm the set of techniques used by Glacier nor the storage media used. It’s just the reality of offering great services that competitors would like to also offer.

      A few other factors that are important and part of all AWS solutions: 1) great scale: all we do is at great scale and we are good at extracting economies of scale; 2) custom hardware: we are happy to do our own hardware designs optimized for the workload and acquired directly from manufacturers with very low margin requirements; 3) custom software: we are happy to invest deeply in custom software if it produces a better, more reliable, or lower-cost solution; and 4) operational excellence: we really work hard to make our systems able to be operated at low cost at great scale.

      Much of what makes Glacier super interesting is the cost, and the team has an excellent roadmap that will allow them to continue to improve the cost. What I find even more interesting is they also have plans to add really interesting features to the service (without impacting cost). I find this interesting because I would naively have described cold storage as a near uni-dimensional problem where cost is the primary and almost only feature. It turns out there is considerable headroom left in cold storage to improve the cost while at the same time providing a richer set of features and functions on the service.
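
      Coming back to the dedupe trade-off above, here is a minimal sketch of content-addressed block dedupe: hash each block, probe an index, and store the block only if it hasn’t been seen before. It is purely an illustration of the mechanism Shubham describes, not a statement about what Glacier actually does.

      # Illustrative block-level dedupe sketch (not a description of Glacier).
      import hashlib

      BLOCK_SIZE = 4 * 1024 * 1024      # 4 MiB blocks
      block_store = {}                  # block hash -> block bytes
      object_index = {}                 # object name -> ordered list of block hashes

      def put_object(name: str, data: bytes) -> None:
          hashes = []
          for i in range(0, len(data), BLOCK_SIZE):
              block = data[i:i + BLOCK_SIZE]
              digest = hashlib.sha256(block).hexdigest()
              # The costs of dedupe: a cryptographic hash per block plus an index probe.
              if digest not in block_store:
                  block_store[digest] = block
              hashes.append(digest)
          object_index[name] = hashes

      def get_object(name: str) -> bytes:
          # Reads are no longer sequential; each block may live anywhere in the store.
          return b"".join(block_store[h] for h in object_index[name])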

      • Shubham Mishra says:

        James, thanks for responding… It’s great to hear that AWS Glacier has a strong roadmap, and that a lot of the hardware and software in it is custom built from commodity resources, which explains its scale, durability, and cost effectiveness. Glacier and S3 are among the best services AWS offers. I was going through some optical disk design solutions, and it seems data written in parallel across many optical disks (for faster writes and reads) could be a good fit for AWS Glacier: they are cheap, easily available, scalable, and have a small footprint. But as you said, this again is speculation :)

        • Yes, I’ve looked closely at optical and like the work underway. Disk has the advantage of vast volume driving massive R&D investments. Optical and tape are both smaller markets so less is invested. Optical is a good example of a technology with good potential where further investment could move quickly to much denser media. But the R&D budget isn’t there, so the pace is slower than ideal. Tape is a bigger market but it isn’t growing fast, and in some recent years it has actually contracted, so it too doesn’t get disk-like R&D spending. Disk markets have challenges where semiconductor solutions replace them in some markets, but the HDD market remains vast. The HDD market is competitive, though, and the industry has focused very hard for many years on leveraging a common platform across the entire product line to reduce costs and, consequently, it’s hard for the disk storage manufacturers to be comfortable doing a platform focused on cold storage. Consequently, the HDD cold storage solution isn’t ideal and isn’t evolving quickly either.

          The short answer is all three storage technologies continue to have great promise for cold storage, but either market size issues or, in other cases, leadership conservatism leads to less than ideal cold storage investment levels. The HDD market has the largest revenue streams but its leaders are also the most conservative. The optical technologies have the greatest advances just waiting for a bit more R&D to pick up the pace, but optical is by far the smallest market segment today so it’s hard for them to support the needed long-term R&D streams.

  3. Jan, that’s an awesome technique to fully utilize large disks with mixed workloads. In fact, it’s one of the best tricks I know. But it’s a challenge to execute on in that neither service owns the hardware, neither tracks the disk failures, and when one has an issue and crashes, they can’t reboot the server (or it’s better if they don’t). And either service can consume resources that will negatively impact the other without carefully placed limits enforced by all cooperating services.

    The best implementation of what you are describing is a low-level block storage service that allocates resources to higher-level services according to request type and resource requirements. It’s a very nice technique and very effective, but it requires multiple services to cooperate or to depend upon a low-level resource allocation service. So, more work to implement but very high upside if you do implement it. Good point and well done, Jan.
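
    As a rough sketch of the shape such a resource allocation service could take, consider per-tenant token buckets that admit disk I/Os only when a tenant is within its allocation. The tenant names and rates below are hypothetical; this shows the general technique, not any AWS implementation.

    # Hypothetical per-tenant I/O admission control for a shared disk.
    import time

    class TokenBucket:
        def __init__(self, rate_per_sec: float, burst: float):
            self.rate = rate_per_sec
            self.capacity = burst
            self.tokens = burst
            self.last = time.monotonic()

        def try_consume(self, n: float = 1.0) -> bool:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= n:
                self.tokens -= n
                return True
            return False

    # The hot, latency-sensitive tenant gets most of the IOPS budget;
    # the cold tenant gets the leftover IOPS (and most of the capacity in bytes).
    limits = {
        "hot-block-tenant": TokenBucket(rate_per_sec=900, burst=100),
        "cold-archive-tenant": TokenBucket(rate_per_sec=100, burst=20),
    }

    def admit_io(tenant: str) -> bool:
        # Admit a disk I/O only if the tenant is within its allocation.
        return limits[tenant].try_consume()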

    –jrh

  4. Jan Olbrecht says:

    I’d guess the following: you’re short-stroking your disks. The small, high-IOPS partition is used for whatever service you have that needs high I/O; the bigger, otherwise "unused" partition will be used for Glacier.

    It’s the only thing that makes sense to me and, honestly, it’s a nice trick to monetize the otherwise lost disk space.

    Regards,
    -Jan

  5. Thanks Robert and, yes, I know you’re right. Many of us would love to get into more detail on the underlying storage technology.

    –jrh

  6. Robert Myhill says:

    Can you elaborate on the storage technology that Glacier is using? I’d love to hear about it.

    Thanks!
    Robert

  7. I didn’t get into the storage technology behind Glacier in this blog post but I probably should in the future.

    –jrh

  8. Joseph Scott says:

    Are you saying then that Amazon Glacier is using tape for backend storage?
