Resource Consumption Shaping is an idea that Dave Treadwell and I came up with last year. The core observation is that service resource consumption is cyclical. We typically pay for near-peak consumption and yet frequently are consuming far below this peak. For example, network egress is typically charged at the 95th percentile of peak consumption over a month, and yet the real consumption is highly sinusoidal and frequently far below this charged-for rate. Substantial savings can be realized by smoothing the resource consumption.
Looking at the network egress traffic report below, we can see this prototypical resource consumption pattern:
You can see from the chart above that resource consumption over the course of a day varies by more than a factor of two. This variance is driven by a variety of factors, but an interesting one is the size of the Pacific Ocean, where the population density is near zero. As the service peak load time-of-day sweeps around the world, network load falls to base load levels as the peak time range crosses the Pacific Ocean. Another contributing factor is wide variance in the success of this example service in different geographic markets.
We see the same opportunities with power. Power is usually charged at the 95th percentile over the course of the month. It turns out that some negotiated rates are more complex than this, but the same principle can be applied to any peak-load-sensitive billing system. For simplicity's sake, we'll look at the common case of systems that charge at the 95th percentile over the month.
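To make the billing model concrete, here is a minimal sketch of 95th-percentile billing in Python. Everything here is illustrative: the provider samples consumption (say, every 5 minutes), sorts the month's samples, drops the top 5%, and bills at the highest remaining sample.

```python
import math

def billed_rate_p95(samples):
    """Return the 95th-percentile sample: the billing basis."""
    ordered = sorted(samples)
    index = int(len(ordered) * 0.95) - 1  # discard the top 5% of samples
    return ordered[max(index, 0)]

# A roughly sinusoidal daily load with a peak near 3x the trough.
day = [100 + 50 * math.sin(2 * math.pi * t / 288) for t in range(288)]
month = day * 30  # 5-minute samples over 30 days

print("peak sample:       %.1f" % max(month))                 # ~150.0
print("95th pct (billed): %.1f" % billed_rate_p95(month))     # ~149.4
print("mean consumption:  %.1f" % (sum(month) / len(month)))  # ~100.0
```

The billed rate sits within a percent of the absolute peak even though mean consumption is a third lower; that gap is what smoothing recovers.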
Server power consumption varies greatly depending upon load. Data from an example server SKU shows idle power consumption of 158W and full-load consumption of about 230W. If we defer batch and non-user-synchronous workload as we approach the current data center power peak, we can reduce overall peaks. As server power consumption moves away from a peak, we can reschedule this non-critical workload. Using this technique, we throttle back power consumption, knocking off the peaks and filling the valleys. Another often-discussed technique is to shut off unneeded servers and use workload peak clipping and trough filling to allow the workload to be run with fewer servers turned on. Using this technique, it may actually be possible to run the service with fewer servers overall. In Should we Shut Off Servers, I argue that shutting off servers should NOT be the first choice.
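Here's a rough sketch of the valley-filling half of that technique. The power cap, the load curve, and the per-job draw are all hypothetical; a real scheduler would work from live power telemetry rather than a fixed list.

```python
from collections import deque

POWER_CAP_W = 200_000  # hypothetical data center power budget

def shape(interactive_load_w, batch_queue, batch_job_w=5_000):
    """Return total draw per interval after filling valleys with batch work."""
    shaped = []
    for load in interactive_load_w:
        total = load
        # Fill the valley: run queued batch jobs only while under the cap,
        # so batch work is deferred automatically as we approach the peak.
        while batch_queue and total + batch_job_w <= POWER_CAP_W:
            batch_queue.popleft()
            total += batch_job_w
        shaped.append(total)
    return shaped

loads = [120_000, 150_000, 195_000, 190_000, 140_000, 110_000]  # watts
queue = deque(range(20))  # 20 pending batch jobs
print(shape(loads, queue))
# -> [200000, 170000, 195000, 190000, 140000, 110000]
```

Note the batch work lands entirely in the troughs; the near-peak intervals pass through with no batch load added.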
Applying this technique to power has a huge potential upside because power provisioning and cooling dominate the cost of a data center. Filling valleys allows better data center utilization in addition to lowering power consumption charges.
The resource-shaping technique we're discussing here, that of smoothing spikes by knocking off peaks and filling valleys, applies to all data center resources. We have to buy servers to meet the highest load requirements. If we knock off peaks and fill valleys, fewer servers are needed. The same applies to internal networking. In fact, Resource Shaping as a technique applies to all resources across the data center. The only difference is the varying complexity of scheduling the consumption of these different resources.
One more observation along this theme, this time returning to egress charges. We mentioned earlier that egress was charged at the 95th percentile. What we didn't mention is that ingress/egress are usually purchased symmetrically. If you need to buy N units of egress, then you just bought N units of ingress whether you need it or not. Many services are egress-dominated. If we can find a way to trade ingress to reduce egress, we save. In effect, it's cross-dimensional resource shaping, where we trade off consumption of a cheap or free resource to save an expensive one. On an egress-dominated service, even inefficient techniques that trade off, say, 10 units of ingress to save only 1 unit of egress may still work economically. Remote Differential Compression is one approach to reducing egress at the expense of a small amount of ingress.
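A back-of-envelope sketch, with entirely hypothetical prices, shows why even a 10-for-1 trade can pay off: the ingress being spent is already-purchased headroom that would otherwise sit idle.

```python
def net_saving(egress_price, egress_saved, ingress_spent, idle_ingress):
    """Net saving from trading ingress units for egress units (hypothetical)."""
    # Ingress consumed within the already-purchased idle headroom is free;
    # only ingress beyond that headroom would force a symmetric purchase.
    billable_ingress = max(0, ingress_spent - idle_ingress)
    return egress_price * (egress_saved - billable_ingress)

# Trade 10 units of idle ingress to save 1 unit of billed egress:
print(net_saving(egress_price=100, egress_saved=1,
                 ingress_spent=10, idle_ingress=50))  # -> 100
```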
The cross-dimensional resource-shaping technique described above, where we traded off ingress to reduce egress, can be applied across other dimensions as well. For example, adding memory to a system can reduce disk and/or network I/O. When does it make sense to use more memory resources to save disk and/or networking resources? This one is harder to tune dynamically in that it's a static configuration option, but the same principles can be applied.
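As a sketch, the break-even question reduces to comparing the cost of the added memory against the cost of provisioning the disk I/O it displaces. All prices and rates below are made up for illustration.

```python
def memory_pays_off(gb_added, price_per_gb, iops_avoided, price_per_iops):
    """True if added cache memory is cheaper than the disk I/O it displaces."""
    return gb_added * price_per_gb < iops_avoided * price_per_iops

# 16GB of cache that absorbs 400 IOPS of read traffic (hypothetical figures):
print(memory_pays_off(gb_added=16, price_per_gb=20,
                      iops_avoided=400, price_per_iops=3))  # True: 320 < 1200
```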
We find another multi-resource trade-off possibility with disk drives. When a disk is purchased, we are buying both a fixed I/O capability and a fixed disk capacity in a single package. For example, when we buy a commodity 750GB disk, we get a bit less than 750GB of capacity and the capability of somewhat more than 70 random I/Os per second (IOPS). If the workload needs more than 70 I/Os per second, we must buy additional disks for their I/O capability and the extra capacity is wasted. If the workload consumes the disk capacity but not the full IOPS capability, then the capacity will be used up but the I/O capability will be wasted.
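A quick sketch makes the stranding visible. Using the 750GB/70 IOPS figures above and two hypothetical workloads, the binding dimension sets the spindle count while the other dimension goes idle.

```python
import math

DISK_GB, DISK_IOPS = 750, 70  # the commodity disk described above

def disks_needed(gb, iops):
    """Spindle count is set by whichever dimension binds."""
    return max(math.ceil(gb / DISK_GB), math.ceil(iops / DISK_IOPS))

# I/O-bound: tiny footprint, heavy random I/O (hypothetical workload).
print(disks_needed(gb=500, iops=2_100))   # 30 disks; ~29 disks of capacity idle
# Capacity-bound: big footprint, light I/O (hypothetical workload).
print(disks_needed(gb=22_500, iops=140))  # 30 disks; nearly all the IOPS idle
```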
Even more interesting, we can mix workloads from different services to “absorb” the available resources. Some workloads are I/O bound while others are storage bound. If we mix these two storage workload types, we may be able to fully utilize the underlying resource. In the mathematical limit, we could run a mixed set of workloads with ½ the disk requirements of a workload-partitioned configuration. Clearly most workloads aren’t close to this extreme limit but savings of 20 to 30% appear attainable. An even more powerful saving is available from mixing workloads using storage by sharing excess capacity. If we pool the excess capacity and dynamically move it around, we can safely increase the utilization levels on the assumption that not all workloads will peak at the same time. As it happens, the workloads are not highly correlated in their resource consumption so this technique appears to offer even larger savings than what we would get through mixing I/O and capacity-bound workloads. Both gains are interesting and both are worth pursuing.
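Continuing the sketch above with the same hypothetical workloads, mixing them on shared spindles lets each absorb the other's stranded dimension (the definitions are repeated so the snippet stands alone):

```python
import math

DISK_GB, DISK_IOPS = 750, 70

def disks_needed(gb, iops):
    return max(math.ceil(gb / DISK_GB), math.ceil(iops / DISK_IOPS))

separate = disks_needed(500, 2_100) + disks_needed(22_500, 140)  # 30 + 30
mixed = disks_needed(500 + 22_500, 2_100 + 140)                  # shared pool
print(separate, mixed)  # -> 60 32
```

With these deliberately complementary workloads, mixing approaches the mathematical ½ limit; real workload pairs are less perfectly matched, hence the 20 to 30% figure.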
Note that the techniques that I’ve broadly called resource shaping are an extension of an existing principle called network traffic shaping (http://en.wikipedia.org/wiki/Traffic_shaping). I see great potential in fundamentally changing the cost of services by making services aware of the real second-to-second value of a resource and allowing them to break their resource consumption into classes of urgent (expensive), less urgent (somewhat cheaper), and bulk (near free).
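A minimal sketch of what class-based shaping might look like, with hypothetical price thresholds standing in for the real second-to-second value of the resource:

```python
URGENT, LESS_URGENT, BULK = "urgent", "less_urgent", "bulk"

def admit(job_class, spot_price):
    """Admit a job based on its urgency class and the current resource price."""
    if job_class == URGENT:
        return True              # expensive or not, urgent work always runs
    if job_class == LESS_URGENT:
        return spot_price < 0.7  # runs unless the resource is pricey
    return spot_price < 0.2      # bulk runs only in the deep valleys

for price in (0.1, 0.5, 0.9):
    print(price, [c for c in (URGENT, LESS_URGENT, BULK) if admit(c, price)])
# 0.1 ['urgent', 'less_urgent', 'bulk']
# 0.5 ['urgent', 'less_urgent']
# 0.9 ['urgent']
```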
–jrh
James Hamilton, jrh@mvdirona.com
Thanks John.
Eric, you asked for examples of resource consumption shaping in use. Your example of repurposing Yahoo! Login each night to run a different personality is perfect. As you know, you don’t even need virtualization to do it, as long as your provisioning system is able to install full images (I argue in Designing & Deploying Internet-Scale Services that you want to invest in a management and provisioning system that can do full system imaging). The only difference between using VMs and re-imaging when repurposing servers is that the VM system is faster in making the change.
Simpler versions of the same technique exist and are in broad use. Here’s one that I’ve seen implemented twice that has a fraction of the difficulty of full repurposing and still has good value. Database servers often have periodic jobs that run every day, every hour, or at some other frequency. These jobs harvest deleted items, re-index, compute stats, and perform other administrative tasks. These tasks need to be done, but they don’t need to be done NOW. The resource consumption shaping technique in this case is to defer batch workload during peak loads. Simple to execute and remarkably effective.
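As a sketch of what that deferral looks like (the task names and the load threshold are hypothetical):

```python
PEAK_LOAD_THRESHOLD = 0.75  # fraction of capacity above which we defer

def maintenance_tick(current_load, tasks):
    """Run periodic maintenance tasks only when the system is off-peak."""
    if current_load >= PEAK_LOAD_THRESHOLD:
        return []  # defer: these tasks need doing, but not NOW
    done = []
    for task in tasks:  # e.g., harvest deletes, re-index, compute stats
        task()
        done.append(task.__name__)
    return done

def harvest_deleted_items(): pass
def recompute_statistics(): pass

print(maintenance_tick(0.90, [harvest_deleted_items]))  # -> [] (deferred)
print(maintenance_tick(0.40, [harvest_deleted_items, recompute_statistics]))
# -> ['harvest_deleted_items', 'recompute_statistics']
```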
Resource Consumption Shaping can be applied at all levels from full machine repurposing all the way down to less-urgent workload deferral.
–jrh
Great post. That graph also looks like an excellent candidate for:
http://code.flickr.com/blog/2008/12/17/web-ops-visualizations-group-on-flickr/
:)
–john
Another interesting article, but I’d really like some examples of where and particularly how you’ve done this effectively in the past. In a prior life, I ran the login service at Yahoo, where we had a large bank of machines in several clusters, each cluster showing similar CPU, power, and network graphs. However, as the service was login, we didn’t have many batch or non-user-synchronous services to run in the valleys.
We explored letting other groups use the valleys, but the organizational challenges made that difficult. The notion of ownership and responsibility for a bank of machines was very strong, and sharing the downtime of machines used as a critical resource leads to distrust. What if the other team leaves the disk full, or doesn’t yield at the appropriate time? Who gets paged for an issue with the machine?
I suspect that a variation of "turn the machine off", in which the machine is instead reprovisioned from one purpose to another at set times, could be workable. At 7pm, turn off 10% of the login machines and bring them back up as log processors, then reverse it again at 7am. Our virtualization and provisioning systems weren’t good enough at the time to support that, though. Other setups, where all machines appear virtual and the engineering team is very decoupled from the hardware their software runs on, could also make that easier.