Resource Consumption Shaping is an idea that Dave Treadwell and I came up with last year. The core observation is that service resource consumption is cyclical. We typically pay for near-peak consumption and yet are frequently consuming far below this peak. For example, network egress is typically charged at the 95th percentile of peak consumption over a month, and yet the real consumption is highly sinusoidal and frequently far below this charged-for rate. Substantial savings can be realized by smoothing the resource consumption.
Looking at the network egress traffic report below, we can see this prototypical resource consumption pattern:
You can see from the chart above that resource consumption over the course of a day varies by more than a factor of two. This variance is driven by a variety of factors, but an interesting one is the size of the Pacific Ocean, where the population density is near zero. As the service peak load time-of-day sweeps around the world, network load falls to base load levels as the peak time range crosses the Pacific Ocean. Another contributing factor is wide variance in the success of this example service in different geographic markets.
We see the same opportunities with power. Power is usually charged at the 95th percentile over the course of the month. Some negotiated rates are more complex than this, but the same principle can be applied to any peak-load-sensitive billing system. For simplicity’s sake, we’ll look at the common case of systems that charge the 95th percentile over the month.
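To make the billing model concrete, here is a minimal sketch of 95th-percentile charging. The load curve is hypothetical (a pure daily sine wave sampled hourly for 30 days), and the nearest-rank percentile method is an assumption; real providers sample at their own intervals and may compute the percentile differently.

```python
import math

def percentile_95(samples):
    """Return the 95th-percentile sample using the nearest-rank method."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]

# Hypothetical month of hourly readings: a base of 100 units with a
# +/-50 unit daily swing. These numbers are illustrative only.
HOURS = 24 * 30
load = [100 + 50 * math.sin(2 * math.pi * h / 24) for h in range(HOURS)]

billed = percentile_95(load)          # what the bill is based on
average = sum(load) / len(load)       # what is actually consumed on average

print(f"billed rate:  {billed:.1f}")
print(f"average rate: {average:.1f}")
print(f"paid-for but unused headroom: {100 * (1 - average / billed):.0f}%")
```

With this made-up curve the billed rate sits close to the daily peak while average consumption sits at the midpoint, which is the gap that shaping tries to close.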
Server power consumption varies greatly depending upon load. Data from an example server SKU shows idle power consumption of 158W and full-load consumption of about 230W. If we defer batch and non-user-synchronous workload as we approach the current data center power peak, we can reduce overall peaks. As the server power consumption moves away from a peak, we can reschedule this non-critical workload. Using this technique we throttle back the power consumption, knocking off the peaks and filling the valleys. Another often-discussed technique is to shut off unneeded servers and use workload peak clipping and trough filling to allow the workload to be run with fewer servers turned on. Using this technique it may actually be possible to run the service with fewer servers overall. In “Should we Shut Off Servers,” I argue that shutting off servers should NOT be the first choice.
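The deferral idea can be sketched as a simple scheduler: user-facing load always runs immediately, and batch work runs only in whatever headroom remains below a utilization cap. Everything here other than the 158W and 230W figures is an assumption for illustration: the linear power model, the cap of 0.75, and the six-interval load profile are all made up.

```python
IDLE_W, FULL_W = 158.0, 230.0  # per-server figures quoted in the text

def server_power(utilization):
    """Assumed linear interpolation between idle and full-load power."""
    return IDLE_W + (FULL_W - IDLE_W) * utilization

def schedule(interactive, batch, cap):
    """Run interactive load at once; run batch only in the headroom below cap."""
    backlog = 0.0
    shaped = []
    for user_load, new_batch in zip(interactive, batch):
        backlog += new_batch                      # batch work arriving now
        headroom = max(0.0, cap - user_load)      # room left under the cap
        run = min(backlog, headroom)              # defer whatever does not fit
        backlog -= run
        shaped.append(user_load + run)
    return shaped, backlog

# Hypothetical utilization per interval (fraction of a server).
interactive = [0.3, 0.5, 0.7, 0.5, 0.3, 0.2]
batch = [0.2] * 6
CAP = 0.75

unshaped = [u + b for u, b in zip(interactive, batch)]
shaped, leftover = schedule(interactive, batch, CAP)

print(f"peak power unshaped: {server_power(max(unshaped)):.1f} W")
print(f"peak power shaped:   {server_power(max(shaped)):.1f} W")
print(f"batch work still deferred at the end: {leftover:.2f}")
```

In this example the same total work completes, but the peak falls because batch work slides out of the busiest intervals into the quieter ones that follow. A real scheduler would also need deadlines so that deferred work cannot be starved.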
Applying this technique to power has a huge potential upside because power provisioning and cooling dominate the cost of a data center. Filling valleys allows better data center utilization in addition to lowering power consumption charges.
The resource-shaping techniques we’re discussing here, smoothing spikes by knocking off peaks and filling valleys, apply to all data center resources. We have to buy servers to meet the highest load requirements. If we knock off peaks and fill valleys, fewer servers are needed. This also applies to internal networking. In fact, Resource Shaping as a technique applies to all resources across the data center. The only difference is the varying complexity of scheduling the consumption of these different resources.
One more observation along this theme, this time returning to egress charges. We mentioned earlier that egress was charged at the 95th percentile. What we didn’t mention is that ingress and egress are usually purchased symmetrically. If you need to buy N units of egress, then you just bought N units of ingress whether you need it or not. Many services are egress-dominated. If we can find a way to trade ingress to reduce egress, we save. In effect, it’s cross-dimensional resource shaping, where we are trading off consumption of a cheap or free resource to save an expensive one. On an egress-dominated service, even inefficient techniques that trade off, say, 10 units of ingress to save only 1 unit of egress may still work economically. Remote Differential Compression is one approach to reducing egress at the expense of a small amount of ingress.
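The arithmetic behind the ingress-for-egress trade can be sketched as follows. I am modeling symmetric purchasing as paying for the larger of the two directions, which is my reading of the arrangement rather than any particular provider’s contract, and the traffic figures and function names are hypothetical. The trade stops paying off once the added ingress catches up with the remaining egress.

```python
def billable_units(ingress, egress):
    """Assumed symmetric purchase: you pay for whichever direction is larger."""
    return max(ingress, egress)

def egress_worth_saving(ingress, egress, ingress_cost_per_egress_saved):
    """Egress to save before the extra ingress becomes the billed direction.

    Solves egress - s == ingress + ratio * s for s.
    """
    if egress <= ingress:
        return 0.0  # already ingress-dominated; the trade cannot help
    return (egress - ingress) / (1.0 + ingress_cost_per_egress_saved)

# Hypothetical egress-dominated service and the 10:1 trade from the text.
ingress, egress, ratio = 10.0, 100.0, 10.0

saved = egress_worth_saving(ingress, egress, ratio)
before = billable_units(ingress, egress)
after = billable_units(ingress + ratio * saved, egress - saved)

print(f"egress saved: {saved:.2f} units")
print(f"billable before: {before:.2f}, after: {after:.2f}")
```

Even at a 10:1 exchange rate the bill drops in this example, because the ingress being spent was already paid for and unused.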
The cross-dimensional resource-shaping technique described above, where we traded off ingress to reduce egress, can be applied across other dimensions as well. For example, adding memory to a system can reduce disk and/or network I/O. When does it make sense to use more memory resources to save disk and/or networking resources? This one is harder to tune dynamically in that it’s a static configuration option, but the same principles can be applied.
We find another multi-resource trade-off possibility with disk drives. When a disk is purchased, we are buying both a fixed I/O capability and a fixed disk capacity in a single package. For example, when we buy a commodity 750GB disk, we get a bit less than 750GB of capacity and the capability of somewhat more than 70 random I/Os per second (IOPS). If the workload needs more than 70 I/Os per second, capacity is wasted. If the workload consumes the disk capacity but not the full IOPS capability, then the capacity will be used up but the I/O capability will be wasted.
Even more interesting, we can mix workloads from different services to “absorb” the available resources. Some workloads are I/O bound while others are storage bound. If we mix these two storage workload types, we may be able to fully utilize the underlying resource. In the mathematical limit, we could run a mixed set of workloads with ½ the disk requirements of a workload-partitioned configuration. Clearly most workloads aren’t close to this extreme limit, but savings of 20 to 30% appear attainable. An even more powerful saving is available from sharing excess capacity among the workloads using storage. If we pool the excess capacity and dynamically move it around, we can safely increase the utilization levels on the assumption that not all workloads will peak at the same time. As it happens, the workloads are not highly correlated in their resource consumption, so this technique appears to offer even larger savings than what we would get through mixing I/O-bound and capacity-bound workloads. Both gains are interesting and both are worth pursuing.
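A small calculation shows why mixing I/O-bound and capacity-bound workloads helps. It uses the nominal 750GB and 70 IOPS disk figures from above as round numbers, and the two workloads are hypothetical and deliberately extreme, so the result lands near the ½ limit rather than in the more realistic 20 to 30% range.

```python
DISK_GB = 750     # nominal capacity per disk (round figure from the text)
DISK_IOPS = 70    # nominal random IOPS per disk (round figure from the text)

def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)

def disks_needed(capacity_gb, iops):
    """A workload needs enough disks to cover whichever dimension binds."""
    return max(ceil_div(capacity_gb, DISK_GB), ceil_div(iops, DISK_IOPS))

# Hypothetical workloads as (capacity in GB, IOPS).
io_bound = (7_500, 7_000)         # small data set, heavy random I/O
capacity_bound = (75_000, 700)    # large data set, light I/O

partitioned = disks_needed(*io_bound) + disks_needed(*capacity_bound)
mixed = disks_needed(io_bound[0] + capacity_bound[0],
                     io_bound[1] + capacity_bound[1])

print(f"partitioned: {partitioned} disks")
print(f"mixed:       {mixed} disks")
print(f"saving:      {100 * (partitioned - mixed) // partitioned}%")
```

This sketch assumes the combined workload can be spread evenly across all disks, which ignores placement constraints and hot spots, so it should be read as an upper bound on the saving for these inputs.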
Note that the techniques that I’ve broadly called resource shaping are an extension of an existing principle called network-traffic shaping (http://en.wikipedia.org/wiki/Traffic_shaping). I see great potential in fundamentally changing the cost of services by making services aware of the real second-to-second value of a resource and allowing them to break their resource consumption into classes of urgent (expensive), less urgent (somewhat cheaper), and bulk (near free).
James Hamilton, firstname.lastname@example.org