A couple of weeks back, Luiz André Barroso and Urs Hölzle of the Google infrastructure team released a mini-book, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. It is just over 100 pages long and an excellent introduction to very high-scale computing and the issues that matter at scale.
From the Abstract:
As computation continues to move into the cloud, the computing platform of interest no longer resembles a pizza box or a refrigerator, but a warehouse full of computers. These new large datacenters are quite different from traditional hosting facilities of earlier times and cannot be viewed simply as a collection of co-located servers. Large portions of the hardware and software resources in these facilities must work in concert to efficiently deliver good levels of Internet service performance, something that can only be achieved by a holistic approach to their design and deployment. In other words, we must treat the datacenter itself as one massive warehouse-scale computer (WSC). We describe the architecture of WSCs, the main factors influencing their design, operation, and cost structure, and the characteristics of their software base. We hope it will be useful to architects and programmers of today’s WSCs, as well as those of future many-core platforms which may one day implement the equivalent of today’s WSCs on a single board.
Some of the points I found particularly interesting:
· Networking:
o Commodity switches in each rack provide a fraction of their bi-section bandwidth for interrack communication through a handful of uplinks to the more costly cluster-level switches. For example, a rack with 40 servers, each with a 1-Gbps port, might have between four and eight 1-Gbps uplinks to the cluster-level switch, corresponding to an oversubscription factor between 5 and 10 for communication across racks. In such a network, programmers must be aware of the relatively scarce cluster-level bandwidth resources and try to exploit rack-level networking locality, complicating software development and possibly impacting resource utilization. Alternatively, one can remove some of the cluster-level networking bottlenecks by spending more money on the interconnect fabric.
· Server Power Usage:
· Buy vs Build:
Traditional IT infrastructure makes heavy use of third-party software components such as databases and system management software, and concentrates on creating software that is specific to the particular business where it adds direct value to the product offering, for example, as business logic on top of application servers and database engines. Large-scale Internet services providers such as Google usually take a different approach in which both application-specific logic and much of the cluster-level infrastructure software is written in-house. Platform-level software does make use of third-party components, but these tend to be open-source code that can be modified in-house as needed. As a result, more of the entire software stack is under the control of the service developer.
This approach adds significant software development and maintenance work but can provide important benefits in flexibility and cost efficiency. Flexibility is important when critical functionality or performance bugs must be addressed, allowing a quick turn-around time for bug fixes at all levels. It is also extremely advantageous when facing complex system problems because it provides several options for addressing them. For example, an unwanted networking behavior might be very difficult to address at the application level but relatively simple to solve at the RPC library level, or the other way around.
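The "fix it at the RPC library level" point is easy to see with a small sketch. This is only an illustration of the idea, not anything from the paper or from Google's actual stack; the flaky_network_call and call_with_deadline names are made up. The wrapper absorbs an unwanted networking behavior (occasional stalls) behind a deadline and retries, so application code never has to change:

    import random
    import time

    class RpcError(Exception):
        """Raised when a call cannot be completed within its deadline."""

    def flaky_network_call(request):
        # Stand-in for the real transport; occasionally stalls and fails.
        if random.random() < 0.2:
            time.sleep(0.05)                 # simulated stall
            raise RpcError("transport timeout")
        return "response to " + request

    def call_with_deadline(request, deadline_s=0.3, max_attempts=3):
        """Library-level wrapper: retry with backoff, give up at the deadline.

        The application keeps calling one function and never sees the retry
        logic -- the fix lives entirely in the RPC layer.
        """
        start = time.monotonic()
        for attempt in range(max_attempts):
            remaining = deadline_s - (time.monotonic() - start)
            if remaining <= 0:
                break
            try:
                return flaky_network_call(request)
            except RpcError:
                time.sleep(min(0.01 * 2 ** attempt, remaining))  # back off
        raise RpcError("deadline of %.0f ms exceeded" % (deadline_s * 1000))

    try:
        print(call_with_deadline("get /user/42"))
    except RpcError as err:
        print("request failed:", err)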
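Going back to the networking excerpt above, the oversubscription arithmetic is worth making concrete. A quick back-of-the-envelope sketch, using only the example numbers from the excerpt (40 servers with 1-Gbps ports, four to eight 1-Gbps uplinks); the function name is mine:

    def oversubscription(servers_per_rack, server_gbps, uplinks, uplink_gbps):
        """Ratio of in-rack demand to uplink capacity for cross-rack traffic."""
        intra_rack_bw = servers_per_rack * server_gbps   # what the servers can offer
        uplink_bw = uplinks * uplink_gbps                 # what can leave the rack
        return intra_rack_bw / uplink_bw

    # The example from the excerpt: 40 servers, 1-Gbps ports, 4 to 8 uplinks.
    for uplinks in (8, 4):
        factor = oversubscription(40, 1, uplinks, 1)
        print(f"{uplinks} uplinks -> oversubscription factor {factor:.0f}x "
              f"({1 / factor:.2f} Gbps per server for cross-rack traffic)")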
The full paper: http://www.morganclaypool.com/doi/pdf/10.2200/S00193ED1V01Y200905CAC006
James Hamilton, Amazon Web Services
1200, 12th Ave. S., Seattle, WA, 98144
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 | james@amazon.com
H:mvdirona.com | W:mvdirona.com/jrh/work | blog:http://perspectives.mvdirona.com
One thing that sets a Warehouse-Scale Computer (WSC) apart from a normal computer is the expectation of reliability and the cost ramifications it begets. We expect normal computers to be reliable – as far as software is concerned, the memory bus passes data around intact and on time, and CPUs do not fail at random moments. The folly of this approach becomes evident when you try to scale it – the cost of a Superdome is essentially the square of the number of NUMA nodes in it.
The situation differs for a WSC – the expectation is that individual node failures are the norm and that the network continuously delays, drops, or even corrupts data. In return we get linear cost scaling – twice as many nodes cost twice as much (or slightly less, due to economies of scale). This is the basic logic behind scale-out vs. scale-up, and taken to its logical conclusion, scale-out is the WSC.
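To put rough numbers on that contrast, here is a toy sketch. The quadratic scale-up cost and the per-node prices are illustrative assumptions following the comment above, not data from the paper:

    def scale_up_cost(nodes, base=1.0):
        # Assumed model: tightly coupled NUMA interconnect cost grows
        # roughly with the square of the node count.
        return base * nodes ** 2

    def scale_out_cost(nodes, base=1.0, discount=0.95):
        # Assumed model: commodity nodes scale linearly, with a small
        # volume discount from economies of scale.
        return base * nodes * discount

    for n in (2, 8, 32, 128):
        print(f"{n:4d} nodes: scale-up ~{scale_up_cost(n):8.0f}  "
              f"scale-out ~{scale_out_cost(n):6.1f}  (relative cost units)")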
The second observation I want to make is that the reliability<->cost tradeoff is actually quite pervasive, or at least it should be. For one, the web is built around stateless and thus more fault-tolerant protocols, and this applies not only to failing machines but goes all the way inside the web server – IIS can restart its own failed processes and, drilling further down, can restart individual failed app domains inside those processes. Any systems programmer would cringe at this approach to achieving overall system stability, yet this, it seems, is where we are all headed. While it is possible to design a reliable software system out of reliable software components within the confines of a single reliable machine, it is not meaningful to do so across an inherently unreliable network. And since we have to design for failure anyway, we might as well give up the notion of absolute reliability within a single machine as well and treat it as the performance problem that it is. Better yet, when software is robust against failures, we should reframe the reliability problem as a "cost problem", which is where it ties back to WSC design.
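In code, designing for failure at the single-machine level looks less like preventing crashes and more like a supervisor that notices them and restarts the worker, which is the pattern the comment attributes to IIS. A minimal sketch of that pattern; the worker command is hypothetical:

    import subprocess
    import time

    MAX_RESTARTS = 5
    WORKER_CMD = ["python", "worker.py"]   # hypothetical worker process

    def supervise():
        """Restart the worker whenever it exits abnormally.

        Reliability is treated as a cost/performance question: we pay a
        restart delay instead of trying to make the worker crash-proof.
        """
        restarts = 0
        while restarts < MAX_RESTARTS:
            proc = subprocess.Popen(WORKER_CMD)
            code = proc.wait()
            if code == 0:
                return                          # clean exit, nothing to do
            restarts += 1
            print(f"worker died with code {code}, restart {restarts}")
            time.sleep(min(2 ** restarts, 30))  # back off between restarts

    if __name__ == "__main__":
        supervise()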
Lastly, the real logical end of the scale-out approach is of course not a single WSC but a collection of them: a CoWSC. As a user I expect to be served within 300ms and will not tolerate larger delays. This means that as a service designer I have a 300ms budget, and I can spend that budget on computing the outcome (inevitably, across multiple data sources) or on traversing network hops. The ultimate conclusion of this process is that each large company (Microsoft, Amazon, Google, and maybe Yahoo) will have a WSC at each network peering point, and the software running on the CoWSC will need to learn to migrate from the WSC where it was started to the WSC nearest the user, or make do with an inferior user experience. This has interesting ramifications for software developers, namely: how do you even program something like this? An even better question: how many developers can program against such a system, and how does that stack up against the worldwide demand for application software?
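The 300ms budget arithmetic is easy to sketch. A hedged illustration only – the round-trip times, WSC names, and the cross-WSC fetch cost below are made up to show how quickly network hops eat the budget when data has not migrated close to the user:

    TOTAL_BUDGET_MS = 300        # what the user will tolerate end to end

    # Hypothetical round-trip times (ms) from one user to candidate WSCs.
    rtt_to_wsc_ms = {"wsc-seattle": 12, "wsc-virginia": 75, "wsc-dublin": 160}
    CROSS_WSC_FETCH_MS = 90      # assumed cost of reaching data in another WSC

    for name, rtt in sorted(rtt_to_wsc_ms.items(), key=lambda kv: kv[1]):
        # Serving locally: spend one client round trip, keep the rest for compute.
        local = TOTAL_BUDGET_MS - rtt
        # Serving without migrated data: also pay a cross-WSC fetch.
        remote_data = local - CROSS_WSC_FETCH_MS
        print(f"{name:13s} budget left: {local:3d} ms (data local), "
              f"{remote_data:3d} ms (data in another WSC)")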
These are the fun times we are living through.
Hey, James, don’t miss Figure 6.3 in the paper. It looks at total costs in a partially full data center. Because of the high data center costs, it might be a good data point for those arguing for containerization in much cheaper facilities (as you have done).