Archive For December 31, 2008

CACM Interview with Pat Selinger

In a previous posting, Pat Selinger IBM Ph.D. Fellowships, I mentioned Pat Selinger as one of the greats of the relational database world. Working with Pat was one of the reasons why leaving IBM back in the mid-90’s was a tough decision for me. In the December 2008 edition of the Communications of the ACM,…

Read more »

Wikipedia Architecture

I’ve long argued that tough constraints often make for a better service, and few services are more constrained than Wikipedia, where the only source of revenue is user donations. I came across this talk by Domas Mituzas of Wikipedia while reading old posts on Data Center Knowledge. The posting A Look Inside Wikipedia’s Infrastructure includes…

Read more »

MySpace Architecture and .Net

Viraj Mody of the Microsoft Live Mesh team sent this my way: Dan Farino About MySpace Architecture. MySpace, like Facebook, uses relational DBs extensively, front-ended by a layer of Memcached servers. Less open source at MySpace but otherwise unsurprising – a nice scalable design with 3000 front end servers with well over 100 database…

Read more »

Bill Gates

Five or six years ago Bill Gates did a presentation to a small group at Microsoft on his philanthropic work at the Bill and Melinda Gates Foundation. It was by far and away the most compelling talk I had seen in that it was Bill applying his talent to solving world health problems with the…

Read more »

High Efficiency SATA Storage

Related to The Cost of Bulk Storage posting, Mike Neil dropped me a note. He’s built an array based upon this Western Digital part: http://www.wdc.com/en/products/Products.asp?DriveID=336. It’s unusually power efficient. Power dissipation: read/write 5.4 watts, idle 2.8 watts, standby 0.40 watts, sleep 0.40 watts. And it’s currently only $105: http://www.newegg.com/Product/Product.aspx?Item=N82E16822136151. It’s always been the case that…

Read more »

The Cost of Bulk Cold Storage

I wrote this blog entry a few weeks ago, before my recent job change. It’s a look at the cost of high-scale storage and how it has fallen over the last two years, based upon the annual fully burdened cost of power in a data center and industry disk cost trends. The observations made in…

Read more »

Resource Consumption Shaping

Resource Consumption Shaping is an idea that Dave Treadwell and I came up with last year. The core observation is that service resource consumption is cyclical. We typically pay for near peak consumption and yet frequently are consuming far below this peak. For example, network egress is typically charged at the 95th percentile of peak…

Read more »
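The 95th-percentile charging mentioned in the Resource Consumption Shaping excerpt can be illustrated with a small sketch. This is not taken from the original post: the nearest-rank percentile method, the 5-minute sampling interval, and the sample traffic values are all assumptions chosen for illustration, and real providers may compute the figure differently.

```python
def percentile_95(samples_mbps):
    """Return the 95th-percentile sample using the nearest-rank method.

    Assumption: nearest-rank is one common way to compute a billing
    percentile; actual billing rules vary by provider.
    """
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical day of 5-minute egress samples (288 in total):
# 230 quiet intervals at 100 Mbps and 58 busy intervals at 900 Mbps.
samples = [100] * 230 + [900] * 58

billable = percentile_95(samples)        # rate the charge is based on
average = sum(samples) / len(samples)    # what was consumed on average

print(f"billable rate: {billable} Mbps, average rate: {average:.1f} Mbps")
```

With these made-up numbers the billable rate sits at the busy-period level even though average consumption is far lower, which is the gap between paid-for and used capacity that the excerpt describes.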

James Hamilton Joins Amazon.com

I’ve resigned from Microsoft and will join the Amazon Web Services team at the start of next year. As an AWS user, I’ve written thousands of lines of app code against S3, and now I’ll have an opportunity to help improve and expand the AWS suite. In this case, I’m probably guilty of what many…

Read more »

Annual Fully Burdened Cost of Power

In the Cost of Power in Large-Scale Data Centers, we looked at where the money goes in a large scale data center. Here I’m taking similar assumptions and computing the Annual Cost of Power including all the infrastructure as well as the utility charge. I define the fully burdened cost of power to be the…

Read more »

Microsoft Generation 4 Modular Data Centers

Michael Manos yesterday published Our Vision for Generation 4 Modular Data Centers – One Way of Getting it Just Right. In this posting, Mike goes through the next generation modular data center designs for Microsoft. Things are moving quickly. I first argued for modular designs in a Conference on Innovative Data Systems paper submitted in…

Read more »

Two Presentations at University of Washington

Ed Lazowska of the University of Washington invited me to speak to his CSE 490H class. This is a great class that teaches distributed systems in general, and the programming assignments are MapReduce workloads using Hadoop. I covered two major topics, the first on high scale service best practices: how to design, develop, and efficiently operate…

Read more »

New Amazon SimpleDB Pricing

Yesterday, AWS announced new pricing for SimpleDB and it’s noteworthy: free developer usage for 6 months. No charge for up to 1GB of ingress+egress, 25 machine hours, and 1GB storage. To help you get started with Amazon SimpleDB, we are providing a free usage tier for at least the next six months. Each month, there…

Read more »

Should we Shut Off Servers?

In a comment to the last blog entry, Cost of Power in Large-Scale Data Centers, Doug Hellmann brought up a super interesting point: It looks like you’ve swapped the “years” values from the Facilities Amortization and Server Amortization lines. The Facilities Amortization line should say 15 years, and Server 3. The month values are correct,…

Read more »