Author Archive

1,800 MySQL Servers with Two DBAs

Here’s a statistic I love: Facebook is running 1,800 MySQL servers with only 2 DBAs. Impressive. I love seeing services show how far you can go towards admin-free operation. 2:1,800 (900 servers per DBA) is respectable, and for database servers it’s downright impressive. This data comes from a short but interesting report at: http://www.paragon-cs.com/wordpress/?p=144. The Facebook fleet has grown fairly…

Google Application Engine

Back in March I speculated that Google was soon to announce a third party service platform. Well, on the evening of April 7th, Google Application Engine was announced. It’s been heavily covered over the last couple of weeks and I’ve been waiting to get a beta account so I can write some code against it….

The Landscape of Parallel Computing Research: A View from Berkeley

In the Rules of Thumb post, I argued that many of the standard engineering rules of thumb are changing. On a closely related point, Nishant Dani and Vlad Sadovsky both pointed me towards The Landscape of Parallel Computing Research: A View from Berkeley by David Patterson et al. Dave Patterson is best known for foundational…

Disks, Lies, and Damn Disks

How do you ensure that data written to disk is REALLY on disk? Yeah, I know, this shouldn’t be hard, but the I/O stack is deep, everyone is looking for performance, and everyone is caching along the way, so it’s more interesting than you might like. If you’re writing code that needs reliable write-through semantics…
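The excerpt stops before the details, so here is a minimal sketch, assuming a POSIX system and Python's standard os module, of the common write-then-fsync pattern. The function name durable_write is hypothetical. It also fsyncs the containing directory so a newly created file's directory entry survives a crash. Keep the post's warning in mind: fsync only asks the OS to push data to the device, and a drive with a volatile write cache can still acknowledge early. On macOS, Apple documents fcntl F_FULLFSYNC for forcing the drive to flush.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Hypothetical helper: write data, then ask the OS to flush it to the device.

    Assumes POSIX semantics. fsync cannot stop a lying drive cache from
    acknowledging before the data reaches stable media.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # flush file data and metadata out of the OS cache
    finally:
        os.close(fd)

    # Flush the directory too, so the (possibly new) directory entry is durable.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "record.bin")
    durable_write(target, b"committed")
    with open(target, "rb") as f:
        print(f.read())
```

Writing to a temporary file and renaming it over the target, with an fsync before and after the rename, is a common extension when the old contents must never be partially overwritten.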

EC2 Gets What It Needed Most

Wow, the pace is starting to pick up in the service platform world. Google announced their long-awaited entrant with Google Application Engine last Monday, April 7th. Amazon announced SimpleDB to answer the largest requirement they were hearing from AWS customers: persistent, structured storage. Yesterday, another major step was made with Werner Vogels announcing…

Blog Data Corruption

The only thing worse than no backups is restoring bad backups. A database guy should get these things right. But, I didn’t, and earlier today I made some major site-wide changes and, as a side effect, this blog was restored to December 4th, 2007. I’m working on recovering the content and will come up with…

International Conference on Data Engineering 2008

I was on a panel at the International Conference on Data Engineering yesterday morning in Cancun, Mexico, but I was only there for Friday. You’re probably asking “why would someone fly all the way to Cancun for one lousy day?” Not a great excuse, but it goes like this: the session was originally scheduled for…

Golden Shield Project

What’s commonly referred to as the Great Firewall of China isn’t really a firewall at all. I recently came across an Atlantic Monthly article investigating how the Great Firewall works and what it does (see The Connection has been Reset). The official name of what is often called the Great Firewall of China is the…

Diseconomies of Scale

The services world is built upon economies of scale. For example, networking costs for small and medium-sized services can run nearly an order of magnitude more than what large bandwidth consumers such as Google, Amazon, Microsoft and Yahoo pay. These economies of scale make it possible for services such as Amazon S3 to pass…

A Couple of Interesting Directions Come Together

Two interesting directions come together: 1) an Oracle-compatible DB startup, and 2) a cloud-based implementation. The Oracle-compatible offering is EnterpriseDB. They use the PostgreSQL code base and implement Oracle compatibility to make it easy for the huge Oracle install base to support them. An interesting approach. I used to lead the SQL…

Browser-Hosted Software with a “Real” UX

I’m a big believer in auto-installable client software, but I also want a quality user experience. For data-intensive applications, I want a caching client. I use and love many browser-hosted clients but, for development work, email clients, and photo editing, I still use installable software. I want a snappy user experience; I need…

First Containerized Data Center Announcement

Microsoft has been investigating and testing containers and modular data centers for some time now. I wrote about them some time back in Architecture for Modular Data Centers (presentation) at the 2007 Conference on Innovative Data Systems Research. Around that time Rackable Systems and Sun Microsystems announced shipping-container-based solutions, and Rackable shipped the first…

The Audiogalaxy Chronicles

Tom Kleinpeter was one of the founders of Foldershare (acquired by Microsoft in 2006) and before that he was a part of the original team at Audiogalaxy. I worked with Tom while he was at Microsoft working on Mesh. Tom recently decided to take some time off, to relax, be a father, and it looks…

Third Party Service Platform From Google?

There is (again) a rumor out there that Google will soon offer a third party service platform: http://www.scripting.com/stories/2008/03/29/pigs.html. I mostly ignore the rumors but this is one I find hard to ignore. Why? Mostly because it makes too much sense. The Google infrastructure investment combined with phenomenal scale yields some of the lowest cost compute…

Hadoop Summit Summary

Yahoo! hosted the Hadoop Summit Tuesday of this week. I posted my rough notes on the conference over the course of the day. This posting summarizes some of what caught my interest and consolidates my notes. Yahoo expected 100 attendees and ended up having to change venues to get closer to fitting the more than…

Hadoop Summit Notes #5 (final): HBase, Rapleave, Hive, Autodesk, Computing in the Cloud, & Future Direction Panel

HBase: Michael Stack (Powerset)
· Distributed DB built on Hadoop core
· Modeled on BigTable
· Same advantages as BigTable:
  o Column store
    § Efficient compression
    § Support for very wide tables when most columns aren’t looked at together
  o Nulls stored for free
  o Cells are versioned (cells addressed by row, col, and timestamp)…
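To make the BigTable-style data model in these notes concrete, here is a toy in-memory sketch in Python. It is not HBase's API: the class VersionedTable, its put/get methods, the max_versions limit, and the "family:qualifier" column names in the usage are all hypothetical. It shows two of the listed properties. Only written cells are stored, so nulls cost nothing, and each cell is addressed by row, column, and timestamp. It does not model columnar storage or compression.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class VersionedTable:
    """Toy sparse table where each cell is addressed by (row, column, timestamp)."""

    def __init__(self, max_versions: int = 3) -> None:
        self.max_versions = max_versions
        # row -> column -> [(timestamp, value), ...], newest first.
        # Unwritten cells have no entry at all, so "nulls" take no space.
        self._rows: Dict[str, Dict[str, List[Tuple[int, bytes]]]] = defaultdict(dict)

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        versions = self._rows[row].setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest first
        del versions[self.max_versions:]  # keep only the N most recent versions

    def get(self, row: str, column: str, timestamp: Optional[int] = None) -> Optional[bytes]:
        """Return the newest value at or before `timestamp` (latest if None)."""
        # .get() avoids creating empty rows in the defaultdict on reads.
        for ts, value in self._rows.get(row, {}).get(column, []):
            if timestamp is None or ts <= timestamp:
                return value
        return None


if __name__ == "__main__":
    t = VersionedTable(max_versions=2)
    for ts, v in [(1, b"v1"), (2, b"v2"), (3, b"v3")]:
        t.put("com.example", "contents:html", ts, v)
    print(t.get("com.example", "contents:html"))     # latest version
    print(t.get("com.example", "contents:html", 2))  # as of timestamp 2
    print(t.get("com.example", "contents:html", 1))  # evicted: only 2 versions kept
```

With max_versions=2, the third put evicts the oldest version, so a read as of timestamp 1 returns None.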