Author Archive
Here’s a statistic I love: Facebook is running 1,800 MySQL servers with only 2 DBAs. Impressive. I love seeing services show how far you can go towards admin-free operation. 2:1,800 is respectable, and for database servers it’s downright impressive. This data is from a short but interesting report at: http://www.paragon-cs.com/wordpress/?p=144. The Facebook fleet has grown fairly…
Back in March I speculated that Google was soon to announce a third-party service platform. Well, on the evening of April 7th, Google App Engine was announced. It’s been heavily covered over the last couple of weeks and I’ve been waiting to get a beta account so I can write some code against it….
In the Rules of Thumb post, I argued that many of the standard engineering rules of thumb are changing. On a closely related point, Nishant Dani and Vlad Sadovsky both pointed me towards The Landscape of Parallel Computing Research: A View from Berkeley by David Patterson et al. Dave Patterson is best known for foundational…
How do you ensure that data written to disk is REALLY on disk? Yeah, I know, this shouldn’t be hard, but the I/O stack is deep, everyone is looking for performance, and everyone is caching along the way, so it’s more interesting than you might like. If you’re writing code that needs reliable write-through semantics…
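As a rough illustration of the question above, here is a minimal Python sketch of a write that asks the operating system to flush its caches before returning. The function name `durable_write` is hypothetical and not taken from the post. `os.fsync` only covers the OS-level cache; whether a drive's own write cache honors the flush depends on the hardware and its configuration, which is the sort of layering the post alludes to.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data to path and request a flush to stable storage.

    Hypothetical helper for illustration only. os.fsync asks the OS to
    push its buffers to the device; it cannot guarantee what the device
    does with its own cache.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        # os.write may write fewer bytes than requested, so loop.
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # Block until the OS reports the file data has been flushed.
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "example.bin")
    durable_write(target, b"payload")
    print(os.path.getsize(target))
```

The sketch flushes only the file's contents. Making a newly created file's directory entry durable is a separate concern and is left out here.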
Wow, the pace is starting to pick up in the service platform world. Google announced their long-awaited entrant with Google App Engine last Monday, April 7th. Amazon announced SimpleDB to answer the largest requirement they were hearing from AWS customers: persistent, structured storage. Yesterday, another major step was made with Werner Vogels announcing…
The only thing worse than no backups is restoring bad backups. A database guy should get these things right. But, I didn’t, and earlier today I made some major site-wide changes and, as a side effect, this blog was restored to December 4th, 2007. I’m working on recovering the content and will come up with…
I was on a panel at the International Conference on Data Engineering yesterday morning in Cancun, Mexico but I was only there for Friday. You’re probably asking “why would someone fly all the way to Cancun for one lousy day?” Not a great excuse, but it goes like this: the session was originally scheduled for…
What’s commonly referred to as the Great Firewall of China isn’t really a firewall at all. I recently came across an Atlantic Monthly article investigating how the Great Firewall works and what it does (see The Connection has been Reset). The official name of what is often called the Great Firewall of China is the…
The services world is one built upon economies of scale. For example, networking costs for small and medium-sized services can run nearly an order of magnitude more than what large bandwidth consumers such as Google, Amazon, Microsoft and Yahoo pay. These economies of scale make it possible for services such as Amazon S3 to pass…
A couple of interesting directions brought together: 1) an Oracle-compatible DB startup, and 2) a cloud-based implementation. The Oracle-compatible offering is EnterpriseDB. They use the PostgreSQL code base and implement Oracle compatibility to make it easy for the huge Oracle install base to support them. An interesting approach. I used to lead the SQL…
I’m a big believer in auto-installable client software but I also want a quality user experience. For data-intensive applications, I want a caching client. I use and love many browser-hosted clients but, for development work, email clients, and photo editing, I still use installable software. I want a snappy user experience, I need…
Microsoft has been investigating and testing containers and modular data centers for some time now. I wrote about them some time back in Architecture for Modular Data Centers (presentation) at the 2007 Conference on Innovative Data Systems Research. Around that time Rackable Systems and Sun Microsystems announced shipping-container-based solutions and Rackable shipped the first…
Tom Kleinpeter was one of the founders of Foldershare (acquired by Microsoft in 2006) and before that he was a part of the original team at Audiogalaxy. I worked with Tom while he was at Microsoft working on Mesh. Tom recently decided to take some time off, to relax, be a father, and it looks…
There is (again) a rumor out there that Google will soon offer a third party service platform: http://www.scripting.com/stories/2008/03/29/pigs.html. I mostly ignore the rumors but this is one I find hard to ignore. Why? Mostly because it makes too much sense. The Google infrastructure investment combined with phenomenal scale yields some of the lowest cost compute…
Yahoo! hosted the Hadoop Summit Tuesday of this week. I posted my rough notes on the conference over the course of the day; this posting summarizes some of what caught my interest and consolidates my notes. Yahoo expected 100 attendees and ended up having to change venues to get closer to fitting the more than…
HBase: Michael Stack (Powerset)
· Distributed DB built on Hadoop core
· Modeled on BigTable
· Same advantages as BigTable:
  o Column store
    § Efficient compression
    § Support for very wide tables when most columns aren’t looked at together
  o Nulls stored for free
  o Cells are versioned (cells addressed by row, col, and timestamp)…
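The cell-addressing model in these notes (row, column, timestamp) can be sketched with a toy in-memory structure. This is not the HBase API; the class `VersionedTable` and its methods are hypothetical names chosen for illustration. It shows only two of the properties listed: cells are versioned by timestamp, and absent cells (nulls) occupy no storage.

```python
from collections import defaultdict


class VersionedTable:
    """Toy model of cells addressed by (row, column, timestamp).

    Hypothetical illustration, not HBase itself.
    """

    def __init__(self):
        # (row, column) -> {timestamp: value}. A cell that was never
        # written has no entry at all, so nulls cost nothing.
        self._cells = defaultdict(dict)

    def put(self, row, column, timestamp, value):
        """Store one version of a cell."""
        self._cells[(row, column)][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest version at or before timestamp.

        With timestamp=None, return the latest version. Returns None
        when the cell has no qualifying version.
        """
        versions = self._cells.get((row, column))
        if not versions:
            return None
        if timestamp is None:
            return versions[max(versions)]
        eligible = [t for t in versions if t <= timestamp]
        return versions[max(eligible)] if eligible else None


if __name__ == "__main__":
    table = VersionedTable()
    table.put("row1", "info:name", 1, "old")
    table.put("row1", "info:name", 5, "new")
    print(table.get("row1", "info:name"))
    print(table.get("row1", "info:name", timestamp=3))
```

A real column store would also group values by column on disk to get the compression benefit mentioned in the notes; this sketch leaves that out.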