Archive For The “Software” Category
Earlier today Google hosted the second Seattle Conference on Scalability. The talk on Chapel was a good description of a parallel language for high performance computing being implemented at Cray. The GIGA+ talk described a highly scalable filesystem metadata system implemented in Garth Gibson’s lab at CMU. The Google presentation described how they implemented…
On Wednesday, Yahoo announced that they have built a petascale, distributed relational database. In Yahoo Claims Record With Petabyte Database, the details are thin, but the system is built on the PostgreSQL relational database system. In Size matters: Yahoo claims 2-petabyte database is world’s biggest, busiest, the system is described as a repository of more than 2 petabytes of user…
I’ve been involved with high-scale systems software projects, mostly database engines, for the last 20 years, and I’ve watched the transition from low-level and proprietary languages to C, and then from C to C++. Recently I’ve been thinking a bit about what’s next. Back in the very early 90s when I was Lead Architect on…
I’ve spent a big part of my life working on structured storage engines, first in DB2 and later in SQL Server. And yet, even though I fully understand the value of fully schematized data, I love full text search and view it as a vital access method for all content wherever it’s stored. There are…
It’s not often I come across three interesting notes in the same day, but here’s another. Earlier today the Jim Gray Systems Lab was announced, and it will be led by longtime database pioneer David DeWitt. This is great to see for a large variety of reasons. First of all it’s wonderful to see…
In the Rules of Thumb post, I argued that many of the standard engineering rules of thumb are changing. On a closely related point, Nishant Dani and Vlad Sadovsky both pointed me towards The Landscape of Parallel Computing Research: A View from Berkeley by David Patterson et al. Dave Patterson is best known for foundational…
How do you ensure that data written to disk is REALLY on disk? Yeah, I know, this shouldn’t be hard, but the I/O stack is deep, everyone is looking for performance, and everyone is caching along the way, so it’s more interesting than you might like. If you’re writing code that needs reliable write-through semantics…
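As a rough illustration of the first layer of that problem, here is a minimal sketch assuming a POSIX system. The helper name `durable_write` is hypothetical. It writes a buffer, handling short writes, and then calls `fsync()` so the OS pushes its cached data for that file to the device. It does not settle the whole question: whether the drive's own volatile write cache is flushed depends on the OS, filesystem, and device settings (on macOS, for example, `fsync()` does not force a drive cache flush and Apple documents `fcntl(fd, F_FULLFSYNC)` for that), and for a newly created file the containing directory generally needs its own `fsync()` before the directory entry is durable. Both steps are omitted here.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Write all len bytes of buf to fd, retrying on short writes and EINTR.
   Returns 0 on success, -1 on error with errno set. */
static int write_all(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: write buf to path, then ask the OS to flush the
   file's data to the storage device with fsync(). This does NOT guarantee
   the drive's own write cache is flushed, and it does not fsync the parent
   directory of a newly created file. Returns 0 on success, -1 on error. */
int durable_write(const char *path, const char *buf, size_t len) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write_all(fd, buf, len) != 0 || fsync(fd) != 0) {
        int saved = errno;         /* preserve the original error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);              /* close() can also report deferred errors */
}
```

A caller would check the return value, e.g. `if (durable_write("journal.log", rec, rec_len) != 0) perror("durable_write");`, and treat the record as committed only on success.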
The only thing worse than no backups is restoring bad backups. A database guy should get these things right. But, I didn’t, and earlier today I made some major site-wide changes and, as a side effect, this blog was restored to December 4th, 2007. I’m working on recovering the content and will come up with…
I’m a big believer in auto-installable client software, but I also want a quality user experience. For data-intensive applications, I want a caching client. I use and love many browser-hosted clients, but for development work, email, and photo editing, I still use installable software. I want a snappy user experience, I need…
Rules of thumb help us understand complex systems at a high level. Examples are that high performance server disks will do roughly 180 IOPS, or that enterprise system administrators can manage roughly 100 systems. These numbers ignore important differences between workloads and therefore can’t be precise, but they serve as a quick check. They ignore,…
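To show how such numbers serve as a quick check, here is a back-of-the-envelope sketch using only the two figures quoted above (roughly 180 IOPS per high performance server disk, roughly 100 systems per enterprise administrator). The function names are hypothetical, and the arithmetic deliberately ignores the workload differences the rule of thumb glosses over: RAID write penalties, caching, sequential versus random mix, and so on.

```c
/* Rule-of-thumb figures quoted in the post; real values vary by workload. */
#define IOPS_PER_DISK 180       /* high performance server disk, random I/O */
#define SYSTEMS_PER_ADMIN 100   /* enterprise system administrator */

/* Integer division rounded up: you cannot buy 55.6 disks. */
static long ceil_div(long need, long per_unit) {
    return (need + per_unit - 1) / per_unit;
}

/* Hypothetical sizing helper: disks needed to sustain target_iops,
   ignoring RAID overhead and caching. */
long disks_for_iops(long target_iops) {
    return ceil_div(target_iops, IOPS_PER_DISK);
}

/* Hypothetical sizing helper: administrators needed for a fleet. */
long admins_for_systems(long systems) {
    return ceil_div(systems, SYSTEMS_PER_ADMIN);
}
```

For example, a workload needing 10,000 random IOPS works out to 10,000 / 180 ≈ 55.6, so about 56 disks, and a 5,000-server fleet to about 50 administrators. Numbers like these are a sanity check on a design, not a substitute for measuring the actual workload.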
A few months back I was in a debate about the value of sharing code segments between virtual machines. In my view, there is no question that sharing code across VMs has some value, but code is small compared to data, so the impact will be visible but not fundamental. What follows is an inventory…
I saw a video earlier today titled “Great Ideas are a Dime a Dozen” and I just loved it. Unfortunately it’s a Microsoft internal-only video, so I can’t post it here, but I can point to some related talks and videos. The speaker was Bill Buxton of Microsoft Research. I fell in love with this…
Dave DeWitt and Michael Stonebraker posted an article worth reading yesterday titled MapReduce: A Major Step Backwards (thanks to Kevin Merrit and Sriram Krishnan for sending this one my way). Their general argument is that MapReduce isn’t better than current-generation RDBMSs, which is certainly true in many dimensions, and it isn’t a new invention…
Tim O’Reilly of O’Reilly Media spoke at Microsoft Research earlier today. It was a great, wide-ranging talk that pounded through 103 slides, roaming from social networking, through sensor and ambient computing, to Web 2.0. The talk covered four themes:
· Thoughts on social networking
· Sensors and Ambient Computing
· Web 2.0 and Wall Street
· Open…
It’s finally done! Back in August of 2006 Joe Hellerstein asked me to join him and Mike Stonebraker in producing an article for Foundations and Trends in Database Systems. The project ended up being bigger than I originally understood, and the review process always takes longer than any of us expect. The goal for the…
Michael Hunter, who authors the Testing and Debugging blog at Dr. Dobb’s Journal, asked me for an interview on testing related topics some time back. I’ve long lamented that, industry-wide, there isn’t nearly enough emphasis on test and software quality assurance innovation. For large projects, test is often the least scalable part of the development…