I forget what brought it up, but some time back Sriram Krishnan forwarded me this article on Mike Burrows and his work at DEC, Microsoft, and Google (The Genius: Mike Burrows’ self-effacing journey through Silicon Valley). I enjoyed the read. Mike has done a lot over the years, but perhaps his best-known works are AltaVista at DEC and Chubby at Google.
I first met Mike when he was at Microsoft Research. He and Ted Wobber (also from Digital) came up to Redmond to visit. Back then I led the SQL Server relational engine development team, which included the full text search index support. I was convinced then, and still am today, that relational database engines do a good job of managing structured data but a poor job of the other 90 to 95% of the data in the world that is less structured. It just seems nuts to me that customers industry-wide are spending well over $10B a year on relational database management systems and yet can effectively use these systems to manage only a tiny fraction of their data. Since an increasing fraction of the structured data in the world is already stored in relational database management systems, industry growth will come from helping customers manage their less structured data.
To be fair, most RDBMSs (including SQL Server) do support full text indexing. What I’m after, though, is deep support for full text, where the index is a standard access method rather than a separate indexing engine on the side. More importantly, full statistics should be tracked on the full text corpus, allowing the optimizer to make high quality decisions on join orders and techniques that include full text indices.
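To make the statistics point a little more concrete, here is a minimal sketch in Python of the kind of decision I have in mind. It is not how SQL Server or any other engine actually works. The names (`Predicate`, `fulltext_selectivity`, `choose_driving_predicate`), the document counts, and the 5% selectivity for the relational predicate are all made up for illustration. The idea is simply that if the engine tracks per-term document frequencies, the optimizer can estimate how selective a full text predicate is and compare it directly against an ordinary relational predicate.

```python
from dataclasses import dataclass


@dataclass
class Predicate:
    name: str
    selectivity: float  # estimated fraction of rows that satisfy the predicate


def fulltext_selectivity(term, doc_freq, total_docs):
    """Estimate the fraction of rows containing `term` from per-term
    document-frequency statistics (hypothetical statistics layout)."""
    if total_docs == 0:
        return 0.0
    return doc_freq.get(term, 0) / total_docs


def choose_driving_predicate(predicates):
    """Pick the most selective predicate to drive the plan, so the fewest
    rows flow into the rest of the query."""
    return min(predicates, key=lambda p: p.selectivity)


# Hypothetical corpus statistics: one rare term and one common term.
total_docs = 1_000_000
doc_freq = {"paxos": 120, "database": 400_000}

# Hypothetical relational predicate with an assumed 5% selectivity.
relational = Predicate("region = 'EMEA'", 0.05)
rare = Predicate(
    "CONTAINS(body, 'paxos')",
    fulltext_selectivity("paxos", doc_freq, total_docs),
)
common = Predicate(
    "CONTAINS(body, 'database')",
    fulltext_selectivity("database", doc_freq, total_docs),
)

# A rare term makes the full text index the better starting point; a common
# term means the relational predicate should drive the plan instead.
print(choose_driving_predicate([relational, rare]).name)
print(choose_driving_predicate([relational, common]).name)
```

Without statistics on the full text corpus, the optimizer has no basis for telling these two cases apart and has to guess, which is exactly the problem with treating the full text index as an engine on the side.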
If you haven’t read Mike’s original Chubby paper, it’s worth doing: http://labs.google.com/papers/chubby.html. A related paper, on the Paxos implementation that underlies Chubby, is at: http://labs.google.com/papers/paxos_made_live.html. Chubby is an interesting combination of name server, lease manager, and mini-distributed file system. It’s not the combination of functionality that I would have thought to bring together in a single system, but it’s heavily used and well regarded at Google. Unquestionably a success.