Archive For The “Services” Category
I was on a panel at the International Conference on Data Engineering yesterday morning in Cancun, Mexico, but I was only there for Friday. You’re probably asking “why would someone fly all the way to Cancun for one lousy day?” Not a great excuse, but it goes like this: the session was originally scheduled for…
The services world is one built upon economies of scale. For example, networking costs for small and medium-sized services can run nearly an order of magnitude higher than what large bandwidth consumers such as Google, Amazon, Microsoft and Yahoo pay. These economies of scale make it possible for services such as Amazon S3 to pass…
A couple of interesting directions brought together: 1) an Oracle-compatible DB startup, and 2) a cloud-based implementation. The Oracle-compatible offering is EnterpriseDB. They use the PostgreSQL code base and implement Oracle compatibility to make it easy for the huge Oracle install base to support them. An interesting approach. I used to lead the SQL…
I’m a big believer in auto-installable client software but I also want a quality user experience. For data-intensive applications, I want a caching client. I use and love many browser-hosted clients but, for development work, email clients, and photo editing, I still use installable software. I want a snappy user experience, I need…
Microsoft has been investigating and testing containers and modular data centers for some time now. I wrote about them some time back in Architecture for Modular Data Centers (presentation) at the 2007 Conference on Innovative Data Systems Research. Around that time Rackable Systems and Sun Microsystems announced shipping-container-based solutions and Rackable shipped the first…
Tom Kleinpeter was one of the founders of Foldershare (acquired by Microsoft in 2006) and before that he was a part of the original team at Audiogalaxy. I worked with Tom while he was at Microsoft working on Mesh. Tom recently decided to take some time off, to relax, be a father, and it looks…
There is (again) a rumor out there that Google will soon offer a third party service platform: http://www.scripting.com/stories/2008/03/29/pigs.html. I mostly ignore the rumors but this is one I find hard to ignore. Why? Mostly because it makes too much sense. The Google infrastructure investment combined with phenomenal scale yields some of the lowest cost compute…
Yahoo! hosted the Hadoop Summit Tuesday of this week. I posted my rough notes on the conference over the course of the day; this posting summarizes some of what caught my interest and consolidates my notes. Yahoo expected 100 attendees and ended up having to change venues to get closer to fitting the more than…
HBase: Michael Stack (Powerset)
· Distributed DB built on Hadoop core
· Modeled on BigTable
· Same advantages as BigTable:
  o Column store
    § Efficient compression
    § Support for very wide tables when most columns aren’t looked at together
  o Nulls stored for free
  o Cells are versioned (cells addressed by row, col, and timestamp)…
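The data model in these notes (sparse columns, free nulls, cells addressed by row, column, and timestamp) can be illustrated with a small in-memory sketch. This is not HBase code and does not use the HBase API; the class `VersionedTable` and its methods `put`, `get`, and `stored_cells` are hypothetical names chosen for illustration only.

```python
from collections import defaultdict


class VersionedTable:
    """Toy sparse table (hypothetical, not the HBase API).

    Each cell is addressed by (row, column, timestamp).
    """

    def __init__(self):
        # row -> column -> {timestamp: value}
        self._cells = defaultdict(dict)

    def put(self, row, column, timestamp, value):
        # Only cells that are written take up space, so a missing
        # (null) column costs nothing.
        self._cells[row].setdefault(column, {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        # Return the newest version at or before `timestamp`
        # (newest overall when no timestamp is given), or None.
        versions = self._cells.get(row, {}).get(column, {})
        candidates = [ts for ts in versions
                      if timestamp is None or ts <= timestamp]
        if not candidates:
            return None
        return versions[max(candidates)]

    def stored_cells(self):
        # Count the cell versions that are physically stored.
        return sum(len(versions)
                   for columns in self._cells.values()
                   for versions in columns.values())
```

For example, writing two versions of one column and reading it back at different timestamps returns the matching version, while a column that was never written reads as `None` and occupies no storage.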
X-Tracing Hadoop: Andy Konwinski
· Berkeley student with the Berkeley RAD Lab
· Motivation: Make Hadoop map/reduce jobs easier to understand and debug
· Approach: X-Trace Hadoop (500 lines of code)
· X-Trace is a path-based tracing framework
· Generates an event graph to capture causality of events across a network
· X-Trace collects:…
JAQL: A Query Language for JSON
· Kevin Beyer from IBM (did the DB2 XQuery implementation)
· Why use JSON?
  o Want complete entities in one place (non-normalized)
  o Want evolvable schema
  o Want standards support
  o Didn’t want a DOC markup language (XML)
· Designed for JSON data
· Functional query language (few side…
PIG: Web-Scale Processing
· Christopher Olston
· The project originated in Y! Research.
· Example data analysis task: Find users that visit “good” web pages.
· Christopher points out that joins are hard to write in Hadoop, that there are many ways of writing joins, and that choosing a join technique is actually a problem that…
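To give a feel for why hand-written joins in map/reduce are laborious, here is a minimal sketch of one common technique, a reduce-side (repartition) inner join, simulated in plain Python rather than on Hadoop. The record layouts (`(user, url)` visits and `(url, rank)` pages) and all function names are assumptions made for illustration; they are not taken from the talk or from Pig.

```python
from collections import defaultdict


def map_visits(record):
    # Tag each visit with its source so the reducer can tell inputs apart.
    user, url = record
    yield url, ("visit", user)


def map_pages(record):
    url, rank = record
    yield url, ("page", rank)


def shuffle(pairs):
    # Group all values by join key, as the framework's shuffle would.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(url, values):
    # Separate the tagged values, then emit the cross product per key.
    ranks = [v for tag, v in values if tag == "page"]
    users = [v for tag, v in values if tag == "visit"]
    for rank in ranks:
        for user in users:
            yield user, url, rank


def join(visits, pages):
    pairs = []
    for record in visits:
        pairs.extend(map_visits(record))
    for record in pages:
        pairs.extend(map_pages(record))
    joined = []
    for url, values in shuffle(pairs).items():
        joined.extend(reduce_join(url, values))
    return joined
```

Even this toy version needs source tagging, a grouping step, and a per-key cross product, and it is only one of several possible join strategies.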
Yahoo is hosting a conference, the Hadoop Summit, down in Sunnyvale today. There are over 400 attendees, of which more than ½ are current Hadoop users and roughly 15 to 20% are running clusters of more than 100 nodes. I’ll post my rough notes from the talks over the course of the day. So far, it’s…
Past experience suggests that disk and memory are the most common server component failures, but what about power supplies and motherboards? Amaya Souarez of Global Foundation Services pulled the data on component replacements for the last six months of 2007 and we saw this distribution:
1. Disks: 59.0%
2. Memory: 23.1%
3. Disk Controller:…
The internet was designed in a different time at a different scale. It’s rare that a design continues to work at all when scaled multiple orders of magnitude, so it remains impressive, but there are issues. The blackholing of YouTube over the weekend showed one of them. Routing is fragile and open to administrative error…
Yesterday, Data Center Knowledge reported that Sun was working on a Cloud Platform to compete with Amazon AWS: Project Caroline. The data behind the report comes from an upcoming JavaOne 2008 presentation by Sun Distinguished Engineer Bob Scheifler. The talk announcement and synopsis is posted at: http://research.sun.com/projects/caroline/ and, even better, the slides are already…