Archive For March 25, 2008

Hadoop Summit Notes #4: X-tracing Hadoop & Zookeeper

X-Tracing Hadoop: Andy Konwinski · Berkeley student with the Berkeley RAD Lab · Motivation: Make Hadoop map/reduce jobs easier to understand and debug · Approach: X-trace Hadoop (500 lines of code) · X-trace is a path-based tracing framework · Generates an event graph to capture causality of events across a network. · X-trace collects:…

Read more »

Hadoop Summit Notes #3: JAQL & DryadLINQ

JAQL: A Query Language for JSON · Kevin Beyer from IBM (did the DB2 XQuery implementation) · Why use JSON? o Want complete entities in one place (non-normalized) o Want evolvable schema o Want standards support o Didn’t want a DOC markup language (XML) · Designed for JSON data · Functional query language (few side…

Read more »

Hadoop Summit Notes #2: PIG

PIG: Web-Scale Processing · Christopher Olston · The project originated in Y! Research. · Example data analysis task: Find users that visit “good” web pages. · Christopher points out that joins are hard to write in Hadoop: there are many ways of writing joins, and choosing a join technique is actually a problem that…

Read more »

Hadoop Summit

Yahoo is hosting a conference, the Hadoop Summit, down in Sunnyvale today. There are over 400 attendees, of whom more than ½ are current Hadoop users and roughly 15 to 20% are running clusters of more than 100 nodes. I’ll post my rough notes from the talks over the course of the day. So far, it’s…

Read more »