March 25, 2008 – Perspectives

Hadoop Summit Notes #4: X-tracing Hadoop & Zookeeper

X-Tracing Hadoop: Andy Konwinski · Berkeley student with the Berkeley RAD Lab · Motivation: Make Hadoop map/reduce jobs easier to understand and debug · Approach: X-trace Hadoop (500 lines of code) · X-trace is a path-based tracing framework · Generates an event graph to capture causality of events across a network · X-trace collects:…

Hadoop Summit Notes #3: JAQL & DryadLINQ

JAQL: A Query Language for JSON · Kevin Beyer from IBM (did the DB2 XQuery implementation) · Why use JSON? o Want complete entities in one place (non-normalized) o Want evolvable schema o Want standards support o Didn’t want a DOC markup language (XML) · Designed for JSON data · Functional query language (few side…

Hadoop Summit Notes #2: PIG

PIG: Web-Scale Processing · Christopher Olston · The project originated in Y! Research. · Example data analysis task: Find users that visit “good” web pages. · Christopher points out that joins are hard to write in Hadoop, that there are many ways of writing them, and that choosing a join technique is actually a problem that…

Hadoop Summit

Yahoo is hosting a conference, the Hadoop Summit, down in Sunnyvale today. There are over 400 attendees, of whom more than ½ are current Hadoop users and roughly 15 to 20% are running clusters of more than 100 nodes. I’ll post my rough notes from the talks over the course of the day. So far, it’s…
