James Hamilton's Blog RSS 2.0
 Monday, March 24, 2008

X-Tracing Hadoop: Andy Konwinski

·         Berkeley student with the Berkeley RAD Lab

·         Motivation: Make Hadoop map/reduce jobs easier to understand and debug

·         Approach: X-trace Hadoop (500 lines of code)

·         X-trace is a path based tracing framework

·         Generates an event graph to capture causality of events across a network.

·         Xtrace collects: Report label, trace id, report id, hostname, timestamp, etc.

·         What we get from Xtrace:

o   Deterministic causality and concurrency

o   Control over which events get traced

o   Cross-layer

o   Low overhead (modest sized traces produced)

o   Modest implementation complexity

·         Want real, high scale production data sets. Facebook has been very helpful but Andy is after more data to show the value of the xtrace approach to Hadoop debugging.  Contact andyk@cs.berkeley.edu if you want to contribute data.

 

ZooKeeper: Benjamin Reed (Yahoo Research)

·         Distributed consensus service

·         Observation:

o   Distributed systems need coordination

o   Programmers can’t use locks correctly

o   Message based coordination can be hard to use in some applications

·         Wishes:

o   Simple, robust, good performance

o   Tuned for read dominant workloads

o   Familiar models and interface

o   Wait-free

o   Need to be able to wait efficiently

·         Google uses Locks (Chubby) but we felt this was too complex an approach

·         Design point: start with a file system API model and strip out what is not needed

·         Don’t need:

o   Partial reads & writes

o   Rename

·         What we do need:

o   Ordered updates with strong persistence guarantees

o   Conditional updates

o   Watches for data changes

o   Ephemeral nodes

o   Generated file names (mktemp)

·         Data model:

o   Hiearchical name space

o   Each znode has data and children

o   Data is read and written in its entirety

·         All API take a path (no file handles and no open and close)

·         Quorum based updates with reads from any servers (you may get old data – if you call sync first, the next read will be current as of the point of time when the sync was run at the oldest.  All updates flow through an elected leader (re-elected on failure).

·         Written in Java

·         Started oct/2006.  Prototyped fall 2006.  Initial implementation March 2007.  Open sourced in Nov 2007.

·         A Paxos variant (modified multi-paxos)

·         Zookeeper is a software offering in Yahoo whereas Hadoop

 

Note: Yahoo is planning to start a monthly Hadoop user meeting.

 

James Hamilton, Windows Live Platform Services
Bldg RedW-D/2072, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
JamesRH@microsoft.com

H:mvdirona.com | W:research.microsoft.com/~jamesrh  | blog:http://perspectives.mvdirona.com

 

Monday, March 24, 2008 11:50:52 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback
Services

JAQL: A Query Language for Jason

·         Kevin Beyer from IBM (did the DB2 Xquery implementation)

·         Why use JSON?

o   Want complete entities in one place (non-normalized)

o   Want evolvable schema

o   Want standards support

o   Didn’t want a DOC markup language (XML)

·         Designed for JSON data

·         Functional query language (few side effects)

·         Core operators: iteration, grouping, joining, combining, sorting, projection, constructors (arrays, records, values), unesting, ..

·         Operates on anything that is JSON format or can be transformed to JSON and produces JSON or any format that can be transformed from JSON.

·         Planning to

o   add indexing support   

o   Open source next summer

o   Adding schema and integrity support

 

DryadLINQ: Michael Isard (Msft Research)

·         Implementation performance:

o   Rather than temp between every stage, join them together and stream

o   Makes failure recovery more difficult but it’s a good trade off

·         Join and split can be done with Map/Reduce but ugly to program and hard to avoid performance penalty

·         Dryad is more general than Map/Reduce and addresses the above two issues

o   Implements a uniform state machine for scheduling and fault tolerance

·         LINQ addresses the programming model and makes it more access able

·         Dryad supports changing the resource allocation (number of servers used) dynamically during job execution

·         Generally, Map/Reduce is complex so front-ends are being built to make it easier: e.g. PIG & Sawzall

·         Linq: General purpose data-paralle programming contructs

·         LINQ+C# provides parsing, thype-checking, & is a lazy evaluator

o   It builds an expression tree and materializes data only when requested

·         PLINQ: supports parallelizing LINQ queries over many cores

·         Lots of interest in seeing this code out there in open source and interest in the community to building upon it.  Some comments very positive about how far along the work is matched with more negative comments on this being closed rather than open source available for other to innovate upon.

 

James Hamilton, Windows Live Platform Services
Bldg RedW-D/2072, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
JamesRH@microsoft.com

H:mvdirona.com | W:research.microsoft.com/~jamesrh  | blog:http://perspectives.mvdirona.com

 

Monday, March 24, 2008 11:47:35 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback
Services

PIG: Web-Scale Processing

·         Christopher Olston

·         The project originated in Y! Research.

·         Example data analysis task: Find users that visit “good” web pages.

·         Christopher points out that joins are hard to write in Hadoop and there are many ways of writing joins and choosing a join technique is actually a problem that requires some skill.  Basically the same point made by the DB community years ago.  PIG is a dataflow language that describes what you want to happen logically and then map it to map/reduce.  The language of PIG is called Pig Latin

·         Pig Latin allows the declaration of “views” (late bound queries)

·         Pig Latin is essentially a text form of a data flow graph.  It generates Hadoop Map/Reduce jobs.

o   Operators: filter, foreach … generate, & group

o   Binary operators: join, cogroup (“more customizable type of join”), & union

o   Also support split operator

·         How different from SQL?

o   It’s a sequence of simple steps rather than a declarative expression.  SQL is declarative whereas Pig Latin says what steps you want done in what order.  Much closer to imperative programming and, consequently, they argue it is simpler.

o   They argue that it’s easier to build a set of steps and work with each one at a time and slowly build them up to a complete and correct language.

·         PIG is written as a language processing layer over Map/Reduce

·         He propose writing SQL as a processing layer over PIG but this code isn’t yet written

·         Is PIG+Hadoop a DBMS? (there have been lots of blogs on this question :-))

o   P+H only support sequential scans super efficiently (no indexes or other access methods)

o   P+H operate on any data format (PIGS eat anything) whereas DBMS only run on data that they store

o   P+H is a sequence of steps rather than a sequence of constraints as used in DBMS

o   P+H has custom processing as a “first class object” whereas UDFs were added to DBMSs later

·         They want an Eclipse development environment but don’t have it running yet. Planning an Eclipse Plugin.

·         Team of 10 engineers currently working on it.

·         New version of PIG to come out next week will include “explain” (shows mapping to map/reduce jobs to help debug).

·         Today PIG does joins exactly one way. They are adding more join techniques.  There aren’t explicit stats tracked other than file size.  Next version will allow user to specify. They will explore optimization.

 

James Hamilton, Windows Live Platform Services
Bldg RedW-D/2072, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
JamesRH@microsoft.com

H:mvdirona.com | W:research.microsoft.com/~jamesrh  | blog:http://perspectives.mvdirona.com

 

Monday, March 24, 2008 11:46:32 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback
Services

Yahoo is hosting a conference the Hadoop Summit down in Sunnyvale today. There are over 400 attendees of which more than ½ are current Hadoop users and roughly 15 to 20% are running more than 100 node clusters.

 

I’ll post my rough notes from the talks over the course of the day.  So far, it's excellent. My notes on the first two talks are below.

 

                                    --jrh

 

Hadoop: A Brief History

·         Doug Cutting

·         Started with Nutch in 2002 to 2004

o   Initial goal was web-scale, crawler-based search

o   Distributed by neciessity

o   Sort/merge based processing

o   Demonstrated on 4 nodes over 100M web pages. 

o   Was operational onerous. “Real” Web scale was a ways away yet

·         2004 through 2006: Gestation period

o   GFS & MapReduce papers published (addressed the scale problems we were having)

o   Add DFS and MapReduce to Nutch

o   Two part-time developers over two years

o   Ran on 20 nodes at Internet Archive (IA) and UW

o   Much easier to program and run

o   Scaled to several 100m web pages

·         2006 to 2008: Childhood

o   Y! hired Doug Cutting and a dedicated team to work on it reporting to E14 (Eric  Baldeschwieler)

o   Hadoop project split out of Nutch

o   Hit web scale in 2008

 

Yahoo Grid Team Perspective: Eric Baldeschwieler

·         Grid is Eric’s team internal name

·         Focus:

o   On-demand, shared access to vast pools of resources

o   Support massive parallel execution (2k nodes and roughly 10k processors)

o   Data Intensive Super Computing (DISC)

o   Centrally provisioned and managed

o   Service-oriented, elastic

o   Utility for user and researchers inside Y!

·         Open Source Stack

o   Committed to open source development

o   Y! is Apache Platinum Sponsor

·         Project on Eric’s team:

o   Hadoop:

§  Distributed File System

§  MapReduce Framework

§  Dynamic Cluster Management (HOD)

·         Allows sharing of a Hadoop cluster with 100’s of users at the same time.

·         HOD: Hadoop on Demand. Creates virtual clusters using Torq (open source resource managers).  Allocates cluster into many virtual clusters.

o   PIG

§  Parallel Programming Language and Runtime

o   Zookeeper:

§  High-availability directory and confuration service

o   Simon:

§  Cluster and application monitoring

§  Collects stats from 100’s of clusters in parallel (fairly new so far).  Also will be open sourced.

§  All will eventually be part of Apache

§  Similar to Ganglia but more configurable

§  Builds real time reports.

§  Goal is to use Hadoop to monitor Hadoop.

·         Largest production clusters are currently 2k nodes.  Working on more scaling.  Don’t want to have just one cluster but want to run much bigger clusters. We’re investing heavily in scheduling to handle more concurrent jobs.

·         Using 2 data centers and moving to three soon.

·         Working with Carnegie Mellon University (Yahoo provided a container of 500 systems – it appears to be a Rackable Systems container)

·         We’re running Megawatts of Hadoop

·         Over 400 people express interest in this conference.

o   About ½ the room running Hadoop

o   Just about the same number running over 20 nodes

o   About 15 to 20% running over 100 nodes

 

James Hamilton, Windows Live Platform Services
Bldg RedW-D/2072, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
JamesRH@microsoft.com

H:mvdirona.com | W:research.microsoft.com/~jamesrh  | blog:http://perspectives.mvdirona.com

 

Monday, March 24, 2008 11:45:30 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback
Services

I’m long been a big fan of modular data centers using ISO standard Shipping containers as the component building block:

Containers have revolutionized shipping and are by far the cheapest way to move good over sea, land, rail or truck. I’ve seen them used to house telecommunications equipment, power generators, and even stores and apartments have been made using them: http://www.treehugger.com/files/2005/01/shipping_contai.php.

 

The datacenter-in-a-box approach to datacenter design is beginning to be deployed more widely with Lawrence Berkeley National Lab having taken delivery of a Sun Black Box and a “customer in eastern Washington” having taken delivery of a Rackable Ice Cube Module earlier this year.

 

Last summer I came across a book on Shipping Containers by Marc Levinson: The Box: How the Shipping Container Made the World Smaller and the World Economy Bigger. It’s a history of containers from the early experiments in 1956 through to mega-containers terminals distributed throughout the world. The book doesn’t talk about all the innovative applications of containers outside of shipping but does give an interesting background on their invention, evolution, and standardization.

 

                                                -jrh

James Hamilton, Windows Live Platform Services
Bldg RedW-D/2072, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
JamesRH@microsoft.com

H:mvdirona.com | W:research.microsoft.com/~jamesrh

Monday, March 24, 2008 10:42:00 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback
Ramblings
 Tuesday, March 18, 2008

Earlier today I viewed Steve Jobs 2005 Commencement Speech at Stanford University. In this talk Jobs recounts three stories and ties them together with a common theme.  The first was dropping out of Reed College and showing up for the courses he wanted to take rather than spend time on those he had to take. Dropping out was a tough decision but some of what he learned in these audited courses had a fundamental impact on the Mac and, in retrospect, appeared to be a good decision or at least one that lead to a good outcome.  Getting fired from Apple was the second.  Clearly not his choice, not what he would have wanted to happen but it lead to Pixar, Next and rejoining Apple stronger and more experienced than before. Again, a tough path but one that may have lead to a better overall outcome. Likely he is a better and more capable leader for the experience.  Finally, facing death. Death awaits us all and, when facing death, it becomes clear what really matters.  It becomes clear that following your heart is what is really important. Don’t be trapped by Dogma, don’t live other people’s lives, and have the courage to follow your own intuition. Clearly nobody wants to approach to death but knowing it is coming can free each of us to realize we can’t hide, we don’t have forever, and those things that scare us most are really tiny and insignificant when compared with death. Facing death can free us to take chances and to do what is truly important even if success looks uncertain or the risk is high.

 

The theme that wove these three stories together and Jobs parting words for his listeners was to “stay hungry and stay foolish”.


It’s a good read: http://news-service.stanford.edu/news/2005/june15/jobs-061505.html.  Or you can view it at: http://www.youtube.com/watch?v=D1R-jKKp3NA.

 

Sent my way by Michael Starbird-Valentine.

 

                                    --jrh

 

James Hamilton, Windows Live Platform Services
Bldg RedW-D/2072, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
JamesRH@microsoft.com

H: