Tuesday, August 16, 2011

I got a chance to chat with Eric Baldeschwieler while he was visiting Seattle a couple of weeks back and catch up on what’s happening in the Hadoop world at Yahoo and beyond. Eric recently started Hortonworks whose tag line is “architecting the future of big data.” I’ve known Eric for years when he led the Hadoop team at Yahoo! most recently as VP of Hadoop Engineering.  It was Eric’s team at Yahoo that contributed much of the code in Hadoop, Pig, and ZooKeeper. 

 

Many of that same group form the core of Hortonworks whose mission is revolutionize and commoditize the storage and processing of big data via open source. Hortonworks continues to supply Hadoop engineering to Yahoo! And Yahoo! Is a key investor in Hortonworks along with Benchmark Capital. Hortonworks intends to continue to leverage the large Yahoo! development, test, and operations team.  Yahoo! has over 1,000 Hadoop users and are running Hadoop over many clusters the largest of which was 4,000 nodes back in 2010. Hortonworks will be providing level 3 support for Yahoo! Engineering.

 

From Eric slides at the 2011 Hadoop summit, Hortonworks objectives:

      Make Apache Hadoop projects easier to install, manage & use

        Regular sustaining releases

        Compiled code for each project (e.g. RPMs)

        Testing at scale

      Make Apache Hadoop more robust

        Performance gains

        High availability

        Administration & monitoring

      Make Apache Hadoop easier to integrate & extend

        Open APIs for extension & experimentation

 

Hortonworks Technology Roadmap:

·         Phase 1: Making Hadoop Accessible (2011)

o   Release the most stable Hadoop version ever

o   Release directly usable code via Apache (RPMs, debs,…)

o   Frequent sustaining releases off of the stable branches

·         Phase 2: Next Generation Apache Hadoop (2012)

o   Address key product gaps (Hbase support, HA, Management, …)

o   Enable community and partner innovation via modular architecture & open APIs

o   Work with community to define integrated stack

 

Next generation Apache Hadoop:

·         Core

o   HDFS Federation

o   Next Gen MapReduce

o   New Write Pipeline (HBase support)

o   HA (no SPOF) and Wire compatibility

·         Data - HCatalog 0.3

o   Pig, Hive, MapReduce and Streaming as clients

o   HDFS and HBase as storage systems

o   Performance and storage improvements

·         Management & Ease of use

o   All components fully tested and deployable as a stack

o   Stack installation and centralized config management

o    REST and GUI for user tasks

 

Eric’s presentation from Hadoop Summit 2011 where he gave the keynote: Hortonworks: Architecting the Future of Big Data

 

James Hamilton

e: jrh@mvdirona.com

w: http://www.mvdirona.com

b: http://blog.mvdirona.com / http://perspectives.mvdirona.com

 

Tuesday, August 16, 2011 4:49:00 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback
Software
Comments are closed.

Disclaimer: The opinions expressed here are my own and do not necessarily represent those of current or past employers.

Archive
<August 2011>
SunMonTueWedThuFriSat
31123456
78910111213
14151617181920
21222324252627
28293031123
45678910

Categories
This Blog
Member Login
All Content © 2013, James Hamilton
Theme created by Christoph De Baene / Modified 2007.10.28 by James Hamilton