I got a chance to chat with Eric Baldeschwieler while he was visiting Seattle a couple of weeks back and to catch up on what’s happening in the Hadoop world at Yahoo! and beyond. Eric recently started Hortonworks, whose tag line is “architecting the future of big data.” I’ve known Eric for years, from when he led the Hadoop team at Yahoo!, most recently as VP of Hadoop Engineering. It was Eric’s team at Yahoo! that contributed much of the code in Hadoop, Pig, and ZooKeeper.
Many of that same group form the core of Hortonworks, whose mission is to revolutionize and commoditize the storage and processing of big data via open source. Hortonworks continues to supply Hadoop engineering to Yahoo!, and Yahoo! is a key investor in Hortonworks along with Benchmark Capital. Hortonworks intends to continue to leverage the large Yahoo! development, test, and operations team. Yahoo! has over 1,000 Hadoop users and runs Hadoop across many clusters, the largest of which was 4,000 nodes back in 2010. Hortonworks will be providing level 3 support for Yahoo! Engineering.
From Eric’s slides at the 2011 Hadoop Summit, the Hortonworks objectives:
• Make Apache Hadoop projects easier to install, manage & use
− Regular sustaining releases
− Compiled code for each project (e.g. RPMs)
− Testing at scale
• Make Apache Hadoop more robust
− Performance gains
− High availability
− Administration & monitoring
• Make Apache Hadoop easier to integrate & extend
− Open APIs for extension & experimentation
Hortonworks Technology Roadmap:
· Phase 1: Making Hadoop Accessible (2011)
o Release the most stable Hadoop version ever
o Release directly usable code via Apache (RPMs, debs,…)
o Frequent sustaining releases off of the stable branches
· Phase 2: Next Generation Apache Hadoop (2012)
o Address key product gaps (HBase support, HA, Management, …)
o Enable community and partner innovation via modular architecture & open APIs
o Work with community to define integrated stack
Next generation Apache Hadoop:
· Core
o HDFS Federation
o Next Gen MapReduce
o New Write Pipeline (HBase support)
o HA (no SPOF) and Wire compatibility
· Data – HCatalog 0.3
o Pig, Hive, MapReduce and Streaming as clients
o HDFS and HBase as storage systems
o Performance and storage improvements
· Management & Ease of use
o All components fully tested and deployable as a stack
o Stack installation and centralized config management
o REST and GUI for user tasks
Eric’s keynote presentation from Hadoop Summit 2011: Hortonworks: Architecting the Future of Big Data