Author Archive
X-Tracing Hadoop: Andy Konwinski · Berkeley student with the Berkeley RAD Lab · Motivation: Make Hadoop map/reduce jobs easier to understand and debug · Approach: X-Trace Hadoop (500 lines of code) · X-Trace is a path-based tracing framework · Generates an event graph to capture causality of events across a network. · X-Trace collects:…
JAQL: A Query Language for JSON · Kevin Beyer from IBM (did the DB2 XQuery implementation) · Why use JSON? o Want complete entities in one place (non-normalized) o Want evolvable schema o Want standards support o Didn’t want a DOC markup language (XML) · Designed for JSON data · Functional query language (few side…
PIG: Web-Scale Processing · Christopher Olston · The project originated in Y! Research. · Example data analysis task: Find users that visit “good” web pages. · Christopher points out that joins are hard to write in Hadoop, that there are many ways of writing joins, and that choosing a join technique is actually a problem that…
Yahoo is hosting a conference, the Hadoop Summit, down in Sunnyvale today. There are over 400 attendees, of which more than ½ are current Hadoop users and roughly 15 to 20% are running clusters of more than 100 nodes. I’ll post my rough notes from the talks over the course of the day. So far, it’s…
I’ve long been a big fan of modular data centers using ISO standard shipping containers as the component building block: Commodity Datacenter Design: http://research.microsoft.com/~jamesrh/TalksAndPapers/JamesRH_CIDR.doc. Commodity Datacenter Design Slides: http://research.microsoft.com/~jamesrh/TalksAndPapers/JamesRH_Amazon.ppt. Containers have revolutionized shipping and are by far the cheapest way to move goods over sea, land, rail or truck. I’ve seen them used to house…
Earlier today I viewed Steve Jobs’ 2005 Commencement Speech at Stanford University. In this talk, Jobs recounts three stories and ties them together with a common theme. The first was dropping out of Reed College and showing up for the courses he wanted to take rather than spending time on those he had to take….
Theo Jansen is a Dutch artist and engineer. His work is truly amazing. What he builds are massive synthetic animals, many more than a floor high, that walk. Their gait is surprisingly realistic and they are wind powered. More than anything they are spooky and yet deeply engaging. Check out a selection of Jansen’s work…
Past experience suggests that disks and memory are the most common server component failures, but what about power supplies and motherboards? Amaya Souarez of Global Foundation Services pulled the data on component replacements for the last six months of 2007 and we saw this distribution: 1. Disks: 59.0% 2. Memory: 23.1% 3. Disk Controller:…
Rules of thumb help us understand complex systems at a high level. Examples are that high-performance server disks will do roughly 180 IOPS, or that enterprise system administrators can manage roughly 100 systems. These numbers ignore important differences between workloads and therefore can’t be precise, but they serve as a quick check. They ignore,…
When you look at disk transfer rates, it’s pretty obvious that the faster you spin the disks, the lower the rotational latency and the better the transfer rates. It’s also very clear that disk transfer rates are improving much more slowly than memory subsystem bandwidth. Why not rotate disks faster? We’ve had 15,000 RPM enterprise disk for…
In past blog entries, I’ve talked about the impact of flash on server-side systems. On the client, flash SSDs can help with at least two different markets that are, paradoxically, at opposite ends of the cost spectrum: 1) economy low-cost laptops, and 2) high-performance laptops. At the low end, flash can help make less expensive…
The internet was designed in a different time at a different scale. It’s rare that a design continues to work at all when scaled by multiple orders of magnitude, so it remains impressive, but there are issues. The blackholing of YouTube over the weekend showed one of them. Routing is fragile and open to administrative error…
I get several hundred emails a day, some absolutely vital and needing prompt action and some about the closest thing to corporate spam. I know I’m not alone. I’ve developed my own systems for managing the traffic load and, on different days, have varying degrees of success in sticking to them. In my view,…
This isn’t directly related to high scale services or saving power in the data center but it’s a great video. Bill Gates Last Days (6:54) from CES. –jrh James Hamilton, Windows Live Platform Services Bldg RedW-D/2072, One Microsoft Way, Redmond, Washington, 98052 W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 | JamesRH@microsoft.com H:mvdirona.com | W:research.microsoft.com/~jamesrh
Yesterday, Data Center Knowledge reported that Sun was working on a Cloud Platform to compete with Amazon AWS: Project Caroline. The data behind the report comes from an upcoming JavaOne 2008 presentation by Sun Distinguished Engineer, Bob Scheifler. The talk announcement and synopsis are posted at: http://research.sun.com/projects/caroline/ and, even better, the slides are already…
I’ve got nothing against for-fee software – that’s what has paid the bills around our home for more than 20 years. Nonetheless, when it comes to education, it’s hard not to love free. Yesterday Microsoft announced a great program. Universities and high schools can now make use of Microsoft professional development tools for games, cell…