Author Archive

Facebook Releases Cassandra as Open Source

Last week the Facebook Data team released Cassandra as open source. Cassandra is a structured store with write-ahead logging and indexing. Jeff Hammerbacher, who leads the Facebook Data team, described Cassandra as a BigTable data model running on a Dynamo-like infrastructure. Google Code for Cassandra (Apache 2.0 License): http://code.google.com/p/the-cassandra-project/. Avinash Lakshman, Prashant Malik, and…

Google Megastore

What follows is a guest posting from Phil Bernstein on the Google Megastore presentation by Jonas Karlsson and Philip Zeyliger at SIGMOD 2008: Megastore is a transactional indexed record manager built by Google on top of BigTable. It is rumored to be the store behind Google AppEngine, but this was not confirmed (or denied) at the…

Hadoop Wins TeraSort

Jim Gray proposed the original sort benchmark back in his famous Anon et al. paper, A Measure of Transaction Processing Power, originally published in Datamation on April 1, 1985. TeraSort is one of the benchmarks that Jim evolved from this original proposal. TeraSort is essentially a sequential I/O benchmark, and the best way to get lots…

Fe-NAND Flash: 10x Durability, 30% Programming Voltage, & 2 Additional Feature Reductions

Recent results from two academic researchers in Japan will be significant to the NAND Flash market: http://www.electronicsweekly.com/Articles/Article.aspx?liArticleID=44028&PrinterFriendly=true. Clearly, the trip from laboratory to volume production often takes longer than early estimates suggest, but these results look important. Back in 2006, Jim Gray argued in Tape is Dead, Disk is Tape, Flash is Disk, & Ram…

EcoRAM: NOR Flash to Reduce Memory Power Consumption

Updated below with additional implementation details. Last week Spansion made an interesting announcement: EcoRAM, a NOR Flash-based storage part in a Dual In-line Memory Module (DIMM) package. The growth of NOR Flash has been fueled by its support for Execute in Place (XIP). Unlike the NAND Flash interface, where entire memory pages need to…

Facebook: Needle in a Haystack: Efficient Storage of Billions of Photos

Title: Needle in a Haystack: Efficient Storage of Billions of Photos
Speaker: Jason Sobel, Manager of the Facebook Infrastructure Group
Slides: http://beta.flowgram.com/f/p.html#2qi3k8eicrfgkv
An excellent talk that I really enjoyed. I used to lead a much smaller service that also used a lot of NetApp storage, and I recognized many of the problems Jason mentioned. Throughout…

Structure 2008: Put Cloud Computing to Work

Alex Mallet and Viraj Mody of the Windows Live Mesh team took great notes at the Structure ’08 (Put Cloud Computing to Work) conference (appended below). Some pre-reading information was also made available to all attendees: Refresh the Net: Why the Internet needs a Makeover? Overall: an interesting mix of attendees from companies in…

Google’s Dr. Kai-Fu Lee on Cloud Computing

John Breslin did an excellent job of writing up Kai-Fu Lee’s keynote at WWW2008. John’s post: Dr. Kai-Fu Lee (Google) – “Cloud Computing”. There are 235m internet users in China, and Kai-Fu believes they want:
1. Accessibility
2. Support for sharing
3. Access to their data from wherever they are
4. Simplicity
5. Security
He argues that…

Nokia to acquire Symbian and go open source

Earlier today Nokia announced it will acquire the remaining 52% share of Symbian Limited, taking a controlling interest in the mobile operating system provider with 91% of the outstanding shares. This alone is interesting, but what is fascinating is that they also announced their intention to open source Symbian to create “the most attractive…

Google Working on Dynamic Runtime?

Lars Bak leads the Google lab in Aarhus, Denmark. He is one of the original developers of the Sun HotSpot Java VM, the Self programming language, and the Sun Connected Limited Device Configuration VM for mobile phones. He is scheduled to give a talk at JAOO in Aarhus, Denmark (Sept. 30, 2008). Unconfirmed rumors report he will be announcing “Google…

Google Seattle Conference on Scalability

Earlier today Google hosted the second Seattle Conference on Scalability. The talk on Chapel gave a good description of a parallel language for high performance computing being implemented at Cray. The GIGA+ talk described a highly scalable filesystem metadata system implemented in Garth Gibson’s lab at CMU. The Google presentation described how they implemented…

Jeff Dean on Google Infrastructure

Jeff Dean gave a great talk at Google IO this year. Below are some key points from Steve Garrity (Microsoft PM) and some notes from the excellent write-up at Google spotlights data center inner workings:
· Many unreliable servers preferred to fewer high-cost servers
· A single search query touches 700 to 1k machines in <…

Scaling LinkedIn

I’m interested in high-scale web sites, their architecture, and their scaling problems. Last Thursday, Oren Hurvitz posted a great blog entry summarizing two presentations at JavaOne on the LinkedIn service architecture. LinkedIn’s scale is respectable:
· 22M members
· 130M connections
· 2M email messages per day
· 250k invitations per day
No big…

Platform-Based Electronic System-Level (ESL) Synthesis

There was an interesting talk earlier today at Microsoft Research by Jason Cong of the UCLA Computer Science Department on compiling design specifications in C/C++/SystemC, together with user constraints, into ASIC and FPGA designs. The advantages of compiler-based approaches include greater productivity from working at a higher level, automated verification, optimization, and rapid experimentation…

Google Application Engine Changes

Last week at Google IO, pricing was announced for Google Application Engine. Actually, it was blogged the night before at: http://googleappengine.blogspot.com/2008/05/announcing-open-signups-expected.html. The prices are nearly identical to Amazon AWS, although GAE differs substantially from the AWS offerings. The former offers an easy-to-use Python execution environment, whereas Amazon offers the infinitely flexible run-this-virtual-machine…

Tribute to Honor Jim Gray

Yesterday the Tribute to Honor Jim Gray was held at the University of California at Berkeley. We all miss Jim deeply, so it really is a tough topic. But it was great to get together with literally hundreds of Jim’s friends to share stories and talk about some of his accomplishments, his contributions to the…
