Author Archive
Last week the Facebook Data team released Cassandra as open source. Cassandra is a structured store with write-ahead logging and indexing. Jeff Hammerbacher, who leads the Facebook Data team, described Cassandra as a BigTable data model running on a Dynamo-like infrastructure. Google Code for Cassandra (Apache 2.0 License): http://code.google.com/p/the-cassandra-project/. Avinash Lakshman, Prashant Malik, and…
What follows is a guest posting from Phil Bernstein on the Google Megastore presentation by Jonas Karlsson and Philip Zeyliger at SIGMOD 2008: Megastore is a transactional indexed record manager built by Google on top of BigTable. It is rumored to be the store behind Google AppEngine but this was not confirmed (or denied) at the…
Jim Gray proposed the original sort benchmark back in his famous Anon et al. paper, A Measure of Transaction Processing Power, originally published in Datamation on April 1, 1985. TeraSort is one of the benchmarks that Jim evolved from this original proposal. TeraSort is essentially a sequential I/O benchmark and the best way to get lots…
Recent results from two academic researchers in Japan will be significant to the NAND Flash market: http://www.electronicsweekly.com/Articles/Article.aspx?liArticleID=44028&PrinterFriendly=true. Clearly the trip from laboratory to volume production is often longer than the early estimates, but these results look important. Back in 2006, Jim Gray argued in Tape is Dead, Disk is Tape, Flash is Disk, & Ram…
Updated below with additional implementation details. Last week Spansion made an interesting announcement: EcoRAM, a NOR Flash-based storage part in a Dual In-line Memory Module (DIMM) package. NOR Flash technology growth has been fueled by NOR support for Execute in Place (XIP). Unlike the NAND Flash interface, where entire memory pages need to…
Title: Needle in a Haystack: Efficient Storage of Billions of Photos Speaker: Jason Sobel, Manager of the Facebook Infrastructure Group Slides: http://beta.flowgram.com/f/p.html#2qi3k8eicrfgkv An excellent talk that I really enjoyed. I used to lead a much smaller service that also used a lot of NetApp storage and I recognized many of the problems Jason mentioned. Throughout…
Alex Mallet and Viraj Mody of the Windows Live Mesh team took great notes at the Structure ’08 (Put Cloud Computing to Work) conference (appended below). Some pre-reading information was made available to all attendees as well: Refresh the Net: Why the Internet needs a Makeover? Overall – Interesting mix of attendees from companies in…
John Breslin did an excellent job of writing up Kai-Fu Lee’s Keynote at WWW2008. John’s post: Dr. Kai-Fu Lee (Google) – “Cloud Computing”. There are 235m internet users in China and Kai-Fu believes they want: 1. Accessibility 2. Support for sharing 3. Access data from wherever they are 4. Simplicity 5. Security He argues that…
Earlier today Nokia announced it will acquire the remaining 52% share of Symbian Limited to take over controlling interest of the mobile operating system provider with 91% of the outstanding shares. This alone is interesting, but what is fascinating is that they also announced their intention to open source Symbian to create “the most attractive…
Lars Bak leads the Google Aarhus, Denmark lab. He’s one of the original developers of the Sun HotSpot Java VM, the Self programming language, and the Sun Connected Limited Device Configuration VM for mobile phones. He’s scheduled to give a talk at JAOO in Aarhus, Denmark (Sept. 30, 2008). Unconfirmed rumors report he will be announcing “Google…
Earlier today Google hosted the second Seattle Conference on Scalability. The talk on Chapel was a good description of a parallel language for high performance computing being implemented at Cray. The GIGA+ talk described a highly scalable filesystem metadata system implemented in Garth Gibson’s lab at CMU. The Google presentation described how they implemented…
Jeff Dean gave a great talk at Google IO this year. Some key points from Steve Garrity (msft pm) and some notes from the excellent write-up at Google spotlights data center inner workings: · many unreliable servers to fewer high cost servers · A single search query touches 700 to 1k machines in <…
I’m interested in high-scale web sites, their architecture, and their scaling problems. Last Thursday, Oren Hurvitz posted a great blog entry summarizing two presentations at JavaOne on the LinkedIn service architecture. LinkedIn scale is respectable: · 22M members · 130M connections · 2M email messages per day · 250k invitations per day No big…
There was an interesting talk earlier today at Microsoft Research by Jason Cong of the UCLA Computer Science Department on compiling design specifications in C/C++/SystemC and user constraints into ASIC and FPGA designs. The advantages of compiler-based approaches include higher productivity from working at a higher level, automated verification, optimization, and rapid experimentation…
Last week at Google IO, pricing was announced for Google App Engine. Actually it was blogged the night before at: http://googleappengine.blogspot.com/2008/05/announcing-open-signups-expected.html. The prices are nearly identical to those of Amazon AWS, although GAE differs substantially from the AWS offerings. The former offers an easy-to-use Python execution environment whereas Amazon offers the infinitely flexible run-this-virtual-machine…
Yesterday the Tribute to Honor Jim Gray was held at the University of California at Berkeley. We all miss Jim deeply so it really is a tough topic. But it was great to get together with literally 100s of Jim’s friends and share stories and talk about some of his accomplishments, his contributions to the…