Sunday, June 01, 2008

Yesterday the Tribute to Honor Jim Gray was held at the University of California at Berkeley. We all miss Jim deeply so it really is a tough topic.  But it was great to get together with literally 100s of Jim’s friends and share stories and talk about some of his accomplishments, his contributions to the field, and his contributions to each of us.  Jim is amazing across all three dimensions but what is most remarkable is the profound way he helped others achieve more throughout the industry.  We’re all better engineers, researchers, and human beings for having been lucky enough to have known and worked with Jim.


Also announced yesterday was the creation of the Jim Gray Chair at Cal Berkeley.  Bill Gates, Eric Schmidt, Marc Benioff, and Mike Stonebraker each donated $250,000, and their gifts were matched by $1,000,000 from the Hewlett Foundation.


Seattle PI coverage: Gathering in Berkeley, Calif., today to honor legendary scientist, Microsoft researcher Jim Gray.


The morning, general session agenda:

             Welcome - Shankar Sastry

             Opening Remarks - Joseph Hellerstein

             A Tribute, Not a Memorial: Understanding Ambiguous Loss - Pauline Boss

             The Amateur Search - Michael Olson

             Jim Gray at Berkeley - Michael Harrison

             Knowledge and Wisdom - Pat Helland

             Why Did Jim Gray Win the Turing Award? - Michael Stonebraker

             Jim Gray Chair - Stuart Russell

             500 Special Relationships: Jim as a Mentor to Faculty and Students - Ed Lazowska

             Jim Gray: His Contributions to Industry - David Vaskevitch

             A "Gap Bridger" - Richard Rashid

             Thanks to the U.S. Coast Guard - Paula Hawthorn


The afternoon, technical session agenda:

·         Welcome - Shankar Sastry

·         Opening Remarks - Joseph Hellerstein

·         A Tribute, Not a Memorial: Understanding Ambiguous Loss - Pauline Boss

·         The Amateur Search - Michael Olson

·         Jim Gray at Berkeley - Michael Harrison

·         Knowledge and Wisdom - Pat Helland

·         Why Did Jim Gray Win the Turing Award? - Michael Stonebraker

·         Jim Gray Chair - Stuart Russell

·         500 Special Relationships: Jim as a Mentor to Faculty and Students - Ed Lazowska

·         Jim Gray: His Contributions to Industry - David Vaskevitch 

·         A "Gap Bridger" - Richard Rashid

·         Thanks to the U.S. Coast Guard - Paula Hawthorn


The event was video recorded and streamed via


Update: the video will be at:  Tribute to Honor Jim Gray - General Session (thanks to George Spix for sending my way).

Second Update: A good article by John Markoff of the NY Times: A Tribute to Jim Gray: Sometimes Nice Guys Do Finish First.




James Hamilton, Windows Live Platform Services
Bldg RedW-D/2072, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859



Sunday, June 01, 2008 10:08:01 AM (Pacific Standard Time, UTC-08:00)
 Thursday, May 29, 2008

Google IO notes continued from earlier in the day: and yesterday:


Google Web Toolkit and Client-Server Communications

·         Speaker: Miquel Mendez

·         GWT client/server communication options:

o   Frames

o   Form Panel

o   XHR: RequestBuilder (be careful not to start too many requests; many browsers have limits)


·         XML Encoding/Decoding: defines XML related classes

·         JSON Encoding/Decoding: defines JSON related classes

·         GWT RPC: Generator that generates code and makes use of RequestBuilder


Reusing Google APIs with Google Web Toolkit

·         Speaker: Miquel Mendez

·         GALGWT: Google API Library for GWT.  It’s an open source project led by Miquel (JavaScript bindings to GWT).

o   It’s a collection of easy to use GWT bindings for existing Google JavaScript APIs

o   It’s a Google code open source project

·         Reminder: GWT is a Java-to-JavaScript compiler.

·         GWT now has a Gadget class.  Google Gadgets are created in GWT by extending the Gadget class and implementing the NeedsXXX interfaces.

·         Gears support:

o   Exposes database, LocalServer, and WorkerPool JS modules

o   Provides an offline module that automates the process of going offline (creates the necessary manifests automatically)

·         Google Maps support as well


Engaging User Experiences with Google App Engine

·         Speakers: John Skidgel (designer) & Lindsay Simon (developer).

·         Showed a guest book application written using Django forms.  It’s been modified to run under App Engine (didn’t say how).

·         App Engine development environment makes it easy to work with a designer as it’s easy to install and runs well on a Mac.

·         Walked through what they called 3D (Design, Develop, & Deploy) and how they handle it.

·         Authentication options:

o   Do your own

o   Use GAE (any authenticated user or you can narrow the population to your domain only – all supported out of the box).

o   Don’t make auth a gating factor or you will lose users – auth at the last possible moment

o   Use the App Engine Datastore for sessions

·         Decreasing Latency:

o   Create build rules to concatenate and minify (Yahoo! Minifier) CSS and JS

o   File fingerprinting

o   Set expires headers for a very long time but add a version ID.  He showed how to handle the version number on the server side.  The recommendation was 10 year expiration with version numbers.

·         Recommends “Progressive Enhancement” or “Defensive Enhancement”.  You should still be able to render without JS. JS should give a better experience but you may not have JS (crappy mobile browsers for example).  Another test, shut off CSS and it should still work.
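To make the fingerprinting and long-expiry advice above concrete, here is a minimal sketch in Python. It is not what the speakers showed; the function names, the `?v=` query parameter, and the choice of SHA-256 are all my own assumptions. The idea is that the version ID is derived from the file contents, so the URL only changes when the file does, and the browser can safely cache each URL for roughly 10 years.

```python
import hashlib
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def fingerprint(content: bytes, length: int = 8) -> str:
    """Return a short, stable version ID derived from the file's bytes."""
    return hashlib.sha256(content).hexdigest()[:length]


def versioned_url(path: str, content: bytes) -> str:
    """Append the fingerprint so that a changed file gets a brand new URL."""
    return f"{path}?v={fingerprint(content)}"


def far_future_headers(now: datetime, years: int = 10) -> dict:
    """Build caching headers that let browsers keep a file for about `years` years.

    `now` must be a timezone-aware UTC datetime.
    """
    lifetime = timedelta(days=365 * years)
    return {
        "Expires": format_datetime(now + lifetime, usegmt=True),
        "Cache-Control": f"public, max-age={int(lifetime.total_seconds())}",
    }
```

A build rule would call `versioned_url` when writing out the HTML, and the server would attach `far_future_headers(datetime.now(timezone.utc))` to every static response.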


Thursday, May 29, 2008 5:55:26 PM (Pacific Standard Time, UTC-08:00)

Continued from Yesterday (day 1): Rough notes from Selected Sessions at Google IO Day 1.


Marissa Mayer Keynote: A Glimpse Under the Hood at Google

·         Showed iGoogle and talked about how Google Gadgets are a great way to get broad distribution and are a form of advertising.

·         Search is number 2 most used application (after email)

·         The ordinary and the everyday

·         Why is search page so simple?

·         Variation of Occam’s Razor: “the simple design is probably right”

·         Sergey did it and it was because “there was nobody else to do it and he doesn’t do HTML”

·         Described process of answering a query (700 to 1,000 machines in .16 seconds):

o   This time of day we’re busy so the query will likely go to one data center and likely get bounced to another (must be a simplification of what really happens – load balancing)

o   Mixer

o   Google Web Server

o   Ads + Websearch (300 to 400 systems)

o   Back to mixer

o   Back to Web server

o   Back to load balancer

·         Split A/B Testing:

·         We give a subset of users a different user experience. Web services allow very detailed views and let you iterate very quickly and evolve rapidly.

·         Example: amount of white space under Google logo on results page?

·         This test showed convincingly that less white space is better than more (it produces more usage and more revenue)

·         Example: yellow or blue as background for paid ads

·         Yellow produced both more satisfaction and more revenue.

·         “If you don’t listen to your customers, someone else will” – Sam Walton

·         But you need to test rather than ask since they often don’t know.

·         Example: would you like 10, 20, or 30 results. Users unanimously wanted 30.

·         But 10 did way better in A/B testing (30 was 20% worse) due to lower latency of 10 results

·         30 is about twice the latency of 10 (I would have expected the other overheads to dominate.  Suggests there is another solution waiting to be found here).

·         Example: Maps was 120k for launch page.  We took 30 to 40k out.  Got a proportional increase in usage.

·         Example: Google Video uploads used to be 1 day to watch while YouTube offered “Watch it now”.  Much more compelling.

·         Urgent can drown out important

·         Users go from unskilled to skilled searchers very fast (under 1 month).  Consequently it’s better to optimize for experts since most users are experts and novices get there fast due to the fast feedback loop.

·         The lesson is to think longer term at all levels in design.

·         Think beyond the current development horizon.  10 years for major products and services.

·         Example: Universal search vs vertical search.  Users want verticals now but what they really want is universal search.  They just want to find the answer they are searching for.

·         Goog-411: don’t know if we can make money off this but it helps us develop voice recognition. Applications of voice recognition are monetizable so, even if Goog-411 doesn’t yield revenue, other applications will.

·         International content:

·         50% of the web is English but only about 1% of the web is Arabic

·         Conclusion: take an Arabic search, translate it, find relevant pages, then translate the results.  Opens up MUCH more content and dramatically improves the results for an Arabic user.

·         Larry Page: “A Healthy Disrespect for the Impossible” opens up many possibilities.

·         Showed examples of how search is not generally “solvable” but getting to 90 to 95% has HUGE benefit. Search is a hard and unconstrained problem.  Same with health records.

·         Recommendation: Be Scrappy & revel in constraints

·         Google operates in 140 countries and 110 languages.  Described the complexity of pulling out text strings from a web site, sending out to translation, dealing with multiple string versions, etc.

·         Better solution: let the users help with the translated content.  If you don’t see your language, help us do it.  There are now ¼ million users helping with translation from all over the world.

·         Interesting little Easter egg:  one of the languages on the Google home page is “Bork! Bork! Bork!” – it’s the Swedish chef from the Muppets

·         Interesting little example: they took 11k Googlers to Indiana Jones last week

·         Marissa went through a bunch of examples of taking on the impossible and brainstorming possible solutions and showing that some just exercised their thinking and others produced cool products/solutions.  Explained that 20% time is just another way of exercising the brain (“Imagination as a muscle”).  And Orkut, Google News, and during one period 50% of their new products, were from 20% time.

·         Random note: What you last searched for is the best context signal for the current search.
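The split A/B testing described above depends on giving a stable subset of users the alternate experience. Here is a rough sketch of one common way to do that, hashing the user ID into a bucket. This is my own illustration, not anything Marissa showed: `assign_variant`, `relative_change`, the 10,000 bucket count, and the experiment names are all hypothetical.

```python
import hashlib

BUCKETS = 10_000  # assumed granularity: each bucket is 0.01% of users


def assign_variant(user_id: str, experiment: str, treatment_fraction: float = 0.1) -> str:
    """Deterministically place a user in 'treatment' or 'control' for one experiment."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # The same user always hashes to the same bucket, so their experience
    # stays consistent from one visit to the next.
    bucket = int.from_bytes(digest[:4], "big") % BUCKETS
    return "treatment" if bucket < treatment_fraction * BUCKETS else "control"


def relative_change(control_metric: float, treatment_metric: float) -> float:
    """Change in a metric relative to control, e.g. -0.2 means 20% worse."""
    return (treatment_metric - control_metric) / control_metric
```

Including the experiment name in the hash keeps bucket membership independent across experiments, so the same users don't end up in every treatment group.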


Thursday, May 29, 2008 8:59:35 AM (Pacific Standard Time, UTC-08:00)
 Wednesday, May 28, 2008

Rough notes from the sessions I attended at Google IO.  The sessions are going to be available in Video so, if you want more detail (or more accuracy :-)), you can check out the videos.


Vic Gundotra Keynote:

·         2 hour session walking through entire conference material mostly with demos: OpenSocial, Google Web Toolkit, Android, and Gears.

·         8 Main Conference Tracks with multiple concurrent sessions in each:

o   AJAX & Javascript

o   APIs & Tools

o   Maps & Geo

o   Mobile

o   Social

o   Code Labs

o   Tech Talks

o   Fireside Chat

·         All recorded and will be available publicly.


OpenSocial: A Standard for the Social Web

·         How do we socialize objects online without creating yet another social network (there are already at least 100)?

·         API for controlled exchange of Friends, Profiles, and Activities (the update stream)

·         Recommends Hal Varian’s (Google Chief Economist) “Information Rules”

o   OpenSocial is an implementation of Chapter 8

·         Association of Google, MySpace, and Yahoo!


·         More than 275M users of OpenSocial

·         How to build an OpenSocial application?

o   JavaScript Version 0.7 now and REST services coming soon

o   Three groupings of the API

§  People & friends

§  Activities

§  Persistence

o   Programming model is async. Send a request and set a callback function that gets called on completion.

o   Update of activity field: postActivity(text) – also supports setting priority

o   Example server side REST services:

§  /people/{guid}/@all: collection of all people connected to the user

§  /people/{guid}/@friends: friends

·         Main sell is to allow small sites to gain critical mass when friction of yet another login system and initial lack of users would have blocked.  Make it easier on users.

·         Showed a map of the world showing that different social networks have won in different geographies all over the world.

o   E.g. LiveJournal (Russia), Orkut (Brazil)

·         OpenSocial gets you to all their users so plan to localize your application (OpenSocial is designed to support localization)

·         OpenSocial Terms:

o   Container:  the site (Hi5, MySpace, etc.)

o   Owner: author/owner of the page

o   Viewer: person viewing the page

·         Apache Shindig is an open source implementation with a goal of allowing new sites to host OpenSocial applications in well under an hour.

·         Shindig is an Apache Incubator project:

·         Summary: make the web more social, current version is 0.7, and 0.8 includes REST.

·         OpenSocial has 11 sessions in addition to this one at Google IO.


Google App Engine

·         This session was packed.  Others were quite lightly filled.

·         Google App Engine does one thing well

o   App Engine handles HTTP requests, nothing else

o   Resources scale automatically

o   Highly scalable store based on BigTable

·         An application is a directory with everything underneath it

·         Single file app.yaml in app root directory

o   Defines app metadata

o   Maps URL patterns in regex to request handlers

o   Separates static files from program files

·         Dev Server (SDK) emulates deployment environment

·         Request Handlers:

o   Python script invoked as though it were a CGI script

o   Environment variables give request parameters




o   Write response to stdout

·         Runtime is Python only but the fact that it is specified in app.yaml suggests that more will eventually be added.

·         Showed Django support and how to use GAE with Django

o   Showed a minimal

§  Import os from google.appengine.ext.webapp import util, ….

o   Also showed minimal

·         Note: Existing Django apps will NOT port easily to GAE.
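To illustrate the request handling model in the notes above (URL patterns mapped by regex to handlers, request parameters arriving as CGI-style environment variables, response written to stdout), here is a minimal stand-alone sketch. It deliberately avoids the App Engine SDK: `ROUTES`, `dispatch`, and both handlers are hypothetical names of mine, and the routing table only plays the role that the handlers section of app.yaml plays in a real app.

```python
import io
import re
from urllib.parse import parse_qs


def guestbook_handler(environ, out):
    """CGI-style handler: read request data from environ, write the response to out."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]
    out.write("Content-Type: text/plain\r\n\r\n")  # headers, then a blank line
    out.write(f"Hello, {name}!\n")                 # then the body


def not_found_handler(environ, out):
    """Fallback used when no URL pattern matches."""
    out.write("Status: 404 Not Found\r\n")
    out.write("Content-Type: text/plain\r\n\r\n")
    out.write("Not found\n")


# Hypothetical routing table: regex URL pattern -> handler.
ROUTES = [
    (re.compile(r"/guestbook(/.*)?"), guestbook_handler),
]


def dispatch(environ, out):
    """Run the first handler whose pattern matches the whole request path."""
    path = environ.get("PATH_INFO", "/")
    for pattern, handler in ROUTES:
        if pattern.fullmatch(path):
            handler(environ, out)
            return
    not_found_handler(environ, out)
```

Run as an actual CGI script, the entry point would be `dispatch(os.environ, sys.stdout)`. Passing the environment and output stream explicitly just makes the sketch easy to exercise with a plain dict and an `io.StringIO`.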


Google Docs + Gears == Google Docs Offline

·         Google Docs Offline Architecture:

o   Document editor

o   Spreadsheet editor

o   Presentation editor

o   Authentication

o   Docs Home (doclist)

·         Overall, no big breakthroughs.  It’s just Docs offline but it’s work well done.

·         Challenges to disconnected operation:

o   Upgrade is a challenge: Now that code is being installed remotely, the server needs to support old code at least until the new code is pushed out and installed.

·         Possible solutions for static resources: fail to upgrade, sticky sessions, resource database, or serve the old version.

·         Solution implemented: resource database with a per-server cache

o   Rolling upgrade for HTML: hard code the offlineVersion and request it specifically – it will fail during rolling server upgrades but the speaker argued that it wasn’t worth the cost to avoid this failure.

o   Security: Decided to not do auth remotely and rely on O/S facilities (if you have access to the data at the O/S level, you get access). But they do provide support for multiple users since most power users have multiple personas (work and home at least).  Multi-user support is via putting the email address of the user in a cookie.  They have a loggedin and a loggedout manifest.  The loggedout manifest redirects to a dialog to choose one of your existing accounts. This either sets the loggedout cookie to an appropriate email address or fails. (loggedin cookie doesn’t have an email address – it has the google security context).

·         Recommendations:

o   Need to provide debugging tools (online can look at server logs – need something for offline)

o   Roll out initially to a small number of users

o   Support disabling offline experience for a user
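The "resource database with a per-server cache" solution mentioned above wasn't described in detail, so what follows is only my guess at the general shape, with hypothetical class and method names. Every deployed version of a static resource stays available in a shared store, and each front end caches what it has already fetched. A client still running old offline code can then keep requesting the old version even after new code has been published.

```python
class ResourceDatabase:
    """Shared store holding every deployed version of each static resource."""

    def __init__(self):
        self._blobs = {}
        self.reads = 0  # counts trips to the shared store, for illustration only

    def publish(self, version, path, content):
        self._blobs[(version, path)] = content

    def fetch(self, version, path):
        self.reads += 1
        return self._blobs[(version, path)]


class FrontEndServer:
    """One web server with a local cache in front of the shared database."""

    def __init__(self, database):
        self._database = database
        self._cache = {}

    def serve(self, version, path):
        key = (version, path)
        if key not in self._cache:
            # Cache miss: any server can serve any version, old or new.
            self._cache[key] = self._database.fetch(version, path)
        return self._cache[key]
```

Because a (version, path) pair never changes once published, the per-server cache never needs invalidation in this sketch.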


Under the Covers of the App Engine Datastore

·         Speaker: Ryan Barrett, App Engine Datastore lead

·         Bigtable in one slide:

o   Scalable structured store

o   Types on each value

o   Single row transactions

o   Two types of scans: 1) prefix (physically contiguous), 2) range scan (also physically contiguous)

·         The entities table:

o   Primary GAE table

o   Stores all entities in all apps

o   Generic and schemaless

o   Row name is entity key

o   Only column is serialized entity

·         Entity key is based on parent entities (root to child, to child, etc.)

o   Note: Can’t change a primary key but can delete and create a new entity with new key

·         Queries and indexes:

·         GQL: Google Query Language

o   A tiny subset of SQL.  Most clauses restricted. Added the Ancestor clause.

·         Bigtable only supports scans.  No sorting and no filtering.

o   Because they have no knowledge of the app or data shape, they convert all queries to scans since that is all BigTable can do.

o   Indexes:

§  Kind Index (kind, key) where kind is child, grandparent, parent, …

§  Single-property index (kind, name, value, key): Serves queries on a single property (there are two indexes: ascending and descending).

§  Composite index: defined by the user in index.yaml (generated by the dev environment if you run queries over all needed composite types).

o   All index comparisons are lexicographic

o   They support index intersection.  Multiple equals filters and an equals filter and an ancestor restriction for example (just do index anding).

·         Index space consumption is not charged for since they don’t want to make people go to considerable pain to avoid using, for example, composite indexes.  Ryan went on to explain that this is what he “wants” but it is not a committed decision.

·         If a query can’t be satisfied with a range scan, the query will fail (need index exception).

·         Transaction model: all writes are transactional

o   All writes are written to journal with timestamp

o   No locking – they use optimistic concurrency control.

o   Each entity has a last committed time.  All reads access last committed time.  All writes check to ensure last committed hasn’t changed. The committed timestamp is only updated after the full value is written out and the log entry is written. The log entry is a Bigtable row and each row supports atomic writes.  He didn’t provide enough detail to fully debug/understand the commit protocol implementation.

o   You define entity groups (defined by the root entity; all descendants are in the same entity group).  Only the root has the timestamp.

o   He did say that all writes to an entity group are serialized so make the entity groups small.
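Going back to the queries and indexes part of these notes: since everything has to reduce to a physically contiguous scan, here is a small sketch of how a single-property index laid out as (kind, name, value, key) rows can answer an equality filter with a prefix scan and an inequality filter with a range scan. A sorted Python list stands in for Bigtable, and the class and method names are my own, not the datastore's.

```python
import bisect


class SinglePropertyIndex:
    """Sorted rows of (kind, property name, value, entity key)."""

    def __init__(self):
        self._rows = []  # kept sorted; tuples of strings compare lexicographically

    def add(self, kind, name, value, key):
        bisect.insort(self._rows, (kind, name, value, key))

    def equals(self, kind, name, value):
        """Answer `name = value` with one contiguous prefix scan."""
        prefix = (kind, name, value)
        start = bisect.bisect_left(self._rows, prefix)
        keys = []
        for row in self._rows[start:]:
            if row[:3] != prefix:
                break  # walked past the prefix, so the scan is done
            keys.append(row[3])
        return keys

    def value_range(self, kind, name, low, high):
        """Answer `low <= name < high` with one contiguous range scan."""
        start = bisect.bisect_left(self._rows, (kind, name, low))
        end = bisect.bisect_left(self._rows, (kind, name, high))
        return [row[3] for row in self._rows[start:end]]
```

Neither method sorts or filters after the fact. The answer is always one unbroken slice of the index, which is why a query that can't be expressed that way needs a different (composite) index or fails.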


Working with Google App Engine Models:

·         Speaker: Rafe Kaplan

·         Other object relational mapping systems:

o   ActiveRecord

o   Django

o   Hibernate

·         Does not map to an RDBMS

·         No pre-existing schema

·         No joins, No Aggs, & no functions

·         Showed how to model relationships


Wednesday, May 28, 2008 5:14:59 PM (Pacific Standard Time, UTC-08:00)
 Friday, May 23, 2008

On Wednesday Yahoo announced they have built a petascale, distributed relational database.  In Yahoo Claims Record With Petabyte Database, the details are thin but they built on the PostgreSQL relational database system. In Size matters: Yahoo claims 2-petabyte database is world's biggest, busiest, the system is described as an over 2 petabyte repository of user click stream and context data with an update rate of 24 billion events per day.  Waqar Hasan, VP of Engineering at Yahoo! Data group, describes the system as updated in real time and live – essentially a real time data warehouse where changes go in as they are made and queries always run against the most current data. I strongly suspect they are bulk parsing logs and the data is being pushed into the system in large bulk units but even near real time at this update rate is impressive.


The original work was done at a Seattle startup called Mahat Technologies acquired by Yahoo! in November 2005.


The approach appears to be similar to what we did with IBM DB2 Parallel Edition.  13 years ago we had it running on a cluster of 512 RS/6000s at the Maui Super Computer Center and 256 nodes at the Cornell Theory Center.  It’s a shared nothing design which means that each server in the cluster has independent disks and doesn’t share memory. The upside of this approach is it scales incredibly well. It looks like Yahoo! has done something similar using PostgreSQL as the base technology.  Each node in the cluster runs a full copy of the storage engine.  The query execution engine is replaced with one modified to run over a cluster and use a communications fabric to interconnect the nodes in the cluster.  The parallel query plans are run over the entire cluster with the plan nodes interconnected by the communication fabric.  The PostgreSQL client, communications protocol, and server side components run mostly unchanged, with some big exceptions.  The query optimizer is handled in one of two ways. Either it is replaced completely with a cluster parallel aware implementation that models the data layout and cluster topology in making optimization decisions, or the original, non-cluster parallel optimizer is used and the resultant single node plans are then optimized for the cluster in a post optimization phase. The former will yield provably better plans but it’s also more complex. I’m fearful of complexity around optimizers and, as a consequence, I actually prefer the slightly less optimal, post-optimization phase.  Many other problems have to be addressed including having the cluster metadata available on each node to support SQL query compilation but what I’ve sketched here covers the major points required to get such a design running.


The result is that a modified version of PostgreSQL runs on each node.  A client can connect to any of the nodes in the cluster (or a policy restricted subset).  A query flows from the client to the server it chose to connect with. The SQL compiler on that node compiles and optimizes the query on that single node (no parallelism). The query optimizer is either cluster-aware or uses a post-optimization cluster-aware component.  The resultant query plan, when ready for execution, is divided up into sub-plans (plan fragments) that run on each node connected over the communication fabric.  Some execution engines initiate top-down and some bottom-up. I don’t recall what PostgreSQL uses but bottom-up is easier in this case.  However, either can be made to work.  The plan fragments are distributed to the appropriate nodes in the cluster.  Each runs on local data and pipes results to other nodes which run plan fragments and forward the results yet again toward the root of the plan. The root of the plan runs on the node that started the compilation and the final results end up there to be returned to the client.
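Here is a toy sketch of that execution flow in Python, using a simple "sum clicks per page" aggregation. It says nothing about how Yahoo! or DB2 Parallel Edition actually implement it. The function names, the row layout, and the use of CRC32 for partitioning are assumptions made for illustration, and lists in one process stand in for nodes and the communication fabric.

```python
import zlib
from collections import Counter


def partition(rows, node_count, key):
    """Hash-partition rows across nodes; each node owns its rows exclusively."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        owner = zlib.crc32(str(row[key]).encode("utf-8")) % node_count
        nodes[owner].append(row)
    return nodes


def local_fragment(local_rows):
    """Plan fragment run on one node: aggregate only the rows stored there."""
    partial = Counter()
    for row in local_rows:
        partial[row["page"]] += row["clicks"]
    return partial


def root_fragment(partials):
    """Root of the plan: merge the partial results piped up from every node."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts per key
    return dict(total)


def run_query(rows, node_count=4):
    """Roughly: SELECT page, SUM(clicks) FROM events GROUP BY page."""
    nodes = partition(rows, node_count, key="user")
    partials = [local_fragment(node_rows) for node_rows in nodes]
    return root_fragment(partials)
```

The data is partitioned by user while the query groups by page, so the same page can show up on several nodes. That is exactly why the root fragment has to merge partial sums rather than just concatenate them.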


It’s a nice approach and, as evidenced by Yahoo’s experience, it scales, scales, scales.  I also like the approach in that most tools and applications can continue to work with little change.  Most clusters of this design have some restrictions, such as unique ID generation being either unsupported or slow, and the same goes for referential integrity.  Nonetheless, a large class of software can be run without change.


If you are interested in digging deeper into Relational Database technology and how the major commercial systems are written, see Architecture of a Database System.


Yahoo has a long history of contributing to Open Source and they are the largest contributor to the Apache Hadoop project. It’ll be interesting to see if Yahoo! Data ends up open source or held as an internal only asset.


Kevin Merritt pointed me to the Yahoo! Data work.




Friday, May 23, 2008 6:22:38 AM (Pacific Standard Time, UTC-08:00)
 Wednesday, May 21, 2008

Search drives the online commerce world by bringing sellers and buyers together.  As a seller, your most important task is getting your site to rank high organically and to have your advertisements placed most prominently and most frequently to users interested in buying and only to users interested in your product.  A buyer chooses a search engine on the basis of more reliably getting them to what they are looking for.  And, with commercial queries, getting them to the “best” seller where best is a fairly complex and hard to define term in this context.  Happy buyers keep using the search engine and paying the sellers.  Sellers who manage their organic and paid placements correctly sell lots of product.  Successful search engines make considerable profit.  That’s just the way the ecosystem has evolved – it’s the broadly used search engine that has all the influence and so they end up with considerable profit.


What if the rules changed?  What if some of the search engine profit was returned to users?  Could this change the ecosystem and could it be a good thing?  Let’s watch because Microsoft is about to announce a “cash back service” later today according to Search Engine Land.  In this posting, Playing with Live Cashback, the blog author demonstrates using the Live Cashback system and concludes that it won’t have much impact.  I’m less certain.  I suspect that respecting users and returning some value to them will change this market in a positive way. It’ll be fun to watch over the next 4 to 6 weeks and see how the search ecosystem evolves.




Wednesday, May 21, 2008 8:35:27 AM (Pacific Standard Time, UTC-08:00)
 Tuesday, May 20, 2008

There is no question that cloud computing is going to be a big part of the future of server-side systems. What I find interesting is the speed with which this is happening.  Look at recent network traffic growth rates from AWS:




AWS is now consuming considerably more bandwidth than Amazon’s global web sites.  Phenomenal growth and impressive absolute size.


Continuing to look at growth, I saw a chart a few weeks back on the Amazon Web Services Blog that illustrates the value of a pay-as-you-go and pay-as-you-grow service.  This chart shows the number of EC2 servers in use by Animoto over a period of a couple of weeks. Note the explosion in EC2 server usage in the three day period from 4/15 through 4/18 and imagine trying to do capacity planning for Animoto.  They went from roughly 50 servers to needing more than 3,500 in three days. Imagine having to predict growth and get servers racked, stacked and online in time to meet the growth.  Nearly impossible.

From: (Emre Kiciman sent it my way).


When you next hear “why web services?”, think of this chart.


Another point I hear frequently around web services is, “sure, they are used by start-ups but REAL enterprises would never use them due to security and data privacy reasons.”  Again, utter bunk but it’s a frequently repeated quip. I led the Exchange Hosted Services team and we provided hosted email anti-malware and archiving.   The service was originally targeting small and medium sized businesses and many from those categories did use it. But, what was interesting was the number of name-brand, world-wide enterprises that recognized the cost and quality advantages of using hosting services.  Valuable internal enterprise resources are best saved for tasks that add value to the business.  


Perhaps the large enterprises will use hosted email services but what about low level services such as EC2 and S3?  Again, it’s the same story.  If the value is there, companies of all sizes will use it.  From the Amazon 4th quarter earnings call, TechCrunch reports (Who Are The Biggest Users of Amazon Web Services? It’s Not Startups):


So who are using these services? A high-ranking Amazon executive told me there are 60,000 different customers across the various Amazon Web Services, and most of them are not the startups that are normally associated with on-demand computing. Rather the biggest customers in both number and amount of computing resources consumed are divisions of banks, pharmaceuticals companies and other large corporations who try AWS once for a temporary project, and then get hooked.


Big companies are jumping in as well.


Google recently entered the cloud computing market with Google App Engine. They are only a couple of months into beta and report they have allowed in 60,000 developers in that short period of time.  The amazing thing is the apparent size of the backlog. The forums are full of people complaining that they can’t yet get on (Sriram Krishnan sent my way).


Wired recently published “Cloud Computing. Available at Today”.


It’s unusual for a new model to grow so fast and it’s close to unprecedented to see so much early growth in the enterprise.  However, when the potential savings are this large, big things can happen.




Tuesday, May 20, 2008 7:30:11 PM (Pacific Standard Time, UTC-08:00)
 Saturday, May 17, 2008

I’ve been involved with high scale systems software projects, mostly database engines, for the last 20 years and I’ve watched the transition from low level and proprietary languages to C. Then C to C++. Recently I’ve been thinking a bit about what’s next.


Back in the very early 90’s when I was Lead Architect on IBM DB2, I was dead against C++ usage in the Storage Engine and wouldn’t allow exceptions to be used anywhere in the system. At the time, the quality of C++ compilers was variable with some being real compilers that were actually fairly well done (I led the IBM RS/6000 C++ team in the late 80s) while others were Cfront-based and pretty weak.  At the time no compiler, including the one I worked on, did a good job implementing exceptions.  Times change.  SQL Server, for example, is 100% C++ and it makes excellent use of exceptions to clean up resources on failure.


The productivity benefits of new programming languages and tools eventually win out.  When they get broad use, implementations improve, reducing the performance tax and, eventually, even very performance sensitive system software makes the transition.


I got interested in Java in the mid-90’s and more recently I’ve been using C# quite a bit partly due to where I work and partly because I actually find the language and surrounding tools impressively good.  JITed languages typically don’t perform as well as statically compiled languages but the advantages completely swamp the minor performance costs.  And, as managed language (Java, C#, etc.) implementations improve, the performance tax continues to fall. There is no question in my mind that managed languages will end up being broadly used in even the most performance critical software systems such as database engines.


Recently, I’ve gotten interested in Erlang as a systems software implementation language.  By most measures, it looks to be an unlikely choice for high scale system software in that it’s interpreted, has a functional subset at its core, and uses message passing rather than shared memory and locks. Basically, it’s just about the opposite of everything you would find in a modern commercial database kernel.  So what makes it interesting? The short answer is all the things that make it an unlikely choice also make it interesting.  Servers are becoming increasingly unbalanced with CPU speeds continuing to outpace memory and network bandwidth.  More and more operations are going to be memory and network bound rather than CPU bound if they aren’t already.  Trading some CPU resources to get a more robust implementation that is easier to understand and maintain is a good choice.  In addition, CPU speed increases are now coming more from multiple cores than from frequency scaling a single core. Consequently a language that produces an abundance of parallelism is an asset rather than a problem. Finally, large systems software projects like database management systems, operating systems, web servers, IM servers, email systems, etc. are incredibly large and complex. The Erlang model of spawning many lightweight threads that communicate via message passing is going to be less efficient than the more common shared memory and locks solution but it’s much easier to get correct.  Erlang also encourages a “fail fast” programming model.  I’ve long argued that this is the only way to get high scale systems software correct (Designing and Deploying Internet-Scale Services). 


Certainly Erlang brings a tax as have other new languages that we have adopted over the years. But it also brings some of what we need badly right now.  For example, the fail fast programming model is the right one and, when combined with synchronous state redundancy, is how most high-scale systems should be written.  Erlang also encourages the production of a very large number of threads which can be a good thing on very high core count servers.  Message passing rather than shared memory with locks, and fail fast with operation restart, significantly increase the probability of the software system working correctly through unexpected events.
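To make the combination of message passing and fail fast a little more concrete, here is a minimal sketch in Python rather than Erlang. It assumes a single worker thread fed through a queue and a supervisor loop that restarts the worker when it crashes. The names `worker` and `supervise` are hypothetical, and unlike a real supervisor with operation restart, this sketch simply drops the message that caused the crash instead of retrying it.

```python
import queue
import threading


def worker(inbox, outbox):
    """Handle messages until a None sentinel arrives.

    Fail fast: there is no defensive recovery here. A bad message raises
    an exception and the worker dies; recovery is the supervisor's job.
    """
    while True:
        msg = inbox.get()
        if msg is None:
            return
        # A zero message raises ZeroDivisionError and kills the worker.
        outbox.put(("ok", msg, 100 // msg))


def supervise(messages):
    """Feed messages to a worker and restart it each time it crashes.

    Returns the collected replies and the number of restarts. The message
    that triggered a crash has already been consumed, so it is dropped.
    """
    inbox, outbox = queue.Queue(), queue.Queue()
    for m in messages:
        inbox.put(m)
    inbox.put(None)  # sentinel: no more work

    restarts = 0
    while True:
        crashed = []

        def run():
            try:
                worker(inbox, outbox)
            except Exception as exc:  # the worker died; record why
                crashed.append(exc)

        t = threading.Thread(target=run)
        t.start()
        t.join()
        if not crashed:
            break  # worker reached the sentinel and exited normally
        restarts += 1  # start a fresh worker on the remaining messages

    results = []
    while not outbox.empty():
        results.append(outbox.get())
    return results, restarts
```

Calling `supervise([5, 0, 20])` processes 5, crashes on 0, restarts once, and then processes 20. The worker and supervisor share no mutable state other than the two queues, which is the property that makes this style easier to reason about than shared memory with locks.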


From my perspective, the syntax of Erlang is less than beautiful but all the advantages above make up for most of that.


The Concurrency and Coordination Runtime is a .Net runtime that implements some of the features I mention above for languages like C#.  George Chrysanthakopoulos, Microsoft CCR Architect, reports that MySpace is using the CCR (Sriram Krishnan pointed me to this one).


It appears that Erlang usage is ramping up fairly quickly right now.  Naturally, since it was developed there, Erlang is used by many Ericsson projects including the AXD301 ATM Switch and the AXE line of switches.  The AXD series includes over 850k lines of Erlang.  However, outside of Ericsson some very interesting examples are emerging.  Amazon’s SimpleDB is written in Erlang (Amazon SimpleDB is built on Erlang and What You Need To Know About Amazon SimpleDB). The recently (and quietly) released Facebook Chat application uses Erlang as well (Dare Obasanjo sent that one my way).  CouchDB is written in Erlang as well (CouchDB: Thinking beyond the RDBMS).  Some more Erlang applications from the Erlang FAQ:

Is it time for a new server-side implementation language?






Saturday, May 17, 2008 11:16:11 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback
Services | Software
 Monday, May 12, 2008

I’ve spent a big part of my life working on structured storage engines,  first in DB2 and later in SQL Server.  And yet, even though I fully understand the value of fully schematized data, I love full text search and view it as a vital access method for all content wherever it’s stored.   There are two drivers of this opinion: 1) I believe, as an industry, we’re about ¼ of the way into a transition from primarily navigational access patterns to personal data to ones based upon full text search, and 2) getting agreement on broad, standardizing schema across diverse user and application populations is very difficult. 


On the first point, for most content on the web, full text search is the only practical way to find it.  Navigational access is available but it’s just not practical for most content.  There is simply too much data and there is no agreement on schema so more structured searches are usually not possible.  Basically structured search is often not supported and navigational access doesn’t scale to large bodies of information.  Full text search is often the only alternative and it’s the norm when looking for something on the web. 


Let’s look at email.   Small amounts of email can be managed by placing each piece of email you choose to store in a specific folder so it can be found later navigationally.  This works fine but only if we keep only a small portion of the email we get.  If we never bothered to throw out email or other documents that we come across, the time required to folderize would be enormous and unaffordable. Folderization just doesn’t scale.  When you start to store large amounts of email or just stop (wasting time) aggressively deleting email, then the only practical way to find most content is full text search.  As soon as 5 to 10GB of un-folderized and un-categorized personal content is accumulated, it’s the web scenario all over again: search is the only practical alternative.  I understand that this scenario is not supported or encouraged by IT or legal organizations at most companies but that is the way I choose to work.  There is no technical stumbling block to providing unbounded corporate email stores and the financial ones really don’t stand up to scrutiny. Ironically, most expensive corporate email systems offer only tiny storage quotas while most free, consumer-based services are effectively unbounded.  Eventually all companies will wake up to the fact that knowledge workers work more efficiently with all available data.  And, when that happens, even corporate email stores will grow beyond the point of practical folderization.


The second issue was the difficulty of standardizing schema across many different stores and many different applications.  The entire industry has wanted to do this over the past couple of decades and many projects have attempted to make progress.  If they were widely successful, it would be wonderful but they haven’t been.  If we had standardized schema, we would have quick and accurate access to all data across all participating applications.  But it’s very hard to get all content owners to cooperate or even care.  Search engines attempt to get to the same goal but they choose a more practical approach: they use full text search and just chip away at the problem.  They work hard on ranking. They infer structure in the content where possible and exploit it where it’s found.   Where structure can’t be found, at least there is full text search with reasonably good ranking to fall back upon.


Strong or dominant search engine providers have considerable influence over content owners, so weak forms of schema standardization become more practical.  For example, a dominant search engine provider can offer content owners opportunities to get better search results for their web site if they supply a web site map (standard schema showing all web pages in the site).  This is already happening and web administrators are participating because it brings them value.  A web site’s ranking in the important search engine providers is vital and a chance to lift your ranking even slightly is worth a fortune.  Folks will work really hard where they have something to gain.  So, if adopting common schema can improve ranking, there is a significant chance something positive actually could happen. 


The combination of providing full text search over all content, motivating content providers to participate in full or partial schema standardization, and having the search engine infer schema where it’s not supplied feels like a practical approach to richer search.  I love full text search and view it as the under-pinning to finding all information, structured or not.  The most common queries will include both structured and non-structured components but the common element will be that full schema standardization isn’t required, nor is it required that a user understand schema to be able to find what they need.  Over time, I think we will see incremental participation in standardized schemas but this will happen slowly.  Full text search with good ranking and relevance, assisted by whatever schema can be found or inferred in the data, will be the under-pinning to finding most content over the near term.







Monday, May 12, 2008 4:42:14 AM (Pacific Standard Time, UTC-08:00)  #    Comments [2] - Trackback
 Wednesday, May 07, 2008

Some time back I got a question from the leader of a 5 to 10 person startup on what I look for when hiring a Program Manager.  I make no promise that what I look for is typical of what others look for – it almost certainly is not.  However, when I’m leading an engineering team and interviewing for a Program Manager role, these are the attributes I look for.  My response to the original query is below:


The good news is that you’re the CEO not me.  But, were our roles reversed, I would be asking you why you think you need a PM at this point.  A PM is responsible for making things work across groups and teams.  Essentially they are the grease that helps a big company ship products that work together and get them delivered through a complicated web of dependencies.  Does a single product startup in the pre-beta phase actually need a PM?  Given my choice, I would always go with more great developers at this phase of the company’s life and have the developers take more design ownership, spend more time with customers, etc.  I love the "many hats" model and it's one of the advantages of a start-up. With a bunch of smart engineers wearing as many hats as needed, you can go with less overhead and fewer fixed roles, and operate more efficiently. The PM role is super important but it’s not the first role I would staff in an early-stage startup.


But, you were asking for what I look for in a PM rather than advice on whether you should look to fill the role at this point in the company’s life.  I don't believe in non-technical PMs, so what I look for in a PM is similar to what I look for in a developer.  I'm slightly more willing to put up with somewhat rusty code in a PM, but that's not a huge difference.  With a developer, I'm more willing to put up with certain types of minor skill deficits in certain areas if they are excellent at writing code.  For example, a talented developer who isn’t comfortable with public speaking, or may only be barely comfortable in group meetings, can be fine. I'll never do anything to screw up team chemistry or bring in a prima donna but, with an excellent developer, I'm more willing to look at greatness around systems building and be OK with some other skills simply not being there as long as their absence doesn't screw up the team chemistry overall.  With a PM, those skills need to be there and it just won't work without them.


It's mandatory that PMs not get "stuck in the weeds". They need to be able to look at the big picture and yet, at the same time, understand the details, even if they aren't necessarily writing the code that implements the details.  A PM is one of the folks on the team responsible for the product hanging together and having conceptual integrity.  They are one of the folks responsible for staying realistic and not letting the project scope grow and release dates slip. They are one of the team members that need to think customer first, to really know who the product is targeting, to keep the project focused on that target, and to get the product shipped.


So, in summary: what I look for in a PM is similar to what I look for in a developer (but I'll tolerate their coding possibly being a bit rusty). I expect they will have development experience. I'm pretty strongly against hiring a PM straight out of university -- a PM needs experience in a direct engineering role first to gain the experience to be effective in the PM role. I'll expect PMs to put the customer first and understand how a project comes together, keep it focused on the right customer set, not let feature creep set in, and to have the skill, knowledge, and experience to know when a schedule is based upon reality and when it's more of a dream.  Essentially I have all the expectations of a PM that I have of a senior developer, except that I need them to have a broad view of how the project comes together as a whole, in addition to knowing many of the details. They must be more customer focused, have a deeper view of the overall project schedules and how the project will come together, be a good communicator, perhaps a less sharp coder, but have excellent design skills. Finally, they must be good at getting a team making decisions, moving on to the next problem, and feeling good about it.






Wednesday, May 07, 2008 4:40:50 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback
 Monday, May 05, 2008

I forget what brought it up but some time back Sriram Krishnan forwarded me this article on Mike Burrows and his work through DEC, Microsoft, and Google (The Genius: Mike Burrows' self-effacing journey through Silicon Valley).  I enjoyed the read.  Mike has done a lot over the years but perhaps his best known works of recent years are Alta Vista at DEC and Chubby at Google.


I first met Mike when he was at Microsoft Research.  He and Ted Wobber (also from Digital) came up to Redmond to visit.  Back then I led the SQL Server relational engine development team which included the full text search index support.   I was convinced then, and still am today, that relational database engines do a good job of managing structured data but a poor job of the other 90 to 95% of the data in the world that is less structured.  It just seems nuts to me that customers industry-wide are spending well over $10B a year on relational database management systems and yet are only able to effectively use these systems to manage a tiny fraction of their data.  As an increasing fraction of the structured data in the world is already stored in relational database management systems, industry growth will come from helping customers manage their less structured data. 


To be fair, most RDBMSs (including SQL Server) do support full text indexing but what I’m after is deep support for full text where the index is a standard access method rather than a separate indexing engine on the side and, more importantly, full statistics are tracked on the full text corpus, allowing the optimizer to make high quality decisions on join orders and techniques that include full text indices.
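As a rough illustration of what tracking statistics on a full text corpus buys an optimizer, here is a minimal sketch assuming a toy in-memory inverted index. It is not how SQL Server or any other engine implements full text search. The class and method names are hypothetical, and the "optimizer" is reduced to one decision: intersect posting lists starting with the most selective term.

```python
from collections import defaultdict


class FullTextIndex:
    """Toy inverted index that keeps per-term document frequencies."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids
        self.doc_count = 0

    def add(self, doc_id, text):
        """Index one document. Assumes each doc_id is added only once."""
        self.doc_count += 1
        for term in set(text.lower().split()):
            self.postings[term].add(doc_id)

    def selectivity(self, term):
        """Fraction of documents containing the term (the statistic)."""
        if self.doc_count == 0:
            return 0.0
        return len(self.postings.get(term.lower(), ())) / self.doc_count

    def search_all(self, terms):
        """Return ids of documents containing every term.

        The statistics drive the plan: the rarest term is processed first
        so that intermediate results stay as small as possible.
        """
        ordered = sorted((t.lower() for t in terms), key=self.selectivity)
        if not ordered:
            return set()
        result = set(self.postings.get(ordered[0], set()))
        for term in ordered[1:]:
            result &= self.postings.get(term, set())
            if not result:
                break  # nothing can match; stop early
        return result
```

With the same selectivity numbers available to a relational optimizer, a full text predicate could be costed and ordered alongside ordinary index lookups rather than handed to a separate engine.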


If you haven’t read Mike’s original Chubby paper, it’s worth doing.  Chubby is an interesting combination of name server, lease manager, and mini-distributed file system.  It’s not the combination of functionality that I would have thought to bring together in a single system but it’s heavily used and well regarded at Google.  Unquestionably a success.







Monday, May 05, 2008 4:32:43 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback
 Thursday, May 01, 2008

The years of Moore’s law growth without regard to power consumption are now over. On the data center side, power isn’t close to the largest cost of running a large service but it is one of the largest controllable costs and it has been in the press frequently of late.  On the client side, battery power is the limiting factor. 


It is worth understanding what devices consume the most power since most laptops provide some form of user control.   Most systems allow LCD backlight dimming, the CPU power consumption can be lowered (a combination of factors including reducing clock speed and voltage), wireless radios can be switched off, and disk activity can be curtailed or eliminated.  Where does the power go? 


The data below was measured by Mahesri and Vardhan with a ThinkPad R40 as the system under test:

[Table: measured power consumption by component, including LCD Backlight, Wireless (802.11), and HDD (40GB@4,200RPM)]


The dominant consumer by a significant factor is the CPU.   This power consumption is, of course, very load dependent particularly in multi-core systems where the spread between minimum and maximum power dissipation is even higher. The second largest consumer is the LCD backlight, which isn’t surprising.  Two LCD-related findings that I did find surprising: 1) the LCD without backlight is a very light consumer of power, and 2) there is a perceptible difference in power consumption between mostly black and mostly white backgrounds.   The hard disk drive power consumption was notably less than I expected with only 2.8W dissipated during active reading.


I wrote up more detail in: ClientSidePower6_External.doc (130 KB).






Thursday, May 01, 2008 4:49:55 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback
 Tuesday, April 29, 2008

My rough notes from the Web 2.0 Keynote by Yahoo! CTO Ari Balogh:


·         Yahoo! is making three big bets:

1.       Be the starting point for all consumers

2.       Be the must buy for advertisers

3.       Provide an Open Platform

·         Focus of today’s talk is on the latter, the open platform.

·         Yahoo!’s broad set of assets is well known

·         We lead in 7 areas including: Mail, My Front Page and Messenger (the full list was not provided, nor was how Yahoo! was determined to “lead” in these areas)

·         350M unique users/month and 500M users overall

·         20B page views/month

·         250M user minutes per month

·         10B user relationships across properties and this is the real asset

·         Yahoo! has been open since 2003

·         25+ APIs

·         200K App IDs (hints at the large number of developers)

·         #2 API in the world with Flickr

·         1B UI files/served/week

·         Y!OS: (Yahoo! Open Strategy)

·         Announcing today that they are opening all assets at Yahoo! to developers

·         Planning to make all experiences at Yahoo “social”

·         Provide an open developer platform with hooks for third parties to make experiences more social

·         Built into application platform:

·         Security: give users control of their data.  Where they want to share what with who.

·         Application gallery. A common way to <JRH>

·         Unify profiles across all of Yahoo (this will take a while) and give developers access to the social graph and the notification engine. Open up developer access to produce events; the platform includes the ranking engine to show users the most relevant events based upon their context (including social graph).

·         Making Yahoo! more social:

·         Not creating another social network

·         Making all of yahoo “social”

·         “social” isn’t a destination but rather a dimension of a user experience

·         “social” drives relevance, community, and virality

·         Showed some examples:

·         Email client showing messages most relevant on the basis of social network

·         Same basic idea for a “My Yahoo!” page

·         When?

·         Search Monkey is the first step

·         Later this year they will deliver Y!OS and provide more uniform and consistent developer access

·         Making Yahoo! more social will take longer with property by property steps being taken over next few years

·         Summary:

1.       Rewiring Yahoo! from the ground up

2.       Open Yahoo! to developers like never before

3.       Making Yahoo! more social across Yahoo! properties and to third party developers


The 12 min presentation is at: Ari Balogh Web 2.0 Expo Keynote.





Tuesday, April 29, 2008 3:53:37 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback
 Friday, April 25, 2008

Flash SSDs in laptops have generated considerable excitement over the last year and are in use at both extremes of the laptop market.  At the very low end, where only very small storage amounts can be funded, NAND Flash is below the disk price floor.  Mechanical disks with all their complexity are very difficult to manufacture for less than $30 each.  What this means is that for very small storage quantities, NAND Flash storage can actually be cheaper than mechanical disk drives even though the price per GB for Flash is larger. That’s why the One Laptop Per Child project uses NAND flash for persistent storage.  At the high end of the market, NAND flash is considerably more expensive than disk but, for the premium price, offers much higher performance, more resilience to shock and high G handling, and longer battery life.
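The price floor argument is simple arithmetic, sketched below. Only the $30 disk floor comes from the text above. The per-GB prices used in the usage note are hypothetical placeholders chosen to show the shape of the comparison, not market data, and the function names are illustrative.

```python
DISK_PRICE_FLOOR_USD = 30.0  # from the text: hard to build a disk for less


def flash_cost(capacity_gb, flash_usd_per_gb):
    """Flash cost scales with capacity all the way down to tiny sizes."""
    return capacity_gb * flash_usd_per_gb


def disk_cost(capacity_gb, disk_usd_per_gb, floor=DISK_PRICE_FLOOR_USD):
    """Disk cost never drops below the mechanical price floor."""
    return max(floor, capacity_gb * disk_usd_per_gb)


def flash_is_cheaper(capacity_gb, flash_usd_per_gb, disk_usd_per_gb):
    """True when flash undercuts disk at this capacity."""
    return flash_cost(capacity_gb, flash_usd_per_gb) < disk_cost(
        capacity_gb, disk_usd_per_gb
    )
```

With a hypothetical flash price of $5/GB and disk price of $0.50/GB, flash wins at 4GB ($20 against the $30 floor) and loses at 8GB ($40 against $30), even though flash is ten times more expensive per GB in this example.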


Recently there have been many reports of high-end SSD laptop performance problems.  Digging deeper, this is driven by two factors: 1) gen 1 SSDs produce very good read performance but aren’t particularly good on random write workloads, and 2) performance degradation over time.  The first factor can be seen clearly in a performance study using SQLIO.  The poor random write performance issue is very solvable using better Flash wear leveling algorithms, reserving more space (more on this later), and capacitor-backed DRAM staging areas. In fact, STEC ZeusIOPS is producing great performance numbers today, Fusion IO is reporting great numbers, and many others are coming.  The first problem, that of poor random write performance, can be solved and these solutions will migrate down to the commodity drives. 


The second problem, the performance degradation issue, is more interesting.  There have been many reports of laptop dissatisfaction and very high return rates (Returns, technical problems high with flash-based notebooks). Dell has refuted these claims (Dell: Flash notebooks are working fine) but there are lingering anecdotal complaints of degrading performance. I’ve heard it enough myself that I decided to dig deeper.  I chatted off the record with an industry insider on why SSDs appear to degrade over time.  Here’s what I learned (released with their permission):


On a pristine NAND SSD made of quality silicon to ensure write amplification remaining at 1 [jrh: write amplification refers to the additional writes that are caused by a single write due to wear leveling and the Flash erase block sizes being considerably larger than the write page size – the goal is to get this as close to 1 as possible where 1 is no write amplification], given a not-so-primitive controller and reasonable over-provisioning (greater than 25%), a sparsely used volume (less than half full at any time) will not start showing perceptible degraded performance for a long time (perhaps as long as 5 years, the projected warranty period to be given to these SSD products).


If any of the above conditions is changed, the write amplification will quickly degrade ranging from 2 to 5, or even higher.  That contributes to the early start of perceptible degraded write performance.  That is, on a fairly full SSD you’d start having perceptible write performance problems more quickly, and so on.


Inexpensive (cheap?) SSDs made of low-quality silicon will likely have more read errors.  Error correction techniques will still guarantee correct information being returned on reads.  However, each time a read error is detected, the whole “block” of data will have to be relocated elsewhere on the device.  Poorly designed controller firmware will worsen the read delay, due to poorly implemented algorithms and ill-conceived space layout that take longer to search for available space for the relocated data, away from the read error area.


If the read-error-data-relocation happens to collide with the negative conditions that plague the write performance above, you’d start seeing overall degraded performance very quickly.


Chkdsk may have contributed to the forced relocation of the data away from where read errors occurred, hence improving the SSD performance (for a while) until the above collisions happen.  Perhaps the same when Defrag is used.


In short, performance degradation over time is unavoidable with SSD devices.  It’s a matter of how soon it kicks in and how bad it gets; and it varies across designs.


We expect the enterprise class SSD devices to be as much as 100% over-provisioned (e.g., a 64GB SSD actually holds 128GB of flash silicon). 
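The two quantities in the description above, write amplification and over-provisioning, reduce to simple ratios. The sketch below is illustrative only: the function names are hypothetical, and the assumption that sustained write throughput scales as one over the write amplification is a simplification, not something the source states.

```python
def write_amplification(flash_bytes_written, host_bytes_written):
    """Physical bytes written to flash per byte the host asked to write.

    1.0 means no amplification; the text cites degraded values of 2 to 5.
    """
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return flash_bytes_written / host_bytes_written


def over_provisioning(raw_gb, exposed_gb):
    """Spare flash as a fraction of the capacity exposed to the user."""
    if exposed_gb <= 0:
        raise ValueError("exposed_gb must be positive")
    return (raw_gb - exposed_gb) / exposed_gb


def effective_write_rate(raw_mb_per_s, amplification):
    """Simplified model: every amplified write eats raw flash bandwidth."""
    return raw_mb_per_s / amplification
```

For the enterprise example in the text, `over_provisioning(128, 64)` gives 1.0, that is, 100% over-provisioned. Under the simplified throughput model, a device whose write amplification rises from 1 to 5 delivers one fifth of its original sustained write rate, which is consistent with the perceptible slowdown described above.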


Summary: there are two factors in play. The first is that SSD random write performance is not great on low end parts so ensure you understand the random write I/O specification before spending on an SSD. The second one is more insidious in that, in this failure mode, the performance just degrades slowly over time.  The best way to avoid this phenomenon is to 2x over-provision.  If you buy N bytes of SSD, don’t use more than ½N and consider either chkdsk or copying the data off, bulk erasing, and sequentially copying back on. We know over-provisioning is effective. The latter techniques are unproven but seem likely to work. I’ll report supporting performance studies or vendor reports when either surfaces.







Friday, April 25, 2008 4:15:29 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback
 Wednesday, April 23, 2008

It’s not often I come across three interesting notes in the same day but here’s another. Earlier today the Jim Gray Systems Lab was announced and it will be led by long time database pioneer David DeWitt.  This is great to see for a large variety of reasons. First of all it’s wonderful to see the contribution of Jim Gray to the entire industry recognized in the naming of this new lab.  Very appropriate.  Second, I’m really looking forward to working more closely with DeWitt.  This is going to be fun.


This is “earned” in that Madison has been contributing great database developers to the industry for what seems like forever – I’ve probably worked with more Madison graduates over the years than any other single school. It’s good to see a systems focused research lab opened up there. 


It’s also good to see this project come together. I was involved in earlier discussions on this project some years back and, although we didn’t find a way to make it happen then, I really liked the idea.  I’m glad others were successful in doing the hard work to get this project to reality.


·         University of Wisconsin at Madison News:  

·         DeWitt Interview (from above):

·         Server and Tools Business News Blog:

·         Information Week:;jsessionid=2PMY2VDAXNZHSQSNDLOSKHSCJUNN2JVN?articleID=207401497







Wednesday, April 23, 2008 11:07:22 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback

Earlier today, Amazon AWS announced a reduction in egress charges.  The new charges:

·         $0.100 per GB - data transfer in

·         $0.170 per GB - first 10 TB / month data transfer out

·         $0.130 per GB - next 40 TB / month data transfer out

·         $0.110 per GB - next 100 TB / month data transfer out

·         $0.100 per GB - data transfer out / month over 150 TB


Compared with the old:

·         $0.100 per GB - data transfer in

·         $0.180 per GB - first 10 TB / month data transfer out

·         $0.160 per GB - next 40 TB / month data transfer out

·         $0.130 per GB - data transfer out / month over 50 TB
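The effect of the two tier schedules is easiest to see by computing a monthly egress bill under each. The sketch below uses the tier sizes and prices from the lists above. Treating 1 TB as 1,000 GB is an assumption made for round numbers, and the names `NEW_TIERS`, `OLD_TIERS`, and `egress_cost` are illustrative.

```python
GB_PER_TB = 1000  # assumption for round numbers; billing may define TB differently

# (tier size in TB, price in USD per GB); None marks the unbounded last tier
NEW_TIERS = [(10, 0.170), (40, 0.130), (100, 0.110), (None, 0.100)]
OLD_TIERS = [(10, 0.180), (40, 0.160), (None, 0.130)]


def egress_cost(tb_out, tiers):
    """Monthly data-transfer-out cost in USD under a tiered price list."""
    remaining = tb_out
    total = 0.0
    for size_tb, usd_per_gb in tiers:
        # Bill as much of the remaining volume as fits in this tier.
        chunk = remaining if size_tb is None else min(remaining, size_tb)
        total += chunk * GB_PER_TB * usd_per_gb
        remaining -= chunk
        if remaining <= 0:
            break
    return total


def savings_fraction(tb_out):
    """Fraction of the old bill saved under the new prices."""
    old = egress_cost(tb_out, OLD_TIERS)
    new = egress_cost(tb_out, NEW_TIERS)
    return (old - new) / old
```

Under these assumptions a 5 TB/month customer goes from about $900 to about $850, a saving of under 6%, while a 200 TB/month customer goes from about $27,700 to about $22,900, a saving of roughly 17%.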


Most networking contracts charge symmetrically for ingress and egress – you pay the max of the two -- so the ingress cost to Amazon is effectively zero.


Note that it’s a non-linear reduction favoring higher volume users.  TechCrunch reported a couple of days back that the Amazon AWS customer base has rapidly swung from a nearly pure start-up community to more of a mix of startups and very large enterprises, with the enterprise customers now bringing the largest workloads.  Not really all that surprising – I expected this to happen and talked about it in the Next Big Thing. What is surprising to me is the speed with which the transformation is taking place. I was predicting the workload mix shift to happen at AWS 3 to 5 years from now. Things are moving quickly in the services world.







Wednesday, April 23, 2008 7:51:41 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback

Live Mesh has been under development for a couple of years now.  Now it’s here in “technology preview” form. I think the first public mention was probably back in March of last year in a blog entry by Mary Jo Foley that mentioned Windows Live Core. Last night Amit Mital, General Manager of Windows Live Core, did a blog entry that covers Live Mesh in more detail than previously seen.


UPDATE: The report above attributing first mention of Windows Live Core to Mary Jo Foley was incorrect.  The sleuths at LiveSide appear to have reported this one first:


Live Mesh is a platform for synchronizing data across devices and for deploying and managing apps that run on multiple devices. It supports screen remoting, making all your devices and applications available from anywhere, and it strikes an interesting balance, exploiting both cloud-service-supported features and unique device capabilities. The initial device support is Windows only but Mac and other device clients are coming as well.


Screen shots are up on CrunchBase:


Ray Ozzie did a 36 min Channel 9 interview with Jon Udell:


Abolade Gbadegesin, Live Mesh Architect, did a video on Live Mesh Architecture that is worth checking out:


Demo video:







Wednesday, April 23, 2008 7:15:15 AM (Pacific Standard Time, UTC-08:00)  #    Comments [3] - Trackback
 Tuesday, April 22, 2008

Here’s a statistic I love: Facebook is running 1,800 MySQL Servers with only 2 DBAs. Impressive. I love seeing services show how far you can go towards admin-free operation. 2:1,800 is respectable and for database servers it’s downright impressive. This data is from a short but interesting report at:


The Facebook fleet has grown fairly dramatically of late.   For example, Facebook is the largest Memcached installation and the most recent reports I had come across had 200 Memcached servers at Facebook.  At the Scaling MySQL panel, they reported 805 Memcached servers.


1,800 MySQL Servers, insulated by 805 Memcached servers, and driven by 10,000 web servers. Smells like success.
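
To put those fleet sizes side by side, here is a small sketch that only divides the numbers quoted above. The variable names are made up for illustration, and the ratios say nothing about how load is actually distributed across the tiers.

```python
# Fleet sizes as quoted in this post.
dbas = 2
mysql_servers = 1800
memcached_servers = 805
web_servers = 10000

# Database servers looked after by each DBA.
servers_per_dba = mysql_servers / dbas

# Web servers in front of each MySQL server.
web_per_mysql = web_servers / mysql_servers

# MySQL servers behind each Memcached server.
mysql_per_memcached = mysql_servers / memcached_servers

print(servers_per_dba, round(web_per_mysql, 2), round(mysql_per_memcached, 2))
```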




Thanks to Dare Obasanjo for pointing me to this one.





Tuesday, April 22, 2008 7:36:00 AM (Pacific Standard Time, UTC-08:00)
 Monday, April 21, 2008

Back in March I speculated that Google was soon to announce a third-party service platform. Well, on the evening of April 7th, Google App Engine was announced.  It’s been heavily covered over the last couple of weeks and I’ve been waiting to get a beta account so I can write some code against it. I don’t yet have an account, but Sriram Krishnan has been playing with it and sent me the following excellent review.


·         Guest book development video: Developing and deploying an application on Google App Engine (9:29)

·         Techcrunch: Google Jumps Head First Into Web Services With Google App Engine.

·         Google App Engine Limitations: evan_tech.

·         What’s coming up: We're up and Running!

·         High Scalability: Google App Engine – A second Look


Sriram’s review of App Engine:



-       It’s well designed from end to end, builds on a good ecosystem of tools, and covers most scenarios for a typical Web 2.0 app. If I were ever to get into the Facebook-app writing business, AppEngine would be my first choice. However, any startup that requires code to execute outside the web request-reply cycle is out of luck and would need to use EC2.

-       The mailing list is overflowing, so there is obviously huge community interest and lots of real coders building stuff.

-       The datastore is a bit wonky for my taste. It fits neither the SQL/RDBMS model nor the clean spreadsheet model of Amazon SimpleDB – it’s an ORM with some querying thrown in, and that leads to some abstraction leakage. The limitations on queries are going to take a bit of getting used to since they’re not intuitive at all (they only support queries where they can scan the index sequentially for results, and the choice of datatype is not straightforward). The datastore was the area where I found myself consulting the docs most frequently.

-       Python-only is probably a big con at the moment. I’m a big Python fan but it’s pretty apparent that a lot of people want PHP and Ruby.  However, when you poke around the framework, it is clear that it was built to be language agnostic and that the creators had support for other languages in mind from the beginning.

-       The lack of SSL support and of unique IPs per app instance are other problems. The latter really kicks in when you’re calling other Web 2.0 APIs. A lot of them do quota calculations based on IP address, and this won’t work when you’re sharing your IP with a bunch of other apps. Lack of SSL support is not a blocker for authentication (since you can use Google’s inbuilt authentication system) but will block any serious app.

-       The beta limits are too conservative and they are too aggressive in enforcing them - they kept nuking my benchmarking apps for relatively short bursts of activity (more on that later). This really makes me hesitate to put anything non-trivial on AppEngine. If I were them, I would loosen up these limits or get customers to pay a bit extra for more CPU/network slices.


The Web Framework

-       I’m familiar with Python and Django so I’m probably not the best person to judge the learning curve. The framework is very clean and usable (I like it much better than ASP.NET) and I found myself being reasonably productive within a few minutes.

-       They have also put hooks in so that you can use almost any Python framework of your choice with a bit of work – you’re not stuck with the one provided. On the mailing list, there’s a lot of activity around porting other frameworks (pylons, web.py, cherrypy, etc.) to AppEngine. If it were up to me, I would be using Aaron Swartz’s web.py, but that is more a stylistic personal preference.

-       Python was not originally designed to be sandboxed so Google had to make some major cuts to make it ‘safe’ – they don’t allow opening sockets for example. This has caused a lot of open source Python code to stop working – essential libraries like urllib (the equivalent of .NET’s HttpWebRequest) need some porting work.

-       The tools support is a bit sparse – debugging is mostly through printf/exception stack traces. However, what it lacks in tooling is made up for in the speed of its edit cycle – just edit a .py file and then refresh the page.

-       Some people are going to have trouble getting used to the lack of sessions but I think the pain will be temporary (some people have started working on using the datastore as a Django session store to hold session state). From my limited testing, I didn’t see much machine affinity – Google seems happy to spin up processes on different machines and kill them the moment they finish serving the request.
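
To make the "no sessions, no machine affinity" point concrete, here is a minimal sketch in plain Python of a handler that keeps all per-user state in a shared store keyed by a cookie. It is not App Engine or Django code: `SHARED_STORE`, `handle_request`, and the `sid` cookie name are hypothetical stand-ins, and a real deployment would use a persistent datastore rather than an in-memory dict.

```python
import uuid

# Stand-in for a datastore reachable from every machine (assumption:
# in a real system this would be a persistent service, not a dict).
SHARED_STORE = {}


def handle_request(cookies):
    """Serve one request without relying on in-process state.

    Loads session state from the shared store using the session id in
    the cookie, bumps a visit counter, and writes the state back.
    Returns (response_body, cookies_to_set).
    """
    session_id = cookies.get("sid") or uuid.uuid4().hex
    session = SHARED_STORE.get(session_id, {"visits": 0})
    session["visits"] += 1
    SHARED_STORE[session_id] = session
    return "visit %d" % session["visits"], {"sid": session_id}


# Each call could run in a different short-lived process; only the
# cookie and the shared store carry state between them.
body1, cookies = handle_request({})
body2, cookies = handle_request(cookies)
print(body1, body2)
```

Because the handler reads and writes everything through the store, it does not matter which process serves the second request, which is the property the review describes.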


The Datastore

-       You specify your data models in Python and there’s some ORM magic that takes place behind the scenes. They have a few inbuilt data types and you can use expando (dynamic) properties to assign properties at runtime which haven’t been defined in your model. Data schema versioning is a big question mark at the moment – if I were Google, I would look into supporting something like RoR’s migrations.

-       Querying is done through a SQL-subset called GQL on specifically defined indexes. For a query to succeed, the query must be supported by an index and the scan needs to find sequential results and this puts some restrictions on the kinds of queries you can execute (you can’t have inequality operators on more than one attribute, for example). Several indexes are auto-generated and you can request others to be created.

-       They appear to auto-generate several indexes.

-       Entities can be grouped together through ReferenceProperties into groups. Each group is stored together. Queries within one group can be bunched together into a transaction (everything is optimistic concurrency by default). Bunching together lots of entities into one group is bad since Google seems to do some sort of locking on the entity group – the docs say some updates might fail.

-       No join support. Like SimpleDB, they suggest de-normalization.

-       The datastore tools are sparse at the moment. I had to write code to delete stale data from my datastore since the website would only show me 20 items at a time.

-       All the APIs (the datastore, user auth, mail) are offered through Google’s internal RPC mechanism. Google calls the individual RPC messages protocol buffers, and all the AppEngine APIs are implemented using stub generators (this is what you get with the local SDK as well).
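
The "scan the index sequentially" restriction is easier to see with a toy model. The sketch below is plain Python, not GQL and not how Google implements the datastore: it models a composite index as a sorted list of tuples and answers an equality filter plus a single inequality with two binary searches and one contiguous slice. The rows, field names, and the `scan` helper are all invented for illustration.

```python
import bisect

# Hypothetical entities: (author, year, key). A composite index is
# modeled as a list of tuples kept in sorted order.
rows = [
    ("alice", 2006, "k1"),
    ("alice", 2007, "k2"),
    ("alice", 2008, "k3"),
    ("bob", 2007, "k4"),
    ("bob", 2008, "k5"),
]
index = sorted(rows)


def scan(index, author, min_year):
    """Return keys where author matches AND year >= min_year.

    Both bounds are found by binary search, so the matching rows form
    one contiguous slice of the index: a single sequential scan.
    """
    lo = bisect.bisect_left(index, (author, min_year))
    hi = bisect.bisect_left(index, (author, float("inf")))
    return [key for _, _, key in index[lo:hi]]


print(scan(index, "alice", 2007))
```

A second inequality on a different property would match rows scattered across this ordering rather than one contiguous run, which is consistent with the one-inequality limit described in the review.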



Performance


This section is woefully short: it is very hard to run benchmarks since Google will keep killing apps with high activity. Here’s what I got:


-       Gets/puts/deletes are all really fast. I benchmarked a tight loop running a fixed number of iterations (which I kept tuning to avoid hitting the Google limits), each query operating on or retrieving a single object. Each averaged 0.001 s (next to nothing – almost noise).

-       Turning up the number of results to retrieve meant a linear increase in query time. I inserted multiple entities with just a single byte in each to have the least possible serialization/de-serialization overhead.  For 50 results, the query execution time was around 0.15 s, for 100, around 0.30 s, and so on. I saw a linear increase all the way until I hit Google’s limits on CPU usage.

-       I can’t measure this correctly but a ballpark guesstimate is that Google nukes your app if you use up close to 100% CPU (by running in a tight loop like I did) for over 2 seconds for any given request. For every app, they tell you the number of CPU cycles used (a typical benchmark app cost me around 50 megacycles) and I think they do some quota calculations based on megacycles used per second.
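
The two timings quoted above are enough for a back-of-the-envelope linear model. The sketch below simply fits a straight line through those two points. It assumes the trend stays linear, which the review only reports up to the point where the CPU limits kicked in, and `estimate_seconds` is a hypothetical helper, not part of any SDK.

```python
# Timings reported above: (results returned, seconds).
samples = [(50, 0.15), (100, 0.30)]

(n1, t1), (n2, t2) = samples
per_result = (t2 - t1) / (n2 - n1)   # seconds per additional result
fixed_cost = t1 - per_result * n1    # time left over at zero results


def estimate_seconds(n_results):
    """Linear extrapolation from the two reported samples."""
    return fixed_cost + per_result * n_results


print(per_result, fixed_cost, estimate_seconds(200))
```

With these two samples the per-result cost works out to about 3 ms with essentially no fixed overhead, so the estimate is only as good as the assumption that nothing else changes as result sets grow.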


Overall, perf seems excellent but I would worry about hitting quota limits due to a Digg/Slashdot effect. I plan on trying out some more complex queries and I’ll let you know if I see something weird.


The Tools

-       The dashboard is excellent. It gives you nice views of error logs, what’s in the datastore, and usage patterns for all your important counters (requests, CPU, bandwidth, etc.).

-       Good end-to-end flow for the common tasks – registering a domain and assigning it to your application, managing multiple versions of your app, looking at logs, etc.




Monday, April 21, 2008 4:59:22 AM (Pacific Standard Time, UTC-08:00)
 Friday, April 18, 2008

In the Rules of Thumb post, I argued that many of the standard engineering rules of thumb are changing. On a closely related point, Nishant Dani and Vlad Sadovsky both pointed me towards The Landscape of Parallel Computing Research: A View from Berkeley by David Patterson et al. Dave Patterson is best known for foundational work on RISC and for co-inventing RAID.  He has an amazing ability to spot a problem that is worth solving and whose solution is near, and then come up with practical solutions.  This paper has many co-authors but shows some of that same style.  It focuses on parallel systems and on some of the conventional wisdom that has driven systems designs for some time but is no longer correct.  The Berkeley web site with more detail is at:


In the paper they argue that 13 computational kernels can be used to characterize most workloads.  They go on to observe that over ½ of these kernels are memory bound today and that more are expected to be in the future.  In effect, the problem is getting data up the storage and memory hierarchy to the processors, not the speed of the processors themselves. This has been true for years and the problem worsens each year, yet it still seems to get less focus than scaling processor speeds even though the latter won’t help without the former.


If you are interested in parallel systems, it’s worth reading the paper.  I’ve included the key changes in conventional wisdom below:


1. Old CW: Power is free, but transistors are expensive.

· New CW is the “Power wall”: Power is expensive, but transistors are “free”. That is, we can put more transistors on a chip than we have the power to turn on.

2. Old CW: If you worry about power, the only concern is dynamic power.

· New CW: For desktops and servers, static power due to leakage can be 40% of total power. (See Section 4.1.)

3. Old CW: Monolithic uniprocessors in silicon are reliable internally, with errors occurring only at the pins.

· New CW: As chips drop below 65 nm feature sizes, they will have high soft and hard error rates. [Borkar 2005] [Mukherjee et al 2005]

4. Old CW: By building upon prior successes, we can continue to raise the level of abstraction and hence the size of hardware designs.

· New CW: Wire delay, noise, cross coupling (capacitive and inductive), manufacturing variability, reliability (see above), clock jitter, design validation, and so on conspire to stretch the development time and cost of large designs at 65 nm or smaller feature sizes. (See Section 4.1.)

5. Old CW: Researchers demonstrate new architecture ideas by building chips.

· New CW: The cost of masks at 65 nm feature size, the cost of Electronic Computer Aided Design software to design such chips, and the cost of design for GHz clock rates means researchers can no longer build believable prototypes. Thus, an alternative approach to evaluating architectures must be developed. (See Section 7.3.)

6. Old CW: Performance improvements yield both lower latency and higher bandwidth.

· New CW: Across many technologies, bandwidth improves by at least the square of the improvement in latency. [Patterson 2004]

7. Old CW: Multiply is slow, but load and store is fast.

· New CW is the “Memory wall” [Wulf and McKee 1995]: Load and store is slow, but multiply is fast. Modern microprocessors can take 200 clocks to access Dynamic Random Access Memory (DRAM), but even floating-point multiplies may take only four clock cycles.

8. Old CW: We can reveal more instruction-level parallelism (ILP) via compilers and architecture innovation. Examples from the past include branch prediction, out-of-order execution, speculation, and Very Long Instruction Word systems.

· New CW is the “ILP wall”: There are diminishing returns on finding more ILP. [Hennessy and Patterson 2007]

9. Old CW: Uniprocessor performance doubles every 18 months.

· New CW is Power Wall + Memory Wall + ILP Wall = Brick Wall. Figure 2 plots processor performance for almost 30 years. In 2006, performance is a factor of three below the traditional doubling every 18 months that we enjoyed between 1986 and 2002. The doubling of uniprocessor performance may now take 5 years.

10. Old CW: Don’t bother parallelizing your application, as you can just wait a little while and run it on a much faster sequential computer.

· New CW: It will be a very long wait for a faster sequential computer (see above).

11. Old CW: Increasing clock frequency is the primary method of improving processor performance.

· New CW: Increasing parallelism is the primary method of improving processor performance. (See Section 4.1.)

12. Old CW: Less than linear scaling for a multiprocessor application is failure.

· New CW: Given the switch to parallel computing, any speedup via parallelism is a success.
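
A couple of the numbers in items 7 and 9 can be sanity-checked with simple arithmetic. The sketch below uses only figures quoted in the list. It assumes, as a simplification, that growth switched abruptly in 2002 from doubling every 18 months to doubling every 5 years, which is not something the paper states in that form, and the helper name `growth` is invented here.

```python
# Item 9: doubling periods quoted in the list, in years.
old_doubling = 1.5   # "doubles every 18 months"
new_doubling = 5.0   # "may now take 5 years"


def growth(years, doubling_period):
    """Performance multiple after `years` at a fixed doubling period."""
    return 2 ** (years / doubling_period)


# Shortfall accumulated between 2002 and 2006 if growth slowed from the
# old rate to the new one in 2002 (a simplifying assumption).
years = 2006 - 2002
shortfall = growth(years, old_doubling) / growth(years, new_doubling)

# Item 7: floating-point multiplies that fit in one DRAM access.
dram_clocks = 200
multiply_clocks = 4
multiplies_per_dram_access = dram_clocks // multiply_clocks

print(round(shortfall, 2), multiplies_per_dram_access)
```

Under that assumption the shortfall lands between 3 and 4, the same ballpark as the "factor of three" quoted in item 9, and a single DRAM access costs as much as dozens of multiplies.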






Friday, April 18, 2008 4:42:25 AM (Pacific Standard Time, UTC-08:00)

Disclaimer: The opinions expressed here are my own and do not necessarily represent those of current or past employers.

All Content © 2015, James Hamilton