Sunday, June 08, 2008

I’m interested in high-scale web sites, their architecture and their scaling problems.  Last Thursday, Oren Hurvitz posted a great blog entry summarizing two presentations at Java One on the LinkedIn service architecture.

 

LinkedIn scale is respectable:

·         22M members

·         130M connections

·         2M email messages per day

·         250k invitations per day

 

No big surprises other than LinkedIn are still using a big, expensive commercial RDBMS and a large central Sun SPARC Server. LinkedIn call this central server “The Cloud” and its described as the “backend server caching the entire LinkedIn Network”.  Note that “server” is not plural – it appears that the cloud is not partitioned  but it is replicated 40 times. However, each instance of the cloud hosts the entire social graph in 12GB of memory. 

 

Social graphs are one of my favorite services problems in that they are notoriously difficult to effectively partition. I often refer to this as the “hairball problem”.  There is no clean data partition that will support the workload with adequate performance. Typical approaches to this problem redundantly store a several copies of the data with different lookup keys and potentially different partitions. For example, most social networks have some notion of a user group.  The membership list is stored with each group.  

 

But, a common request is to find all groups for which a specific user is a member. Storing users with the group allows efficient group enumeration but doesn’t support the “find all groups of a given user” query.  Storing the group membership with each user supports this query well but doesn’t allow efficient group membership enumeration. The most common solutions are to store the data redundantly both ways. Typically one is the primary copy and the other is updated asynchronously after the primary is updated. This effectively makes read much more efficient at the cost of more work at update time.  Sometimes, the secondary copy isn’t persisted in the backend database and is only stored in an in memory cache often implemented using memcached.

 

The LinkedIn approach of using a central in-memory, social graph avoids some of the redundancy of the partitioned model I described above at the expensive of requiring a single-server memory large enough to store the entire un-partitioned social graph.  Clearly this is more efficient by many measures but requiring that the entire social graph fit into a single servers memory means that more expensive servers are required as the service grows. And, many of us can’t sleep at night with hard scaling limits even if they are apparently fairly large.

 

Other scaling web sites pointers are posted at: http://perspectives.mvdirona.com/2007/11/12/ScalingWebSites.aspx.

 

Thanks to Dare Obasanjo and Kevin Merritt for sending my way.

 

                                                                --jrh

 

James Hamilton, Windows Live Platform Services
Bldg RedW-D/2072, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
JamesRH@microsoft.com

H:mvdirona.com | W:research.microsoft.com/~jamesrh  | blog:http://perspectives.mvdirona.com

 

 

Sunday, June 08, 2008 11:14:02 AM (Pacific Standard Time, UTC-08:00)  #    Comments [1] - Trackback
Services
 Friday, June 06, 2008

There was an interesting talk earlier today at Microsoft Research by Jason Cong of the UCLA Computer Science Department on compiling design specifications in C/C++/SystemC and user constraints into ASIC and FPGA design. The advantage of compiler based approaches include, more productivity working at a higher level, automating verification, allows optimization, and allows rapid experimentation at different frequencies and different optimization goals (performance vs power, for example). As design complexity increases higher level language and optimization have excellent potential.  Essentially the same thing is happening in hardware design as happened 30 years ago in operating systems implementation languages. High level implementation languages replace lower level as complexity climbs.  For example, the Intel Core 2 Duo processor is a 1B transistor implementation whereas the 386 was 1/1000th of that complexity at 1M.

 

Also super interesting was the example from a financial institution that is taking a software based stock analysis system where they take the hottest parts of the system and compile these to FPGA implementations. 30x faster at 1/10th the power. Very cool.  

 

Now that AMD supports hyper transport it is possible to implement custom processors with excellent overall system performance.  Intel has opened up the FSB and is also expected to offer a non-compatible hyper transport-like implementation in the future.

 

My rough notes from the talk follow:

 

·         Speaker: Jason Cong, UCLA Computer Science (cong@cs.ucla.edu)

·         Working on on-chip interconnects & communications

o   3D IC design

o   RF-interconnects

§  Note that power restrictions restrict processors to ~5GH

·         But, communications lines can scale to 100s of GH

·         Dividing communications link into 10 or more “channels” that operate at different frequencies

·         This talk focused on  ESL SystemC to FPGA compiler

·         Why?

o   700,000 lines of RTL for a 10M gate design is too much

o   Allows executable specification

o   Verification requires executable design

o   Accelerated computing or reconfigurable computing also need C/C++ based compilation/synthesis to FPGAs

§  CPUs coupled with FPGA to support common functions at high performance and lower power

·         Note that performance limited by communications (getting data to the CPU)

o   Long wires that have to be traversed in a single clock are the limiting factor

o   This research focuses on supporting multi-cycle communications

·         xPilot: Behavior-to-RTL (Register Transfer Level design) synthesis flow

o   takes behavior spec in C/SystemC to front-end compiler to SSDM

o   SSDM is optimized using standard compiler optimization (loop unrolling, strength reduction, scheduling, etc.)

o   SSDM is compiled to:

§  Verilog/VHDL/SystemC

§  FPGAs: Altera, Xilinx

§  ASICs: Magma, Synopsys

o   UPS: Uniform Power Specification

o   During final compilation optimize for power and shut off compute units that are not being used and shut off those that are being during idle periods (a busy disk controller is frequently waiting for mechanicals and not need to execute instructions)

§  Can’t shut FPGAs but can with ASICs

§  The only solution to dynamic power leakage only solution is shut the component of

o   Allows faster experimentation than hand coding.  You can try different frequencies and different power optimizations (too complex for most humans)

o   Scheduling (allocation of operations to compute logic and specific clock cycles) is NP-complete and automated techniques can exceed quality of expert designs

·         Example:

o   Schedule the behavior to RTL using the following characterization, cycle time, constraints, and objectives:

§  Platform characterization: adder (2ns) & multiplier (5ns)

§  Target cycle time (10ns)

§  Resource constraint: only one multiplier is available

§  Objective: high performance or lower power as examples

·         Note as optimizing and reducing component counts, less space is required, which can allow faster clocking

·         Investigating compilation for Reconfigurable Accelerated Computing

o   Take GCC 3DES implementation and synthesis FPGS RTL description

o   Example took 3DES from C level implementation to a FPGA (Xilinx Virtex-5)

·         Investment bank is using this tool to compile financial optimizations from S/W implementation to FPGA accelerators (Black-Scholes S/w kernel)

o   30x speed-up over software implementation and 1/10 the power (6 vs 68W)

 

A related presentation: http://cadlab.cs.ucla.edu/~cong/slides/fpt05_xpilot_final.pdf.

 

James Hamilton, Windows Live Platform Services
Bldg RedW-D/2072, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
JamesRH@microsoft.com

H:mvdirona.com | W:research.microsoft.com/~jamesrh  | blog:http://perspectives.mvdirona.com

 

 

Friday, June 06, 2008 10:52:35 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback
Hardware
 Tuesday, June 03, 2008

 

Last week at Google IO, pricing was announced for Google Application Engine. Actually it was blogged the night before at: http://googleappengine.blogspot.com/2008/05/announcing-open-signups-expected.html.

 

The prices are close to identical with Amazon AWS although GAE differs substantially from the AWS offerings.  The former offers a easy to use Python  execution environment whereas Amazon offers the infinitely flexible run-this-virtual-machine model. Clearly the Amazon model costs more to provide so, by that measure, AWS pricing is somewhat better:

 

Google Application Engine Pricing:

·    $0.10 - $0.12 per CPU core-hour

·    $0.15 - $0.18 per GB-month of storage

·    $0.11 - $0.13 per GB outgoing bandwidth

·    $0.09 - $0.11 per GB incoming bandwidth

·    From: http://googleappengine.blogspot.com/2008/05/announcing-open-signups-expected.html

Compared with AWS Pricing:

·         $0.10 - $0.80 per VM hour (depending upon resources allocated)

·         $0.15 per GB-month of storage

·         $0.100 - $0.170 per GB outgoing bandwidth

·         $0.100 per GB incoming bandwidth

·         From: http://www.amazon.com/S3-AWS-home-page-Money/b/ref=sc_fe_l_2?ie=UTF8&node=16427261&no=3435361&me=A36L942TSJ2AJA and http://www.amazon.com/EC2-AWS-Service-Pricing/b/ref=sc_fe_l_2?ie=UTF8&node=201590011&no=3435361&me=A36L942TSJ2AJA.

There are some important differences that make the pricing comparison somewhat biased in a couple of ways. Two important differences: 1) as mentioned above, Amazon gives an entire virtual machine so EC2 is much more flexible than GAE both in that it can run arbitrary applications in arbitrary languages and that it supports all execution models whereas GAE only supports HTTP request/response.  Another key difference is the storage subsystem.  In the numbers above, we’re comparing the Amazon blob store (S3) with the more structured storage model offered by GAE.  The more comparable AWS SimpleDB pricing is considerably higher than the GAE storage pricing. SimpleDB charges $1.50 GB/month in addition to machine usage and network transmission costs.  GAE is offering much more affordable semi-structured storage and the GAE storage model actually supports data types rather than having to force everything to character format.

 

GAE is still free to start with under 5M page views/month and up to ½ GB storage for free.  Obviously this helps developers get started without strings and that’s a good thing. But, more importantly, it avoids Google from having to go to the expense of billing very small values.  In a weird sort of way, I’m more impressed with AWS billing $0.04 on some accounts in that it shows there billing system is incredibly lean. Scaling down billing is hard, hard, hard.

 

In addition to announcing prices, GAE went from a controlled admission beta to a fully open beta where all comers are welcome.  I’m impressed how quickly they have gone from making the service initially available to a full open beta.  Impressive.

 

Also announced last week was a new GAE Memcached API which appears to have been a 20% project of Brad Fitzpatrick (Sriram Krishnan sent my way). And a set of image manipulation APIs supporting image scaling, rotating, etc. will now be part of the GAE API set.

 

My notes from Google IO:

·         http://perspectives.mvdirona.com/2008/05/29/RoughNotesFromSelectedSessionsAtGoogleIODay1.aspx

·         http://perspectives.mvdirona.com/2008/05/29/IO2008RoughNotesFromMarissaMayerDay2KeynoteAtGoogleIO.aspx

·         http://perspectives.mvdirona.com/2008/05/30/IO2008RoughNotesFromSelectedSessionsAtGoogleIODay2.aspx

 

James Hamilton, Windows Live Platform Services
Bldg RedW-D/2072, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
JamesRH@microsoft.com

H:mvdirona.com | W:research.microsoft.com/~jamesrh  | blog:http://perspectives.mvdirona.com

Tuesday, June 03, 2008 8:05:05 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback
Services
 Sunday, June 01, 2008

Yesterday the Tribute to Honor Jim Gray was held at the University of California at Berkeley. We all miss Jim deeply so it really is a tough topic.  But it was great to get together with literally 100s of Jim’s friends and share stories and talk about some of his accomplishments, his contributions to the field, and his contributions to each of us.  Jim is amazing across all three dimensions but what is most remarkable is the profound way he helped others achieve more throughout the industry.  We’re all better engineers, researchers, and human beings for having been lucky enough to have known and worked with Jim.

 

Also announced yesterday was the creation of the Jim Gray Chair at Cal Berkeley.  Bill Gates Eric Schmidt, Marc Benioff, and Mike Stonebraker each donated $250,000 which were matched by a $1,000,000 from the Hewlett Foundation.

 

Seattle PI coverage: Gathering in Berkeley, Calif., today to honor legendary scientist, Microsoft researcher Jim Gray.

 

The morning, general session agenda:

             Welcome - Shankar Sastry

             Opening Remarks - Joseph Hellerstein

             A Tribute, Not a Memorial: Understanding Ambiguous Loss - Pauline Boss

             The Amateur Search - Michael Olson

             Jim Gray at Berkeley - Michael Harrison

             Knowledge and Wisdom - Pat Helland

             Why Did Jim Gray Win the Turing Award? - Michael Stonebraker

             Jim Gray Chair - Stuart Russell

             500 Special Relationships: Jim as a Mentor to Faculty and Students - Ed Lazowska

             Jim Gray: His Contributions to Industry - David Vaskevitch

             A "Gap Bridger" - Richard Rashid

             Thanks to the U.S. Coast Guard - Paula Hawthorn

 

The afternoon, technical session agenda:

·         Welcome - Shankar Sastry

·         Opening Remarks - Joseph Hellerstein

·         A Tribute, Not a Memorial: Understanding Ambiguous Loss - Pauline Boss

·         The Amateur Search - Michael Olson

·         Jim Gray at Berkeley - Michael Harrison

·         Knowledge and Wisdom - Pat Helland

·         Why Did Jim Gray Win the Turing Award? - Michael Stonebraker

·         Jim Gray Chair - Stuart Russell

·         500 Special Relationships: Jim as a Mentor to Faculty and Students - Ed Lazowska

·         Jim Gray: His Contributions to Industry - David Vaskevitch 

·         A "Gap Bridger" - Richard Rashid

·         Thanks to the U.S. Coast Guard - Paula Hawthorn

 

The event was video recorded and streamed via http://webcast.berkeley.edu/.

 

Update: the video will be at:  Tribute to Honor Jim Gray - General Session (thanks to George Spix for sending my way).

Second Update: A good article by John Markoff of the NY Times: A Tribute to Jim Gray: Sometimes Nice Guys Do Finish First.

 

                                    --jrh

 

James Hamilton, Windows Live Platform Services
Bldg RedW-D/2072, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
JamesRH@microsoft.com

H:mvdirona.com | W:research.microsoft.com/~jamesrh  | blog:http://perspectives.mvdirona.com

 

 

Sunday, June 01, 2008 10:08:01 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback
Ramblings
 Thursday, May 29, 2008

Google IO notes continued from earlier in the day: http://perspectives.mvdirona.com/2008/05/29/IO2008RoughNotesFromMarissaMayerDay2KeynoteAtGoogleIO.aspx and yesterday: http://perspectives.mvdirona.com/2008/05/29/RoughNotesFromSelectedSessionsAtGoogleIODay1.aspx

 

Google Web Toolkit and Client-Server Communications

·         Speaker: Miquel Mendez

·         GWT client/server communication options:

o   Frames

o   Form Panel

o   XHR: RequestBuilder (be careful don’t to start too many—many browsers have limits)

o   XML RPC

·         XML Encoding/Decoding: com.google.gwt.xml defines XML related classes

·         JSON Encoding/Decoding: com.google.gwt.json.JSON defines JSON related classes

·         GWT RPC: Generator that generates code and makes use of RequestBuilder

 

Reusing Google APIs with Google Web Toolkit

·         Speaker: Miquel Mendez

·         GALGWT: Google API Library for GWT.  It’s an open source project lead by Miquel (Javascript bindings to GWT).

o   It’s a collection of easy to use GWT bindings for existing Google JavaScript APIs

o   It’s a Google code open source project

·         Reminder: GWT is a java to Javascript compiler.

·         GWT now has a gadget class.  Google Gadget creation using GWT by extending the Gadget class and implementing the NeedsXXX intrerfaces.

·         Gears support:

o   Exposes database, LocalServer, and WorkerPool JS modules

o   Provides an offline module that automates the process fo going offline (creates the necessary manifests automatically)

·         Google Maps support as well

 

Engaging User Experiences with Google App Engine

·         Speakers: John Skidgel (designer) & & Lindsay Simon (developer).

·         Showed a guest book application written using Djanjo Form.  It’s been modified to run under App Engine (didn’t say how).

·         App engine development environment makes it easy to work with a designer as it’s easy to install and runs well on a Mac.

·         Walked through what they called 3D (Design, Develop, & Deploy) and how they handle it.

·         Authentication options:

·         Do your own

·         Use GAE (any authenticated user or you can  narrow the population to your domain only – all supported out of the box).

·         Don’t make auth a gating factor or you will lose users – auth at the last possible moment

·         Use the App Engine Datastore for sessions

·         Decreasing Latency:

·         Create build rules to concatenate and minify (Yahoo! Minifier) CSS and JS

·         File fingerprinting

·         Set expires headers for a very long time but add a version ID.  He showed how to handle the version number on the server side.  The recommendation was 10 year expiration with version numbers.

·         Recommends “Progressive Enhancement” or “Defensive Enhancement”.  You should still be able to render without JS. JS should give a better experience but you may not have JS (crappy mobile browsers for example).  Another test, shut off CSS and it should still work.

 

James Hamilton, Windows Live Platform Services
Bldg RedW-D/2072, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
JamesRH@microsoft.com

H:mvdirona.com | W:research.microsoft.com/~jamesrh  | blog:http://perspectives.mvdirona.com

 

 

Thursday, May 29, 2008 5:55:26 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback
Services

Continued from Yesterday (day 1): Rough notes from Selected Sessions at Google IO Day 1.

 

Marissa Mayer Keynote: A Glimpse Under the Hood at Google

·         Showed iGoogle and talked about how Google Gadgets are a great way to get broad distribution and are a form of advertising.

·         Search is number 2 most used application (after email)

·         The ordinary and the everyday

·         Why is search page so simple?

·         Variation of Occam’s Razor: “the simple design is probably right”

·         Sergey did it and it was because “there was nobody else to do it and he doesn’t do HTML”

·         Described process of answering a query (700 to 1,000 machines in .16 seconds):

·         This time of day we’re busy so the query will likely go to one data center and likely get bounced to another (must be a simplification of what really happens – load ballancing)

·         Mixer

·         Google Web Server

·         Ads + Websearch (300 to 400 systems)

·         Back to mixer

·         Back to Web server

·         Back to load balancer

·         Split A/B Testing:

·         We given a subset of users a different user experience. Web services allow very detailed views and to iterate very quickly and evolve rapidly.

·         Example: amount of white space under Google logo on results page?

<