James Hamilton's Blog
 Thursday, December 06, 2007

Michael Hunter, who authors the Testing and Debugging blog at Dr. Dobb’s Journal, asked me for an interview on testing-related topics some time back. I’ve long lamented that, industry-wide, there isn’t nearly enough emphasis on innovation in test and software quality assurance. For large projects, test is often the least scalable part of the development process. So, when Michael offered me a platform to discuss test more broadly, I jumped on it.

 

Michael structures these interviews, and his subsequent blog entry, around five questions. These ranged from where I first got involved in software testing to the most interesting bug I’ve run into, what has most surprised me about testing, the most important thing for a tester to know, and the biggest challenge facing the test discipline over the next five years.

 

Michael’s interview is posted at: http://www.ddj.com/blog/debugblog/archives/2007/12/five_questions_39.html.

 

                                                --jrh

 

James Hamilton, Windows Live Platform Services
Bldg RedW-D/2072, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
JamesRH@microsoft.com

H:mvdirona.com | W:research.microsoft.com/~jamesrh

Thursday, December 06, 2007 10:50:14 PM (Pacific Standard Time, UTC-08:00)
Software
 Wednesday, December 05, 2007

Mike Zintel (Windows Live Core) sent this one my way. It’s a short 2:45 video that is not particularly informative, but it is creative: http://www.youtube.com/watch?v=fi4fzvQ6I-o.

 

                                --jrh

 


Wednesday, December 05, 2007 10:46:39 PM (Pacific Standard Time, UTC-08:00)
Ramblings
 Tuesday, December 04, 2007

Amazon doesn’t release much about its inner workings, which is unfortunate in that they do some things notably well and often don’t get credit. My view is that making some of these techniques more public would be a great recruiting tool for Amazon, but I understand the argument for secrecy as well.

 

Ronny Kohavi recently pointed me to this presentation on A/B testing at Amazon.  It’s three years old but still well worth reading.  Key points from my perspective:

·         Amazon is completely committed to A/B testing. In past presentations, Bezos has described Amazon as a “data driven company”. One of the key advantages of a service is that you get to see in real time how well it’s working. Any service that doesn’t take advantage of this, and instead just makes standard best-guess or informed-expert decisions, is missing a huge opportunity and hurting its business. The combination of A/B testing and cycling through ideas quickly does two wonderful things: 1) it makes your service better FAST, and 2) it takes the politics and influence out of new ideas. Ideas that work win, and those that don’t show results don’t get used, whether proposed by a VP or the most junior web designer. It’s better for the service and for everyone on the team.

·         The infrastructure focus at Amazon. Bezos gets criticized by Wall Street analysts for overinvesting in infrastructure, but the infrastructure investment gives them efficiency and pricing power, which is one of their biggest assets. The infrastructure investment also allows them to host third parties, which gives Amazon more scale in a business where scale REALLY matters, and it gives customers broader selection, which tends to attract more customers. Most important, it gives Amazon, the data driven company, more data, and this data allows them to improve their service rapidly and give customers a better experience: “customers who bought X…”

·         Negative cost of capital: Slide 8 documents how they get a product on day 0, sell it on day 20, get paid on day 23, and pay the supplier on day 44.

·         Slide 7 shows what can be done with a great infrastructure investment: respectable margins and very high inventory turn rates.
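The statistical step at the heart of an A/B test can be sketched as follows. This is a minimal sketch, assuming a simple two-proportion z-test on conversion counts; the presentation does not say which test Amazon uses, and the function name and sample numbers below are hypothetical.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Compare conversion rates of control (A) and treatment (B).

    Returns (z, p_value), where p_value is two-sided.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis that A and B perform equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 10,000 visitors in each arm.
z, p = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up numbers z lands near 2.8 and p is well below 0.01, so the treatment wins no matter who proposed it. A production system also has to fix the sample size in advance and avoid stopping early on a lucky streak, which this sketch ignores.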

 

The presentation is posted: http://ai.stanford.edu/~ronnyk/emetricsAmazon.pdf
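The negative cost of capital on slide 8 is the standard cash conversion cycle calculation: days in inventory plus days to collect from the customer, minus days until the supplier is paid. A minimal sketch using the four day numbers quoted above; the function name is hypothetical, and real figures would be averages across all inventory rather than a single product.

```python
def cash_conversion_cycle(received: int, sold: int, customer_paid: int, supplier_paid: int) -> int:
    """Days between paying for inventory and collecting cash for it.

    All arguments are day numbers measured from the same starting point.
    A negative result means customer cash arrives before the supplier is paid.
    """
    days_inventory = sold - received            # time the item sits in the warehouse
    days_receivable = customer_paid - sold      # time to collect from the customer
    days_payable = supplier_paid - received     # time before the supplier must be paid
    return days_inventory + days_receivable - days_payable


print(cash_conversion_cycle(received=0, sold=20, customer_paid=23, supplier_paid=44))  # -21
```

The result, -21 days, means Amazon holds the customer’s money for three weeks before it has to pay for the goods, which is the sense in which its working capital costs less than nothing.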

 

                                                                --jrh

 


 

Tuesday, December 04, 2007 7:23:10 AM (Pacific Standard Time, UTC-08:00)
Services
 Friday, November 30, 2007

Some months back I finished a paper with Joe Hellerstein and Michael Stonebraker scheduled to be published in the next issue of Foundations and Trends in Databases. This paper describes how current generation database management systems are implemented. I’ll post a reference to it here once it is published.

 

As a very small part of this paper, we cover the process model used by Oracle, DB2, MySQL, SQL Server, and PostgreSQL. A process model is how a database maps the work it’s doing on behalf of multiple concurrent users onto operating system processes and/or threads. This is an important design choice in that it has a fundamental impact on the number of concurrent requests that can be supported, development costs, maintainability, and code base portability, amongst other issues.

 

Most designers of high scale servers face these same design choices, and they apply equally to mail servers, web servers, app servers, and any other application needing to service large numbers of requests in parallel. Given the importance of the topic and its applicability to all multi-user server systems, it’s worth covering separately here. I find it interesting that three of the leading DBMSs support more than one process model and one supports four variants. There clearly is no single right answer.

 

Summarizing the process models supported by IBM DB2, MySQL, Oracle, PostgreSQL, and Microsoft SQL Server:

 

1.      Process per DBMS Worker: This is the most straightforward process model and is still heavily used today. DB2 defaults to process per DBMS worker on operating systems that don’t support high quality, scalable OS threads, and to thread per DBMS worker on those that do. This is also the default Oracle process model, but Oracle also supports process pool, described below, as an optional model. PostgreSQL runs the Process per DBMS Worker model exclusively on all operating system ports.

2.      Thread per DBMS Worker: This is an efficient model with two major variants in use today:

a.       OS thread per DBMS Worker: IBM DB2 defaults to this model when running on systems with good OS thread support. This is the model used by MySQL as well.

b.      DBMS Thread per DBMS Worker: In this model, DBMS Workers are scheduled by a lightweight thread scheduler running on either OS processes or OS threads, both of which are explained below. This model avoids any potential OS scheduler scaling or performance problems at the expense of high implementation costs, poor development tools and debugger support, and substantial long-standing maintenance costs. There are two sub-categories of this model:

i.      DBMS threads scheduled on OS Process: a lightweight thread scheduler is hosted by one or more OS processes. Sybase uses this model and began with the thread scheduler hosted by a single OS process. One of the challenges with this approach is that, to fully exploit shared memory multi-processors, it is necessary to have at least one process per processor. Sybase has since moved to hosting DBMS threads over potentially multiple OS processes to avoid this limitation. When DBMS threads run within multiple processes, there will be times when one process has the bulk of the work and other processes (and therefore processors) are idle. To make this model work well under these circumstances, DBMSs must implement thread migration between processes. Informix did an excellent job of this starting with the Version 6.0 release. All current generation systems supporting this model implement a DBMS thread scheduler that schedules DBMS Workers over multiple OS processes to exploit multiple processors.

ii.      DBMS threads scheduled on OS Threads: Microsoft SQL Server supports this model as a non-default option. By default, SQL Server multiplexes DBMS Workers over a thread pool (described below). This SQL Server option, called Fibers, is used in some high scale transaction processing benchmarks but is otherwise in very light use.

3.      Process/Thread Pool: In this model, DBMS workers are multiplexed over a pool of processes. As OS thread support has improved, a second variant of this model has emerged based upon a thread pool rather than a process pool. In this latter model, DBMS workers are multiplexed over a pool of OS threads:

a.       DBMS workers multiplexed over a process pool: This model is much more efficient than process per DBMS worker, is easy to port to operating systems without good OS thread support, and scales very well to large numbers of users.  This is the optional model supported by Oracle and the one they recommend for systems with large numbers of concurrently-connected users.  The Oracle default model is process per DBMS worker.  Both of the options supported by Oracle are easy to support on the vast number of different operating systems they target (at one point Oracle supported over 80 target operating systems).

b.      DBMS workers multiplexed over a thread pool: Microsoft SQL Server defaults to this model and well over 99% of SQL Server installations run this way. To efficiently support tens of thousands of concurrently connected users, SQL Server optionally supports DBMS threads scheduled on OS threads.
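The DBMS-thread models (2.b above) replace the OS scheduler with one built into the server. A minimal sketch, assuming Python generators stand in for DBMS threads: each worker runs until it voluntarily yields, and a round-robin scheduler multiplexes all of them over the single OS thread running the loop. The names `dbms_worker` and `run_scheduler` are hypothetical and not taken from any of the products above.

```python
from collections import deque
from typing import Deque, Generator, List

# A "DBMS thread": a generator that runs until it voluntarily yields control.
Worker = Generator[None, None, None]


def dbms_worker(name: str, steps: int, log: List[str]) -> Worker:
    """Hypothetical DBMS worker that performs `steps` units of work."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        # A real DBMS worker would yield here when it waits on I/O or a lock.
        yield


def run_scheduler(workers: List[Worker]) -> None:
    """Round-robin, user-level scheduler: every worker shares this one OS thread."""
    ready: Deque[Worker] = deque(workers)
    while ready:
        worker = ready.popleft()
        try:
            next(worker)          # run the worker until its next yield point
        except StopIteration:
            continue              # worker finished; do not requeue it
        ready.append(worker)      # still runnable: put it at the back of the queue


log: List[str] = []
run_scheduler([dbms_worker("A", 2, log), dbms_worker("B", 3, log)])
print(log)  # ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```

Switching between workers here is an ordinary function call with no kernel involvement, which is the efficiency argument for this model. The sketch also hints at its costs: a worker that blocks in an OS call or never yields stalls every other worker on that thread, and debuggers and profilers see one OS thread rather than the individual workers.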

 

Most current generation commercial DBMSs support intra-query parallelism, the ability to execute all or parts of a query in parallel. Essentially, intra-query parallelism is the temporary assignment of multiple DBMS workers to execute a single SQL query. The underlying process model is not impacted by this feature, other than that a single client connection may, at times, have more than one DBMS worker.
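Intra-query parallelism can be sketched as a coordinator that temporarily hands partitions of a table to several workers and then combines their partial results. This is a minimal sketch, assuming a `SUM` over an in-memory list and Python’s `concurrent.futures.ThreadPoolExecutor` standing in for the pool of DBMS workers; the function names are hypothetical. Because of CPython’s global interpreter lock, it illustrates the structure rather than delivering a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def partial_sum(rows: Sequence[int]) -> int:
    """Work done by one DBMS worker: aggregate its partition of the column."""
    return sum(rows)


def parallel_sum(column: Sequence[int], degree_of_parallelism: int) -> int:
    """Evaluate SUM(column) by temporarily assigning several workers to one query."""
    if degree_of_parallelism < 1:
        raise ValueError("degree_of_parallelism must be at least 1")
    n = len(column)
    chunk = max(1, -(-n // degree_of_parallelism))   # ceiling division
    partitions: List[Sequence[int]] = [column[i:i + chunk] for i in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=degree_of_parallelism) as workers:
        partials = list(workers.map(partial_sum, partitions))
    # The coordinating worker combines the partial aggregates into the final answer.
    return sum(partials)


print(parallel_sum(list(range(1, 101)), degree_of_parallelism=4))  # 5050
```

The process model underneath is untouched: the query simply borrows extra workers for its duration, as described above.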

 

Process model selection has a substantial influence on DBMS scaling and portability. As a consequence, three of the most successful commercial systems each support more than one process model across their product line. From an engineering perspective, it would clearly be much simpler to employ a single process model across all operating systems and at all scaling levels. However, due to the vast diversity of usage patterns and the non-uniformity of the target operating systems, each DBMS has elected to support multiple models.

 

                                                --jrh

 


 

Friday, November 30, 2007 5:48:30 AM (Pacific Standard Time, UTC-08:00)
Software
 Monday, November 26, 2007

There are few things we do that are more important than interviewing, and yet it’s done very unevenly across the company. Some are amazing interviewers and others, well, I guess they write good code or something else :)

 

Fortunately, interviewing can be learned and, whatever you do and wherever you do it, interviewing with insight pays off. Some time back, a substantial team was merged with the development group I lead and, as part of the merger, we gained a bunch of senior folks, many of whom do As Appropriate (AA) interviews and all of whom are frequent contributors on our interview loops. The best way to get in sync on interviewing techniques and leveling is to talk about it, so I brought us together several times to talk about interviewing, to learn from each other, and to set some standards on how we’re going to run our loops. In preparation for those meetings, I wrote up some notes on what I view as best practices for AAs. These apply to all interviewers, and I typically send them out whenever I join a team.

 

Some of these are specific to Microsoft but many apply much more broadly. There is some internal Microsoft jargon used in the doc. For example, at Microsoft the AA, short for “As Appropriate”, is the final decision maker on whether an offer will be made. However, most of what is here is company invariant.

 

The doc: JamesRH_AA_Interview_NotesX.doc (37 KB).

 

I also pulled some of the key points into a ppt: AA Interview NotesX.ppt (156.5 KB).

 

                                    --jrh

 


 

Monday, November 26, 2007 5:44:24 AM (Pacific Standard Time, UTC-08:00)
Process
 Friday, November 23, 2007

Ted Wobber (msft Research) brought together the following short list of SSD performance data.  Note the FusionIO part claiming 87,500 IOPS in a 640 GB package.  I need to run a perf test against that part and see if it's real.  It looks perfect for very hot OLTP workloads.

 

A directory of “fastest SSDs”:

http://www.storagesearch.com/ssd-fastest.html

Note that this contains RAM SSDs as well as flash SSDs. This list, however, seems to be ranked by bandwidth, not IOPS.
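Why a bandwidth ranking says little about OLTP suitability comes down to one line of arithmetic: bandwidth equals IOPS times I/O size. A minimal sketch; the 4 KiB and 256 KiB transfer sizes are assumptions for illustration, since the FusionIO figure quoted above does not state the I/O size behind it.

```python
KIB = 1024


def bandwidth_bytes_per_sec(iops: float, io_size_bytes: int) -> float:
    """Throughput implied by a given I/O rate at a fixed transfer size."""
    return iops * io_size_bytes


def iops_for_bandwidth(bytes_per_sec: float, io_size_bytes: int) -> float:
    """I/O rate needed to reach a given throughput at a fixed transfer size."""
    return bytes_per_sec / io_size_bytes


# 87,500 small random I/Os per second, assuming 4 KiB each (about 358 MB/s)...
small_io = bandwidth_bytes_per_sec(87_500, 4 * KIB)
# ...is the same bandwidth as only about 1,367 large 256 KiB transfers per second.
large_io_rate = iops_for_bandwidth(small_io, 256 * KIB)
print(small_io, large_io_rate)
```

Two devices can therefore tie on a bandwidth list while differing by 64x in the small random I/O rate that a hot OLTP workload actually needs.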

 

This manufacturer makes a very high-end database accelerator:

http://www.stec-inc.com/technology/

Among the things they most likely do are logical address re-mapping, aggressive over-provisioning of free space, and highly parallel operations.

 

Then there are these guys who do the hard work in the host OS:

http://managedflash.com/home/index.htm

They clearly do logical address re-mapping, but their material is strangely devoid of mention of cleaning costs.  Perhaps they get “free” hints from the OS free block table.
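Logical address re-mapping, and the cleaning cost that comes with it, can be illustrated with a toy page-mapped flash translation layer. This is a minimal sketch under simplifying assumptions (no wear leveling, no power-fail recovery, greedy victim selection, one spare block reserved for the cleaner); `TinyFTL` is hypothetical and not how any of the vendors above actually implement it. The `cleaner_copies` counter is the cleaning cost, and write amplification is total flash page writes divided by host writes.

```python
import random
from typing import Dict, List, Optional, Tuple


class TinyFTL:
    """Hypothetical page-mapped, log-structured flash translation layer.

    Flash pages cannot be rewritten in place; only whole blocks can be erased.
    Every logical write therefore lands on the next free physical page and the
    logical-to-physical map is updated. The superseded copy becomes garbage
    that the cleaner later reclaims by copying a victim block's live pages
    elsewhere and erasing the block.
    """

    def __init__(self, num_blocks: int, pages_per_block: int, logical_pages: int) -> None:
        # Exposing less than the physical capacity guarantees cleaning makes progress.
        assert num_blocks >= 2 and 0 < logical_pages <= (num_blocks - 1) * pages_per_block
        self.ppb = pages_per_block
        self.logical_pages = logical_pages
        # live[b][p] is the logical page stored at (b, p), or None if free or stale.
        self.live: List[List[Optional[int]]] = [[None] * pages_per_block for _ in range(num_blocks)]
        self.fill = [0] * num_blocks                   # next unwritten page in each block
        self.free_blocks = list(range(1, num_blocks))  # erased blocks; block 0 starts active
        self.active = 0
        self.mapping: Dict[int, Tuple[int, int]] = {}  # logical page -> (block, page)
        self.host_writes = 0
        self.cleaner_copies = 0                        # the cleaning cost, in page copies

    def write(self, lpn: int) -> None:
        assert 0 <= lpn < self.logical_pages
        if lpn in self.mapping:                        # mark the old copy stale first
            b, p = self.mapping.pop(lpn)
            self.live[b][p] = None
        if self.fill[self.active] == self.ppb:         # active block is full
            if len(self.free_blocks) > 1:
                self.active = self.free_blocks.pop()
            else:
                self._clean()                          # last free block is the cleaner's reserve
        self._append(lpn)
        self.host_writes += 1

    def _append(self, lpn: int) -> None:
        p = self.fill[self.active]
        self.fill[self.active] += 1
        self.live[self.active][p] = lpn
        self.mapping[lpn] = (self.active, p)

    def _clean(self) -> None:
        # Greedy victim: the full block with the fewest live pages is cheapest to clean.
        full = [b for b in range(len(self.fill)) if b not in self.free_blocks]
        victim = min(full, key=lambda b: sum(x is not None for x in self.live[b]))
        self.active = self.free_blocks.pop()
        for lpn in self.live[victim]:
            if lpn is not None:                        # relocate each live page
                self._append(lpn)
                self.cleaner_copies += 1
        self.live[victim] = [None] * self.ppb          # erase the victim block
        self.fill[victim] = 0
        self.free_blocks.append(victim)

    def write_amplification(self) -> float:
        """Physical page writes per host write (1.0 means no cleaning overhead)."""
        return (self.host_writes + self.cleaner_copies) / self.host_writes


ftl = TinyFTL(num_blocks=8, pages_per_block=4, logical_pages=16)
for lpn in range(16):            # write every logical page once
    ftl.write(lpn)
rng = random.Random(42)          # fixed seed so the run is repeatable
for _ in range(1000):            # then overwrite randomly chosen pages
    ftl.write(rng.randrange(16))
print(f"write amplification: {ftl.write_amplification():.2f}")
```

Lowering `logical_pages` relative to physical capacity is over-provisioning: victim blocks then tend to hold fewer live pages, so the cleaner copies less. Hints from the OS about freed blocks would help for a similar reason, since pages known to be free need not be copied at all.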

 

Nevertheless, the following article is worth reading:

http://mtron.easyco.com/news/papers/easyco-flashperformance-art.pdf

 

FusionIO: http://fusionio.com/.

 


 

Friday, November 23, 2007 6:21:26 AM (Pacific Standard Time, UTC-08:00)
Hardware
 Tuesday, November 20, 2007

Google has been hiring networking hardware folks, so it has long been speculated that they are building their own network switches. This remains speculation only, but the evidence is mounting:

 

From http://www.nyquistcapital.com/2007/11/16/googles-secret-10gbe-switch/ (Sent my way by James Depoy of the OEM team and Michael Nelson of SQL Server):

Through conversations with multiple carrier, equipment, and component industry sources we have confirmed that Google has designed, built, and deployed homebrewed 10GbE switches for providing server interconnect within their data centers.

We believe Google based their current switch design on Broadcom’s (BRCM) 20-port 10GE switch silicon (BCM56800) and SFP+ based interconnect. It is likely that Broadcom’s 10GbE PHY is also being employed. This would be a repeat of the same winner-take-all scenario that played out in 1GbE interconnect.

 

This article attempts to track Google’s consumption by tracking shipments of 20-port 10GigE silicon. Not a bad approach. By their estimate, Google is installing 5,000 ports a month. If you assume that server connections dominate over inter-switch connections, the implication is that Google is installing nearly 5k servers per month.
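The step from ports to servers is simple arithmetic, and it shows why “nearly 5k” is an upper bound: every switch port spent on an uplink to another switch is a port not connected to a server. A minimal sketch; the uplink counts per switch are assumptions for illustration, not figures from the article.

```python
def estimate_servers(ports_per_month: int, ports_per_switch: int, uplinks_per_switch: int) -> int:
    """Servers connected per month, assuming every non-uplink port reaches a server."""
    switches = ports_per_month // ports_per_switch
    return switches * (ports_per_switch - uplinks_per_switch)


for uplinks in (0, 2, 4):
    # Prints 5000, 4500 and 4000 servers for 0, 2 and 4 uplinks per 20-port switch.
    print(uplinks, estimate_servers(5_000, 20, uplinks))
```

Even with a fifth of each switch’s ports used as uplinks, the estimate stays in the same range, so the conclusion is not very sensitive to that assumption.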

 

                                                --jrh

 


Tuesday, November 20, 2007 8:38:39 AM (Pacific Standard Time, UTC-08:00)
Services
 Monday, November 19, 2007

Last week I attended and presented at the USENIX LISA conference (http://www.usenix.org/event/lisa07/). I presented Designing and Deploying Internet-Scale Applications and the slides are at: PowerPoint slides.

 

I particularly enjoyed Andrew Hume’s (AT&T) talk on the storage sub-systems used at AT&T research, the data error rates he’s seen over the last several decades, and what he does about them. His experience exactly parallels mine, with more solid evidence, and can be summarized as: all layers in the storage hierarchy produce errors. The only way to store data for the long haul is with redundancy coupled with end-to-end error detection.

I enjoyed the presentations of Shane Knapp and Avleen Vig of Google in that they provided a small window into how Google takes care of their ~10^6 servers with a team of 30 or 40 hardware engineers world-wide, the software tools they use to manage the world’s biggest fleet, and the release processes used to manage these tools. Guido Trotter, also of Google, talked about how Google IT (not the production systems) was using Xen and DRBD to build highly reliable IT systems. He used DRBD (http://www.drbd.org/download.html) to do asynchronous, block level replication between a primary and a secondary. The workload runs in a Xen virtual machine and, on failure, is restarted on the secondary.

Ken Brill, Executive Director of the Uptime Institute, made a presentation focused on power being the problem. Ignore floor space cost; system density is not the issue, it’s a power problem. He’s right and it’s becoming increasingly clear each year.
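The “redundancy plus end-to-end error detection” rule can be sketched in a few lines: compute a checksum where the data originates, store several copies, and verify the checksum on every read, so corruption introduced by any layer in between is caught and a good copy is used instead. A minimal sketch, assuming in-memory replicas and SHA-256 from Python’s `hashlib`; the function names are hypothetical.

```python
import hashlib
from typing import List, Optional, Tuple

# One stored copy: (bytes as stored, SHA-256 hex digest computed at write time).
Replica = Tuple[bytes, str]


def write_replicated(data: bytes, copies: int) -> List[Replica]:
    """Checksum once at the source, then store the data and digest on each replica."""
    digest = hashlib.sha256(data).hexdigest()
    return [(bytes(data), digest) for _ in range(copies)]


def read_verified(replicas: List[Replica]) -> Optional[bytes]:
    """Return the first copy whose content still matches its checksum."""
    for stored, digest in replicas:
        if hashlib.sha256(stored).hexdigest() == digest:
            return stored
    return None  # every copy is corrupt: report the error rather than return bad data


def repair(replicas: List[Replica]) -> int:
    """Overwrite corrupt copies with a verified good one; return how many were fixed."""
    good = read_verified(replicas)
    if good is None:
        return 0
    good_digest = hashlib.sha256(good).hexdigest()
    fixed = 0
    for i, (stored, digest) in enumerate(replicas):
        if hashlib.sha256(stored).hexdigest() != digest:
            replicas[i] = (good, good_digest)
            fixed += 1
    return fixed


replicas = write_replicated(b"customer record", copies=3)
replicas[0] = (b"customer recorb", replicas[0][1])  # simulate silent corruption in one layer
print(read_verified(replicas))  # b'customer record'
print(repair(replicas))         # 1
```

Running something like `repair` periodically in the background (scrubbing) is one way to catch corruption before the last good copy is lost, rather than discovering it on the read that needed the data.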

 

 My rough notes from the sessions I attended are at:  JamesRH_Notes_USENIXLISA2007x.docx (21.03 KB).

 

                                                --jrh

 


 

Monday, November 19, 2007 7:06:36 AM (Pacific Standard Time, UTC-08:00)
Services
 Friday, November 16, 2007

Three weeks ago I presented at HPTS (http://www.hpts.ws/index.html). HPTS is an invitational conference, held every two years since 1985 in Asilomar, California, that brings together researchers, implementers, and users of high scale transaction processing systems. It’s one of my favorite conferences in that it attracts a very interesting group of people, is small enough that everyone can contribute, and has lots of informal discussion in a great environment on the ocean near Monterey.

 

I presented Modular Data Center Design and Designing and Deploying Internet-Scale Services.  A highlight of this year’s session was a joint keynote address from David Patterson of Berkeley and Burton Smith of Microsoft.  Dave's slides are posted at DavidPattersonTechTrends2007.ppt (442.5 KB).  Burton's not in the office right now so I don't have access to his but will post them when I do.

 

I’m the General Chair for the 2009 HPTS which is scheduled to be October 25 through 28, 2009.  Keep the date clear and plan on submitting an interesting position paper to get invited.  If you are doing high scale data centric applications, HPTS is always fun.

 

                                                --jrh

 


 

Friday, November 16, 2007 5:19:45 AM (Pacific Standard Time, UTC-08:00)
Software
 Monday, November 12, 2007

For the last year or so I’ve been collecting Scaling Web Site war stories and posting them to my Microsoft internal blog. I collect them for two reasons: 1) scaling web site problems all center around persistent state management and I’m a database guy, so the interest is natural, and 2) it’s amazing how frequently the same pattern appears: design a central DB, move to a functional partition, then move to a horizontal partition. Somewhere through that cycle, add caching at various levels. Most skip the hardware evolution of starting with scale-up servers and then moving to scale-out clusters, but even that pattern shows up remarkably frequently (e.g. eBay and Amazon).
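The functional-then-horizontal partitioning progression described above can be sketched as a request router. A minimal sketch; the table-to-database map and database names are hypothetical, and a real system also needs a plan for moving data when the shard count changes, which simple modulo hashing does not handle gracefully.

```python
import hashlib

# Functional partitioning: each area of the application gets its own database.
# These names are hypothetical.
FUNCTIONAL_PARTITIONS = {
    "users": "user_db",
    "orders": "order_db",
    "catalog": "catalog_db",
}


def shard_for(key: str, num_shards: int) -> int:
    """Horizontal partitioning: pick a shard from a stable hash of the row key."""
    # Python's built-in hash() of a str is randomized per process, so use a stable digest.
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route(table: str, key: str, shards_per_db: int = 4) -> str:
    """Find the database instance holding `key` in `table`."""
    db = FUNCTIONAL_PARTITIONS[table]                 # functional partition
    return f"{db}_{shard_for(key, shards_per_db)}"    # horizontal partition within it


print(route("users", "alice@example.com"))
print(route("orders", "order-1234"))
```

Caching can then be layered in front of `route` at whatever level the workload needs, which matches the war stories above: the partitioning scheme comes first, and caches are added around it.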

 

Scaling web site war stories:

·         Scaling Amazon: http://glinden.blogspot.com/2006/02/early-amazon-splitting-website.html

·         Scaling Second Life: http://radar.oreilly.com/archives/2006/04/web_20_and_databases_part_1_se.html

·         Scaling Technorati: http://www.royans.net/arch/2007/10/25/scaling-technorati-100-million-blogs-indexed-everyday/

·         Scaling Flickr: http://radar.oreilly.com/archives/2006/04/database_war_stories_3_flickr.html

·         Scaling Craigslist: http://radar.oreilly.com/archives/2006/04/database_war_stories_5_craigsl.html

·         Scaling Findory: http://radar.oreilly.com/archives/2006/05/database_war_stories_8_findory_1.html

·         MySpace 2006: http://sessions.visitmix.com/upperlayer.asp?event=&session=&id=1423&year=All&search=megasite&sortChoice=&stype=

·         MySpace 2007: http://sessions.visitmix.com/upperlayer.asp?event=&session=&id=1521&year=All&search=scale&sortChoice=&stype=

·         Twitter, Flickr, Live Journal, Six Apart, Bloglines, Last.fm, SlideShare, and eBay: http://poorbuthappy.com/ease/archives/2007/04/29/3616/the-top-10-presentation-on-scaling-websites-twitter-flickr-bloglines-vox-and-more

 

Thanks to Soumitra Sengupta for sending the Flickr and PoorButHappy pointer my way and to Jeremy Mazner for sending the MySpace references.

 

                                                                --jrh

 


 

Monday, November 12, 2007 5:22:28 AM (Pacific Standard Time, UTC-08:00)
Services
 Friday, November 09, 2007

Earlier in the week Dr. Tachi Yamada of the Bill and Melinda Gates Foundation presented the work they are doing on health care in developing countries. Some years back Bill Gates gave a similar talk at Microsoft and it was an amazing presentation. Part of that was the depth and breadth of Bill’s understanding of the world health care problem, but what impressed me most was the effectiveness of applying business principles to a social problem. Apply capital to the highest leverage opportunities. Don’t just invest in the breakthrough but also in overcoming the social and political barriers to uptake. Tailor the solution to the local environment. Work on the supply chain. Influence the economic factors that cause pharma companies to invest in a given solution.

 

The same techniques that allow a company to find success in business can be applied to world healthcare.  I love the approach and Dr. Yamada’s talk this week followed a similar theme.  My rough notes follow.

 

                                                                                --jrh

 

·         Speaker: Dr. Tadataka (Tachi) Yamada

o   Excellent presentation. He quietly relays the facts without slides and lays out a very compelling and very clear picture of their approach to health care.

·         About ½ the foundation focuses on health, ¼ on learning in the US, and ¼ on improving economic conditions

·         1,000 babies will die during this talk.

·         Life expectancy: 50 in sub-Sahara and close to 80 here in North America

·         Bill “finally graduated” from Harvard last June and in his commencement address he said:

o   humanity’s great advancements are not the discovery of technology but the application of it to fight inequity.

·         $2T is spent on healthcare in the US. A few billion from the Gates Foundation won’t correct the lack of political will in how this is applied, but $2B will have a fundamental impact spent in the developing world. This is where we can have the greatest positive impact and that’s why the foundation focuses its healthcare resources in the developing world.

·         The HIV battle is focused on prevention. The lifetime cost of treatment makes fighting it via treatment very expensive.

o   Circumcision has been shown very effective in reducing the transmission of HIV.

o   Long term approach is a vaccine (note that 25 years of research haven’t yet produced one)

§  We’re investing $500m over 5 years in HIV vaccine research

·         We focus on all phases of taking science to improved health outcomes:

o   To science, then to local opinion, then to policy, and then to application. Without covering all four, full impact will not be realized.

·         In developing world 70% of all care is private, often for profit, health care.

o   Individuals purchasing directly from pharmacies (e.g. Malaria treatment)

o   Basic point is that you need to understand the entire system (economics, policy, social factors, etc.)

·         Mass customization is required for global success in business AND also in not-for-profit. The same ideas apply.

·         Yamada points out that bed nets are effective in the fight against malaria but aren’t in heavy use. He shows how companies market products and argues that we need to do the same thing in public health care. People have to want a treatment and believe in it, or it won’t work.

·         Peer reviews kill innovation.  Need innovators reviewing innovation. Standard peer review tends to seek out incremental improvements to existing systems. 

·         10m children lose their lives each year.  Must stay focused on the prize: reduced mortality.

·         Quote from one of his ex-managers: “If you aren’t keeping score, you are just practicing”

o   Metrics driven approaches are needed

·         Birth rates: 30% lack of control and 70% demand side problem.

·         We believe in a healthy pharmaceutical industry and in IP, but need affordable prices in the underdeveloped world.

·         Pharma makes less than 1% of the profits in the developing world.  Selling at cost would drive volume and not impact the profit picture.

 


 

Friday, November 09, 2007 5:55:30 AM (Pacific Standard Time, UTC-08:00)
Ramblings
 Tuesday, November 06, 2007