James Hamilton's Blog
 Monday, March 03, 2008

In past blog entries, I’ve talked about the impact of flash on server-side systems. On the client, flash SSDs can help two different markets that are, paradoxically, at opposite ends of the cost spectrum: 1) low-cost economy laptops, and 2) high-performance laptops. At the low end, flash can make systems less expensive because hard disk drives are mechanical devices with motors and actuators, and they have a price floor. Even very small disks need a motor and an actuator, so getting an HDD for much less than $50 is quite difficult. For very low cost devices with very small storage requirements, a flash SSD can be cheaper than a disk of similar size. And, in addition to being cheaper than HDDs in very small form factors, flash SSDs also consume less power, are more durable, and can operate reliably in broader environmental conditions. Perhaps the prototypical inexpensive laptop is that of the One Laptop Per Child project. It uses NAND flash for persistent storage: http://wiki.laptop.org/index.php/Hardware_specification.

 

At the other end of the spectrum, in high-end laptops where performance, light weight, silence, and long battery life are all important factors, NAND flash SSDs are again becoming common. Many high-end laptops ship with flash SSDs rather than HDDs. Some examples: Samsung, Sony Vaio, Dell Latitude, HP Compaq, Asus, and many others.

 

Flash SSDs are also emerging as a common choice in ruggedized laptops due to the broad environmental range within which flash SSDs operate reliably.  They are also becoming a common choice in Ultra Mobile PCs such as the Samsung Q1.

 

Flash SSDs are on track to be used in a broad percentage of the laptop market. EE Times, for example, estimates that flash SSDs will be the supplied storage media in 38% of the laptop market: http://www.eetimes.com/showArticle.jhtml?articleID=204400359 (Jack Creasey sent the article my way). Looking specifically at corporate laptops, I would expect the penetration to be far higher than 38%.

 

                                                --jrh

 

James Hamilton, Windows Live Platform Services
Bldg RedW-D/2072, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
JamesRH@microsoft.com

H:mvdirona.com | W:research.microsoft.com/~jamesrh

Monday, March 03, 2008 12:05:15 AM (Pacific Standard Time, UTC-08:00)
Hardware
 Tuesday, February 26, 2008

The internet was designed in a different time and at a different scale. It’s rare that a design continues to work at all when scaled by multiple orders of magnitude, so it remains impressive, but there are issues. The blackholing of YouTube over the weekend showed one of them. Routing is fragile and open to both administrative error and certain forms of attack. This particular example was the more common of the two: human error.

 

Over the weekend, a decision made in Pakistan took down YouTube for two hours. Here’s what happened. Pakistan Telecom received a government order to block Pakistani access to YouTube. They did this for their own network (most of Pakistan) but also advertised the incorrect route to their provider, PCCW. PCCW shouldn’t have accepted it but did. Since PCCW is big and hence fairly credible, the error propagated quickly throughout the world from there.

 

This incident was merely inconvenient, but the same sort of disruption can be constructed intentionally to block access to a web site or to direct users to a web site masquerading as another.
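One reason a bad announcement can spread so effectively is that routers forward traffic along the longest matching prefix, so a more specific route beats a legitimate but less specific one. The following is a minimal sketch of that selection rule only. The prefixes, the origin labels, and the `best_route` helper are all made up for illustration, and it assumes the bad announcement is more specific than the legitimate one.

```python
import ipaddress


def best_route(routes, destination):
    """Return the (network, origin) pair with the longest prefix covering destination.

    routes is a list of (prefix_string, origin_label) pairs. Returns None
    when no prefix covers the destination.
    """
    dest = ipaddress.ip_address(destination)
    matches = []
    for prefix, origin in routes:
        network = ipaddress.ip_network(prefix)
        if dest in network:
            matches.append((network, origin))
    if not matches:
        return None
    # Longest-prefix match: the most specific covering route wins.
    return max(matches, key=lambda match: match[0].prefixlen)


# Hypothetical routing table; these are not the prefixes from the incident.
routes = [("10.0.0.0/22", "legitimate-origin")]
before = best_route(routes, "10.0.1.25")

# A more specific announcement for part of the same address space arrives.
routes.append(("10.0.1.0/24", "mistaken-origin"))
after = best_route(routes, "10.0.1.25")

print(before[1])
print(after[1])
```

Once the /24 is accepted, traffic for addresses inside it follows the new origin even though the original /22 is still present, which is why filtering what customers are allowed to announce matters.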

 

Perhaps the best detail is on the Renesys blog: http://www.renesys.com/blog/2008/02/pakistan_hijacks_youtube_1.shtml. Other sources: http://www.news.com/8301-10784_3-9878655-7.html, http://www.nytimes.com/2008/02/26/technology/26tube.html?_r=1&ref=business&oref=slogin.  

 

                        --jrh

 

Tuesday, February 26, 2008 12:07:19 AM (Pacific Standard Time, UTC-08:00)
Services
 Friday, February 22, 2008

I get several hundred emails a day, some absolutely vital and needing prompt action and some about the closest thing to corporate spam. I know I’m not alone. I’ve developed my own systems for managing the traffic load and, on different days, have varying degrees of success in sticking to them. In my view, it’s important not to confuse “processing email” with what we actually get paid to do. Email is often the delivery vehicle for work needing to be done and work that has been done, but email isn’t what we “do”.

 

We all need to find ways of coping with all the email while still getting real work done and having a shot at a life outside of work.  My approach is fairly simple:

·         Don’t process email in real time or it’ll become your job.  When I’m super busy, I process email twice a day: early in the morning and again in the evening.  When I’m less heavily booked, I’ll try to process email in micro bursts rather than in real time.  It’s more efficient and allows more time to focus on other things.

·         Shut off email arrival sounds and the “new mail” toast or you’ll end up with 100 interruptions an hour and get nothing done but email.

·         I get up early and try to get my email down to under 10 each morning.  I typically fail but get close. And I hold firm on that number once a week.  Each weekend I do get down to less than 10 messages.  If I enter the weekend with hundreds of email items, I get to work all weekend. This is a great motivator to not take a huge number of unprocessed email messages into the weekend.

·         Do everything you can to process a message fully in one touch.  I work hard to process email once. As I work through it, I delete or respond to everything I can quickly. Those that really do require more work I divide into two groups: 1) those I will do today or, at the very latest, by end of week, which I flag with a priority and leave in my inbox for processing later in the day (many argue these should be moved to a separate folder and they may be right), and 2) longer-lived items, which go into my todo list and are removed from my inbox.  Because I get my email down to under 10 each week and spend as much of my weekend as needed to do this, I’m VERY motivated to not have many emails hanging around waiting to be processed.  Consequently most email is handled up front as I see it and the big things are moved to the todo list. Very few are prioritized for handling later in the day.

·         I chose not to use rules to auto-file email. Primarily I found that if I sent email directly to another folder, I almost never looked at it. So I let everything come into my inbox and I deal with them very quickly and, for the vast majority, they will only be touched once.  If I really don’t even want to see them once, I just don’t subscribe or ask not to get them.

·         Set your draft folder to be your inbox.  With email systems that use a separate folder for unsent mail, there is risk that you’ll get a message 90% written and ready to be sent, get interrupted and then forget to send it.  I set my draft folder to be my inbox so I don’t lose unsent email.  Since my email is worked down to under 10 daily, I’ll find it there for sure before end of day.

·         Don’t bother with complicated folder hierarchies—they are time-consuming to manage. If you want to save something, save it in a single folder or simple folder hierarchy and let desktop search find it when you need it.  Don’t waste time filing in complex ways.

·         Finally, be realistic: if you can’t process at the incoming rate, it’ll just keep backing up indefinitely.  If you aren’t REALLY going to read it, then delete it or file it on the first touch.  Filing it has some value in that, should you start to care more in the future, you can find it via full text search and read it then. 

 

Jeff Johnson of MSN pointed out this excellent talk on email management called “Inbox Zero” by Merlin Mann: http://www.43folders.com/2007/07/25/merlins-inbox-zero-talk.  Merlin’s advice is good and he presents well.

 

                                                --jrh

 

Friday, February 22, 2008 12:09:06 AM (Pacific Standard Time, UTC-08:00)
Ramblings
 Wednesday, February 20, 2008

This isn’t directly related to high scale services or saving power in the data center but it’s a great video.  Bill Gates Last Days (6:54) from CES.

 

                                    --jrh

 

Wednesday, February 20, 2008 12:10:52 AM (Pacific Standard Time, UTC-08:00)
Ramblings

Yesterday, Data Center Knowledge reported that Sun was working on a Cloud Platform to compete with Amazon AWS: Project Caroline.  The data behind the report comes from an upcoming JavaOne 2008 presentation by Sun Distinguished Engineer Bob Scheifler.  The talk announcement and synopsis are posted at: http://research.sun.com/projects/caroline/ and, even better, the slides are already up at: http://developers.sun.com/learning/javaoneonline/2007/pdf/TS-1991.pdf.

 

The full functionality supported by Caroline actually goes beyond that offered by Amazon AWS. Included are:

·         Virtualization of key resources such as network and compute, with a horizontally scaled pool for each

o   Programmatic control of resource allocation, increasing or decreasing without human interaction

·         Java VMs (rather than offer fully general Virtual Machines as Amazon does with EC2, the Java APIs are offered as the only programming abstraction)

·         Identity provider

·         Eclipse based dev tools

·         ZFS file system with storage reservation, access controls, snapshots (with rollback) and quotas

·         Database (PostgreSQL)

·         Networking: VLAN control, VPN support, dynamic NAT, L4 and L7 load balancing, DNS config

 

Overall it looks pretty interesting: the concepts all look good. The true test, and the measure of whether this will actually be an AWS competitor, won’t come until the service is made available at scale.  Nonetheless, it’s great seeing more pay-as-you-go service offerings becoming available.

 

                        --jrh

 

Wednesday, February 20, 2008 12:09:57 AM (Pacific Standard Time, UTC-08:00)
Services
 Tuesday, February 19, 2008

I’ve got nothing against for-fee software – that’s what has paid the bills around our home for more than 20 years. Nonetheless, when it comes to education, it’s hard not to love free.  Yesterday Microsoft announced a great program.  Universities and high schools can now make use of Microsoft professional development tools for games, cell phones, and enterprise applications for free. I think this is a wonderful program.

 

The press release, including a Bill Gates interview: http://www.microsoft.com/presspass/features/2008/feb08/02-18DreamSpark.mspx?rss_fdn=Top%20Stories

 

Two other related articles:

·         TechCrunch: http://www.techcrunch.com/2008/02/18/microsoft-to-give-students-dev-software-for-free/

·         Merc: http://www.mercurynews.com/ci_8302312?source=rss_emailed

 

--jrh

 

Tuesday, February 19, 2008 12:12:59 AM (Pacific Standard Time, UTC-08:00)
Ramblings
 Sunday, February 17, 2008

Yet another argument in favor of Degraded Operations Mode (http://mvdirona.com/jrh/perspectives/2008/01/22/DegradedOperationsMode.aspx) emerged last week.  All of Amazon AWS (S3, SimpleDB, Simple Queue Service, EC2, etc.) was down for several hours: http://mvdirona.com/jrh/perspectives/2008/02/15/DowntimeAmazonS3SimpleDBSQS.aspx. The outage was reportedly due to an authentication storm: http://www.highscalability.com/s3-failed-because-authentication-overload (Mike Neil sent this my way).

 

Remember, you’ll never have the capacity for the biggest load inrush and, no matter how hard you try, your capacity planning will continue to be only slightly better than the weather report for next week. When you don’t know what’s coming, design systems to operate through adversity: Degraded Operations Mode.
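One way to picture a degraded operations mode is an admission gate that sheds non-critical work as load approaches capacity, so the critical paths keep running. This is only a rough sketch under assumed names and thresholds: the `Request` type, its `critical` flag, the `DegradedModeGate` class, and the 80% threshold are all hypothetical, not taken from any particular service.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    critical: bool  # hypothetical flag: must this work survive overload?


class DegradedModeGate:
    """Admit everything when healthy, only critical work when degraded."""

    def __init__(self, capacity, degrade_at=0.8):
        self.capacity = capacity      # assumed maximum concurrent requests
        self.degrade_at = degrade_at  # assumed load fraction that triggers shedding
        self.in_flight = 0

    def mode(self):
        load = self.in_flight / self.capacity
        if load >= 1.0:
            return "overloaded"
        if load >= self.degrade_at:
            return "degraded"
        return "normal"

    def admit(self, request):
        current = self.mode()
        if current == "overloaded":
            return False  # at capacity: reject rather than fall over
        if current == "degraded" and not request.critical:
            return False  # shed optional work first
        self.in_flight += 1
        return True

    def finish(self):
        self.in_flight = max(0, self.in_flight - 1)


gate = DegradedModeGate(capacity=10)
for _ in range(8):
    gate.admit(Request("checkout", critical=True))

print(gate.mode())
print(gate.admit(Request("recommendations", critical=False)))
print(gate.admit(Request("checkout", critical=True)))
```

The design choice is that the service decides ahead of time which work is optional, instead of discovering it during an overload.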

 

                                    --jrh

 

Sunday, February 17, 2008 12:14:11 AM (Pacific Standard Time, UTC-08:00)
Services
 Friday, February 15, 2008

I recently was in a meeting with several physicians. One of them reported a result I have always suspected, but at a magnitude I never would have guessed.  The core observation was that 80% of medical diagnoses were incorrect. The other doctors in the room confirmed this number to be roughly consistent with their experience.  Less anecdotal support for this estimated high error rate is found in those cases where a “gold standard” diagnostic test is discovered where there previously wasn’t one.  What has been found in many of these cases is that upwards of 80% of the previous diagnoses were incorrect. Several examples were given from different disease populations where a gold standard test has emerged.

 

How could one of the best funded medical systems in the world possibly be misdiagnosing so many patients?  The speculation was that it’s a combination of two factors: 1) doctors have VERY little information on the patient, often having never seen them before and, if they have met in the past, it is usually only for an hour or so a year, and 2) insufficient diagnostic information is available. Tests take time, cost money, and are sometimes misapplied (e.g. poor X-rays), and some medical issues lack affordable and highly reliable tests.

 

At first glance this incredible inaccuracy is shocking and hard to accept but, upon reflection, I have seen similar problems in my distant past as a professional auto mechanic.  Misdiagnosis and incorrect parts replacement are common. Repeat, returning, and unsolved problems are not uncommon.  Automobiles are complex systems, but much less complex than human beings, so it’s believable that medicine sees the same problems in a more exaggerated form.  In the automotive world, expensive misdiagnoses are battled on two fronts.  The first is through high quality data acquisition and diagnostic equipment to pinpoint the problem.  The second is to move from a repair model to a parts replacement model. The size of the replaceable component is increasing, which both minimizes labor costs and reduces the likelihood of error (replacing large complex components as a whole normally succeeds at the cost of some wastage). This second technique doesn’t apply well to the medical world but the first does: collect massive amounts of information to improve diagnostic success rates.

 

I get three things out of this discussion: 1) taking an active role in the collection and management of your medical records is worth the investment, 2) it pays to be better informed (just as knowing a bit about a car helps you communicate symptoms to an auto mechanic, knowing a bit about medicine helps you communicate with a doctor), and 3) well-executed diagnostic tests are the most important part of any diagnosis.  In fact, well-executed diagnostic tests can be more important than the skill and experience level of the diagnosing physician.

 

We’ve recently announced HealthVault (http://www.healthvault.com/), a site supporting 1) health-related content and search, 2) a central data storage system for health-related information, and 3) connectivity to monitoring devices such as blood glucose readers, blood pressure monitors, and heart rate monitors.  More information on HealthVault is at: HealthBlog.  This is early stage work but the combination of a central data repository and automated health information gathering has huge potential.  Technical, social, and legal issues must be overcome to realize the full potential of this service, but if we found a way to directly acquire diagnostic data from hospitals and clinics, this service could become truly amazing.

 

                                                --jrh

 

Friday, February 15, 2008 10:42:46 AM (Pacific Standard Time, UTC-08:00)
Services

If you run a big service and claim to have never had downtime, you either 1) have close to zero customers or 2) are lying. It’s almost that simple.

 

There is considerable concern that Amazon’s AWS service was down for several hours:

 

·         http://www.roughtype.com/archives/2008/02/amazons_s3_util.php

·         http://gigaom.com/2008/02/15/amazon-s3-service-goes-down/

·         http://www.centernetworks.com/amazon-s3-down-error

 

Thanks to Jeff Currier and Soumitra Sengupta who told me about the downtime as it was happening last week. The service was reported to be down at 4:30AM. At 10:17, they reported it was resolved.  There are a couple of lessons in here but the first is that internal IT goes down, high scale services go down, client systems fail, networks stop operating, power failures happen, etc.  That’s just the way it is.  You can spend to reduce these factors and you can try to take complete control of the IT infrastructure to avoid them impacting you.  Ironically, in my experience, those that take over and run the entire infrastructure typically do it at lower scale with less experience and have downtime as well.  These small scale services end up costing much more and yet deliver very little additional uptime.  You read about commodity priced, high scale services when they go down. For example, RIM was down last week.  But, the good ones really don’t go down that frequently.  High scale, commodity infrastructure is actually pretty solid and compares very well to vertical, control-all-aspects-of-the-IT-infrastructure approaches.  Amazon AWS generally has earned a pretty good reliability record.
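To put the reported outage window in perspective, here is a small worked calculation. It assumes 10:17 means 10:17 AM on the same day and in the same time zone as the 4:30 AM report, and it uses a 30-day month as an arbitrary accounting period. Neither assumption comes from the reports themselves.

```python
from datetime import timedelta

# Reported times from the post; AM/same-day/same-zone is an assumption.
reported_down = timedelta(hours=4, minutes=30)
reported_resolved = timedelta(hours=10, minutes=17)

outage = reported_resolved - reported_down
outage_minutes = int(outage.total_seconds() // 60)

# Assumed accounting period: one 30-day month.
month_minutes = 30 * 24 * 60
availability_pct = 100 * (1 - outage_minutes / month_minutes)

print(f"outage: {outage_minutes // 60}h {outage_minutes % 60}m")
print(f"availability over a 30-day month: {availability_pct:.2f}%")
```

A single outage of this length in a month is still roughly 99.2% availability for that month, which is useful context when comparing against self-hosted alternatives.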

 

The second lesson is perhaps the hardest to learn and the most important: customers need information. If a service goes down – actually, I should say, when a service goes down – you need to tell customers what is happening and set expectations on service restoration right away.  There is a temptation to hide the facts because, well, downtime is embarrassing.  Hiding it simply doesn’t work. When people don’t know what is happening, they assume the worst and think you are trying to hide something or aren’t responding properly. Tell them what is happening, invest resources in keeping them up to date with progress, and tell them when you expect to be back up.

 

It’s hard, it’s embarrassing, but this one matters more than any other.  Long after the downtime is forgotten, people will remember how you handled it. Transparency wins when it comes to service operation: customers who have decided to bet their jobs on your service need timely information for their customers.  If you embarrass your customers, they remember forever.  A little downtime is unfortunate and you need to be getting better all the time but that’s forgivable.  Just get them the information they need for their dependent businesses.

 

                                --jrh

 

Friday, February 15, 2008 12:15:07 AM (Pacific Standard Time, UTC-08:00)
Services
 Wednesday, February 13, 2008

Google has published an interesting study of Mobile Search trends (sent my way by Tren Griffin).  In this study the authors looked at over 1M queries submitted to Google Mobile web search over the course of a one month period.   They found that the average search query was 2.56 words.  (This is surprisingly similar to the average desktop query at 2.6 and the average PDA query at 2.35).   They predictably found a uniform relationship between query length in characters and the length of time it took to enter it. The average query took 44.8 seconds including network interactions.  They estimate the overhead to be roughly 5 seconds, meaning the user is willing to spend nearly 40 seconds entering a query. This is amazingly high.  It gives an idea of how valuable the query results are if users are willing to take that long to enter a query.  The researchers found less query diversity in the mobile world than the desktop world. The mobile click-through rate on queries was over 50%.

 

I also found it interesting that users are entering queries faster this year than the comparative data from 2005. The average query time fell from 66.3 seconds to 44.8 seconds (including communications overhead).  The paper speculates this is a combination of improved keyboards and a population more comfortable with using devices.
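The figures quoted above can be worked through directly. The snippet below only restates the numbers from the post; the variable names are mine.

```python
# Figures as quoted in the post (seconds).
avg_query_time_recent = 44.8   # includes communications overhead
avg_query_time_2005 = 66.3     # includes communications overhead
network_overhead = 5.0         # "roughly 5 seconds" per the post

# Time the user actually spends entering the query.
entry_time = avg_query_time_recent - network_overhead

# Relative improvement in total query time since 2005.
improvement_pct = (avg_query_time_2005 - avg_query_time_recent) / avg_query_time_2005 * 100

print(f"time spent entering a query: {entry_time:.1f} s")
print(f"reduction in query time since 2005: {improvement_pct:.1f}%")
```

That is a drop of about a third in total query time.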

 

The full paper is available from: http://www.maryamkamvar.com/publications/KamvarBalujaComputerMagazine.pdf.

 

                                                --jrh

 

Wednesday, February 13, 2008 12:15:49 AM (Pacific Standard Time, UTC-08:00)
Services
 Sunday, February 10, 2008

I upgraded my Samsung SGH-i607 to Windows Mobile 6 earlier today.  I had held off upgrading until today having been told that Internet Connection Sharing doesn’t work on the AT&T Windows Mobile 6 build.  Actually, it’s even a bit of a hassle to make it work on Win Mobile 5 but it can be done on both WM5 and 6.  ICS isn’t actually removed from the WM6 build, it’s just not exposed in the user interface as was done with WM5 and, in WM6, there are security settings preventing it from operating.  So it is more work to enable it but not really all that much (details on the page referenced below).

 

Overall I’m happy with the upgrade.  I’ve added to my Blackjack Hack, Tip, Techniques & Utilities page to include WM6 installation instructions, application unlocking instructions, a pointer to the Internet Connection Sharing enabling procedure, instructions on how to move email and IE temp files to the storage card, a higher information density home page, and a few utilities.  If you have accumulated other interesting tricks, send them my way.  For example, I’ve not yet managed to SIM unlock this one.

 

                                    --jrh

 

Sunday, February 10, 2008 12:16:47 AM (Pacific Standard Time, UTC-08:00)
Hardware
 Thursday, February 07, 2008

I was down at Amazon last week speaking at their Internal Developers conference.  It was a fun trip in that I got to catch up with a bunch of old friends, a great many of whom seem to be working on S3 these days.

 

I presented Designing and Deploying Internet Scale Services, essentially a set of best practices for writing service-based applications. Additional detail can be found in the paper on which the talk was based: http://research.microsoft.com/~jamesrh/TalksAndPapers/JamesRH_Lisa.pdf.

 

                                --jrh

 

Thursday, February 07, 2008 12:17:46 AM (Pacific Standard Time, UTC-08:00)
Services
 Tuesday, February 05, 2008

A few months back I was in a debate about the value of shared code segments between virtual machines. In my view there is no question that shared code across VMs has some value, but code is small compared to data so the impact will be visible but not fundamental. What follows is an inventory of a typical client-side system.

 

This experiment was done on an IBM T43 laptop with 1GB of memory running Vista RTM, desktop search, Foldershare (it rocks), and Outlook.  Outlook was in use prior to and during the measurement.  The system has been running for three days since the last boot.  The summary stats are:

 

Classification    pages     Meg         %
Kernel:           65824     257.125     25%
User:             195913    765.2852    75%
Total:            261737    1022.41
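The Meg column follows from the page counts if each page is 4 KB. That page size is an assumption on my part (it is the usual x86 page size and it reproduces the figures above exactly), and the helper name is made up for illustration.

```python
PAGE_SIZE_BYTES = 4096  # assumption: 4 KB pages


def pages_to_mb(pages, page_size=PAGE_SIZE_BYTES):
    """Convert a page count to megabytes (1 MB = 1024 * 1024 bytes)."""
    return pages * page_size / (1024 * 1024)


# Page counts from the summary table.
summary = {"Kernel": 65824, "User": 195913}
total_pages = sum(summary.values())

for name, pages in summary.items():
    share = pages / total_pages
    print(f"{name}: {pages} pages, {pages_to_mb(pages):.4f} MB, {share:.0%}")

print(f"Total: {total_pages} pages, {pages_to_mb(total_pages):.2f} MB")
```

The total comes out just under the 1 GB of memory installed in the machine.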

Kernel Pages

Kernel Image:     7395      28.88672    11%

Kernel Pure Data: