Sunday, January 29, 2012

Don't be a show-off. Never be too proud to turn back. There are old pilots and bold pilots, but no old, bold pilots.

 

I first heard the latter part of this famous quote made by US Airmail Pilot E. Hamilton Lee back when I raced cars. At that time, one of the better drivers in town, Gordon Monroe, used a variant of that quote (with pilots replaced by racers) when giving me driving advice. Gord’s basic message was that it is impossible to win a race if you crash out of it.

 

Nearly all of us have taken the odd chance and made some decisions that, in retrospect, just didn’t make sense from a risk vs reward perspective. Age and experience clearly help, but mistakes still get made and none of us are exempt. Most people’s mistakes at work don’t have life safety consequences, and their mistakes are not typically picked up by the world news services as was the case in the recent grounding of the Costa Concordia cruise ship. But we all make mistakes.

 

I often study engineering disasters and accidents in the belief that understanding mistakes, failures, and accidents deeply is a much lower cost way of learning than making them ourselves. My last note on this topic was What Went Wrong at Fukushima Dai-1, where we looked at the nuclear release following the 2011 Tohoku Earthquake and Tsunami.

 

Living on a boat and cruising extensively (our boat blog is at http://blog.mvdirona.com/) makes me particularly interested in the Costa Concordia incident of January 13th 2012. The Concordia is a 114,137 gross ton floating city that cost $570m when it was delivered in 2006. It is 952’ long, has 17 decks, and is powered by 6 Wartsila diesel engines with a combined output of 101,400 horsepower. The ship is capable of 23 kts (26.5 mph) and has a service speed of 21 kts. At capacity, it carries 3,780 passengers with a crew of 1,100.

 

From: http://en.wikipedia.org/wiki/Costa_Concordia_disaster:

 

The Italian cruise ship Costa Concordia partially sank on Friday the 13th of January 2012 after hitting a reef off the Italian coast and running aground at Isola del Giglio, Tuscany, requiring the evacuation of 4,197 people on board. At least 16 people died, including 15 passengers and one crewman; 64 others were injured (three seriously) and 17 are missing. Two passengers and a crewmember trapped below deck were rescued.

 

The captain, Francesco Schettino, had deviated from the ship's computer-programmed route in order to treat people on Giglio Island to the spectacle of a close sail-past. He was later arrested on preliminary charges of multiple manslaughter, failure to assist passengers in need and abandonment of ship. First Officer Ciro Ambrosio was also arrested.

 

It is far too early to know exactly what happened on the Costa Concordia and, because there was loss of life and considerable property damage, the legal proceedings will almost certainly run for years. Unfortunately, rather than illuminating the mistakes and failures and helping us avoid them in the future, these proceedings typically focus on culpability and distributing blame. That’s not our interest here. I’m mostly focused on what happened and getting all the data I could find on the table to see what lessons the situation yields.

 

A fellow boater, Milt Baker, pointed me towards an excellent video that offers considerable insight into exactly what happened in the final 1 hour and 30 min. You can find the video at: Grounding of Costa Concordia. Another interesting data source is the video commentary available at: John Konrad Narrates the Final Maneuvers of the Costa Concordia. In what follows, I’ve combined snapshots of the first video with data available from other sources including the second video.

 

The source data for the two videos above is a wonderful safety system called Automatic Identification System. AIS is required on larger commercial craft and is also used on many recreational boats. AIS works by frequently transmitting (up to every 2 seconds for fast moving ships) via VHF radio the ship’s GPS position, course, speed, name, and other pertinent navigational data. Receiving stations on other ships automatically plot transmitting AIS targets on electronic charts. Some receiving systems are also able to plot an expected target course and compute the time and location of the estimated closest point of approach. AIS is an excellent tool to help reduce the frequency of ship-to-ship collisions.
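The closest point of approach calculation mentioned above can be sketched in a few lines. This is only a minimal illustration, not how any particular AIS receiver implements it: it assumes both vessels hold a steady course and speed, and it works on a flat local grid in nautical miles rather than on latitude and longitude. The function names and the sample positions are hypothetical.

```python
import math


def velocity(course_deg, speed_kts):
    """Return (east, north) velocity in nautical miles per minute."""
    rad = math.radians(course_deg)
    nm_per_min = speed_kts / 60.0  # 1 knot = 1 nautical mile per hour
    return (nm_per_min * math.sin(rad), nm_per_min * math.cos(rad))


def cpa(own_pos, own_course, own_speed, tgt_pos, tgt_course, tgt_speed):
    """Return (minutes_to_cpa, distance_nm_at_cpa).

    Positions are (east, north) in nautical miles on a flat local grid
    (an assumption made to keep the sketch short). Both vessels are
    assumed to hold their current course and speed.
    """
    own_vx, own_vy = velocity(own_course, own_speed)
    tgt_vx, tgt_vy = velocity(tgt_course, tgt_speed)

    # Position and velocity of the target relative to our own ship.
    rx, ry = tgt_pos[0] - own_pos[0], tgt_pos[1] - own_pos[1]
    vx, vy = tgt_vx - own_vx, tgt_vy - own_vy

    rel_speed_sq = vx * vx + vy * vy
    if rel_speed_sq == 0.0:
        # Same course and speed: the separation never changes.
        t = 0.0
    else:
        # Time that minimizes separation, clamped so we never report the past.
        t = max(0.0, -(rx * vx + ry * vy) / rel_speed_sq)

    return t, math.hypot(rx + vx * t, ry + vy * t)


# Hypothetical example: two ships 10 NM apart, head-on, each at 10 kts.
minutes, distance_nm = cpa((0.0, 0.0), 0.0, 10.0, (0.0, 10.0), 180.0, 10.0)
```

With the hypothetical head-on example, the closing speed is 20 kts over 10 NM, so the result is 30 minutes to a closest approach of essentially zero.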

 

Since AIS data is broadcast over VHF radio, it is widely available to both ships and land stations and this data can be used in many ways. For example, if you are interested in the boats in Seattle’s Elliott Bay, have a look at MarineTraffic.com and enter “Seattle” as the port in the data entry box near the top left corner of the screen (you might see our boat Dirona there as well).

 

AIS data is often archived and, because of that, we have a very precise record of the Costa Concordia’s course as well as core navigational data as it proceeded towards the rocks. In the pictures that follow, the red images of the ship are at the ship’s position as transmitted by the Costa Concordia’s AIS system. The black line between these images is the interpolated course between these known locations. The video itself (Costa Concordia Interpolated.wmv) uses a roughly 5:1 time compression.
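To make the idea of an interpolated course concrete, here is a small sketch of estimating a position between two archived AIS reports. It assumes straight-line (linear) interpolation in time, which is a reasonable approximation over the short gaps between reports; the video may well use a different method. The `Fix` type and the sample coordinates are hypothetical and are not taken from the Concordia data.

```python
from dataclasses import dataclass


@dataclass
class Fix:
    """One archived AIS position report (hypothetical structure)."""
    t: float    # seconds since some reference time
    lat: float  # degrees
    lon: float  # degrees


def interpolate(a, b, t):
    """Linearly interpolate (lat, lon) at time t between fixes a and b."""
    if not a.t <= t <= b.t:
        raise ValueError("t must lie between the two fixes")
    if b.t == a.t:
        return (a.lat, a.lon)
    frac = (t - a.t) / (b.t - a.t)
    return (a.lat + frac * (b.lat - a.lat),
            a.lon + frac * (b.lon - a.lon))


# Hypothetical fixes 10 seconds apart; estimate the position half way.
first = Fix(t=0.0, lat=42.0, lon=10.0)
second = Fix(t=10.0, lat=42.1, lon=10.2)
midpoint = interpolate(first, second, 5.0)
```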

In this screen shot, you can see the Concordia already very close to the Italian Isola del Giglio. According to a BBC report, the Captain has said he turned too late (Costa Concordia: Captain Schettino ‘Turned Too Late’). From that article:

 

According to the leaked transcript quoted by Italian media, Capt Schettino said the route of the Costa Concordia on the first day of its Mediterranean cruise had been decided as it left the port of Civitavecchia, near Rome, on Friday.

 

The captain reportedly told the investigating judge in the city of Grosseto that he had decided to sail close to Giglio to salute a former captain who had a home on the Tuscan island. "I was navigating by sight because I knew the depths well and I had done this maneuver three or four times," he reportedly said.

 

"But this time I ordered the turn too late and I ended up in water that was too shallow. I don't know why it happened."

 

In this screen shot of the boat at 20:44:47 just prior to the grounding, you can see the boat turned to 348.8 degrees but the massive 114,137 gross ton vessel is essentially plowing sideways through the water on a course of 332.7 degrees. The Captain can and has turned the ship with the rudder but, at 15.6 kts, it does not follow the exact course steered, with inertia tending to widen and straighten the intended turn.

 

Given the speed of the boat and nearness of shore at this point, the die is cast and the ship is going to hit ground.

 

This screen shot was taken just past the point of impact. You will note that the ship has slowed to 14.0 kts. You might also notice the Captain is turning aggressively to starboard. He has the ship turned to an 8.9 degree heading whereas the ship’s actual course lags behind at 356.2 degrees.
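The gap between where the ship is pointed (heading) and where it is actually going (course) in the two screen shots above is easy to compute, with one catch: compass angles wrap around at 360 degrees, so 8.9 minus 356.2 has to come out as a small positive angle rather than -347.3. A minimal sketch, with a hypothetical function name:

```python
def heading_minus_course(heading_deg, course_deg):
    """Signed difference between heading and course, wrapped to [-180, 180).

    Positive means the bow points to starboard of the direction of travel.
    """
    return (heading_deg - course_deg + 180.0) % 360.0 - 180.0


# Values read from the two screen shots described above.
before_impact = heading_minus_course(348.8, 332.7)  # about 16.1 degrees
after_impact = heading_minus_course(8.9, 356.2)     # about 12.7 degrees
```

In both cases the bow is pointed well to starboard of the direction the hull is actually moving, which is the sideways plowing described above.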

 

This screen shot is only 44 seconds after the previous one but the boat has already slowed from 14.0 kts to 8.1 and is still slowing quickly. Some of the slowing will have come from the grounding itself but passengers report that they heard the boat go hard astern after the grounding.

 

You can also see the Captain has swung the helm from the starboard turn he was making to avoid the rocks over to port now that he has struck them. This is almost certainly in an effort to minimize damage. What makes this (possibly counter-intuitive) decision a good one is that the ship’s pivot point is approximately 1/3 of the way back from the bow, so turning to port (towards the shore) will actually cause the stern to rotate away from the rocks they just struck.

 

The ship decelerated quickly to just under 6.0 knots but, in the two minutes prior to this screen shot, it has only slowed a further 0.9 kts down to 5.1. There were reports of a loss of power on the Concordia. Likely what happened is that the ship was hard astern taking off speed until a couple of minutes prior to this screen shot, when water intrusion caused a power failure. The ship is diesel-electric and likely lost power to its main propulsion due to rapid water ingress.
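The change in deceleration is what suggests the loss of astern power, and the arithmetic is simple enough to check. The sketch below uses only the speeds and intervals quoted above; the starting speed for the second interval is taken as 6.0 kts even though the text says "just under", so the second figure is approximate. The function names are hypothetical.

```python
KNOT_IN_M_PER_S = 1852.0 / 3600.0  # one nautical mile per hour, in m/s


def decel_kts_per_min(start_kts, end_kts, elapsed_s):
    """Average speed lost per minute over the interval, in knots."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return (start_kts - end_kts) / elapsed_s * 60.0


def decel_m_per_s2(start_kts, end_kts, elapsed_s):
    """The same average deceleration expressed in metres per second squared."""
    return decel_kts_per_min(start_kts, end_kts, elapsed_s) / 60.0 * KNOT_IN_M_PER_S


# 14.0 kts to 8.1 kts in the 44 seconds right after impact.
right_after_impact = decel_kts_per_min(14.0, 8.1, 44)   # roughly 8 kts per minute

# Roughly 6.0 kts to 5.1 kts over the following two minutes.
two_minutes_later = decel_kts_per_min(6.0, 5.1, 120)    # roughly 0.45 kts per minute
```

On these numbers the ship was shedding speed well over ten times faster immediately after impact than it was two minutes before the later screen shot, which is consistent with astern thrust having been lost, although the slowing from the grounding itself also contributed to the first figure.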

 

At 5 kts and very likely without main engine power, the Concordia is still going much too quickly to risk running into the mud and sand shore so the Captain now turns hard away from shore and he is heading back out into the open channel.

 

With the helm hard over to starboard and the likely assistance of the bow thrusters, the ship is turning hard, which is pulling speed off fairly quickly. It is now down to 3.0 kts and it continues to slow.

 

The Concordia is now down to 1.6 kts and the Captain is clearly using the bow thrusters heavily as the bow continues to rotate quickly. He has now turned to a 41 degree heading.

 

It has now been just over 29 min since the ship first struck the rocks. It has essentially stopped and the bow is being brought all the way back round using bow thrusters in an effort to drive the ship back in towards shore, presumably because the Captain believes it is at risk of sinking and is seeking shallow water.

 

The Captain continues to force the Concordia to shore under bow thruster power. In this video narrative (John Konrad Narrates the Final Maneuvers of the Costa Concordia), the commentator reported that the Captain was using the bow thrusters in combination with the prevailing currents to drive the boat into shore.

 

A further 11 min and 22 seconds have passed and the ship has accelerated back up to 0.9 kts, now heading towards shore.

 

It has been more than an hour and 11 minutes since the original contact with the rocks and the Costa Concordia is now at rest in its final grounding point.

 

The Coast Guard transcript of the radio communications with the Captain is at Costa Concordia Transcript: Coastguard Orders Captain to return to Stricken Ship. In the following text, De Falco is the Coast Guard Commander and Schettino is the Captain of the Costa Concordia:

 

De Falco: "This is De Falco speaking from Livorno. Am I speaking with the commander?"

Schettino: "Yes. Good evening, Cmdr De Falco."

De Falco: "Please tell me your name."

Schettino: "I'm Cmdr Schettino, commander."

De Falco: "Schettino? Listen Schettino. There are people trapped on board. Now you go with your boat under the prow on the starboard side. There is a pilot ladder. You will climb that ladder and go on board. You go on board and then you will tell me how many people there are. Is that clear? I'm recording this conversation, Cmdr Schettino …"

Schettino: "Commander, let me tell you one thing …"

De Falco: "Speak up! Put your hand in front of the microphone and speak more loudly, is that clear?"

Schettino: "In this moment, the boat is tipping …"

De Falco: "I understand that, listen, there are people that are coming down the pilot ladder of the prow. You go up that pilot ladder, get on that ship and tell me how many people are still on board. And what they need. Is that clear? You need to tell me if there are children, women or people in need of assistance. And tell me the exact number of each of these categories. Is that clear? Listen Schettino, that you saved yourself from the sea, but I am going to … really do something bad to you … I am going to make you pay for this. Go on board, (expletive)!"

Schettino: "Commander, please …"

De Falco: "No, please. You now get up and go on board. They are telling me that on board there are still …"

Schettino: "I am here with the rescue boats, I am here, I am not going anywhere, I am here …"

De Falco: "What are you doing, commander?"

Schettino: "I am here to co-ordinate the rescue …"

De Falco: "What are you co-ordinating there? Go on board! Co-ordinate the rescue from aboard the ship. Are you refusing?"

Schettino: "No, I am not refusing."

De Falco: "Are you refusing to go aboard, commander? Can you tell me the reason why you are not going?"

Schettino: "I am not going because the other lifeboat is stopped."

De Falco: "You go aboard. It is an order. Don't make any more excuses. You have declared 'abandon ship'. Now I am in charge. You go on board! Is that clear? Do you hear me? Go, and call me when you are aboard. My air rescue crew is there."

Schettino: "Where are your rescuers?"

De Falco: "My air rescue is on the prow. Go. There are already bodies, Schettino."

Schettino: "How many bodies are there?"

De Falco: "I don't know. I have heard of one. You are the one who has to tell me how many there are. Christ!"

Schettino: "But do you realize it is dark and here we can't see anything …"

De Falco: "And so what? You want to go home, Schettino? It is dark and you want to go home? Get on that prow of the boat using the pilot ladder and tell me what can be done, how many people there are and what their needs are. Now!"

Schettino: "… I am with my second in command."

De Falco: "So both of you go up then … You and your second go on board now. Is that clear?"

Schettino: "Commander, I want to go on board, but it is simply that the other boat here … there are other rescuers. It has stopped and is waiting …"

De Falco: "It has been an hour that you have been telling me the same thing. Now, go on board. Go on board! And then tell me immediately how many people there are there."

Schettino: "OK, commander."

De Falco: "Go, immediately!"

 

At least 16 died in the accident and 17 were still missing when this was written (Costa Concordia Disaster). The Captain of the Costa Concordia, Francesco Schettino, has been charged with manslaughter and abandoning ship.

 

At the time of the grounding, the ship was carrying 2,200 metric tons of heavy fuel oil and 185 metric tons of diesel, and an environmental risk remains (Costa Concordia Salvage Experts Ready to Begin Pumping Fuel from Capsized Cruise Ship Off Coast of Italy). The 170 year old salvage firm Smit Salvage will be leading the operation.

 

All situations are complex and few disasters have only a single cause. However, the facts as presented to this point point pretty strongly towards pilot error as the primary contributor in this event. The Captain is clearly very experienced and his ship handling after the original grounding appears excellent. But it’s hard to explain why the ship was that close to the rocks, the Captain has reported that he turned too late, and public reports have him on the phone at or near the time of the original grounding.

 

What I take away from the data points presented here is that experience, ironically, can be our biggest enemy. As we get increasingly proficient at a task, we often stop paying as much attention. And, with less dedicated focus on a task, over time, we run the risk of a crucial mistake that we probably wouldn’t have made when we were less experienced and perhaps less skilled. There is danger in becoming comfortable.

 

The videos referenced in the above can be found at:

·         Grounding of Costa Concordia Interpolated

·         gCaptain’s John Konrad Narrates the Final Maneuvers of the Costa Concordia

 

If you are interested in reading more:

·         http://www.masslive.com/news/index.ssf/2012/01/costa_concordia_salvage_expert.html

·         http://www.bbc.co.uk/news/world-europe-16620807

·         http://www.washingtonpost.com/world/divers-in-grounded-costa-concordia-112/2012/01/25/gIQAOkD2PQ_video.html

·         http://www.foxnews.com/slideshow/world/2012/01/14/luxury-ship-runs-aground-off-italy-bodies-found/#slide=22

·         http://www.telegraph.co.uk/news/worldnews/europe/italy/9042826/Wife-of-Costa-Concordia-captain-says-it-is-not-for-those-on-land-to-judge-her-husband.html

·         http://www.telegraph.co.uk/news/interactive-graphics/9018076/Concordia-How-the-disaster-unfolded.html

·         http://news.qps.nl.s3.amazonaws.com/Grounding+Costa+Concordia.pdf

·         http://www.bellenews.com/2012/01/14/world/europe-news/italian-captain-of-costa-concordia-cruise-ship-has-been-arrested/

·         http://en.wikipedia.org/wiki/Costa_Concordia

·         http://en.wikipedia.org/wiki/Costa_Concordia_disaster

 

James Hamilton

e: jrh@mvdirona.com

w: http://www.mvdirona.com

b: http://blog.mvdirona.com / http://perspectives.mvdirona.com

 

Sunday, January 29, 2012 11:24:25 AM (Pacific Standard Time, UTC-08:00)
Ramblings
 Thursday, January 26, 2012

Ordinarily I focus this blog on the areas of computing where I spend most of my time, from high performance computing to database internals and cloud computing. An area that interests me greatly but that I’ve seldom written about is entrepreneurship and startups.

 

One of the Seattle-area startups with which I stay in touch is Socrata. They are focused on enabling federal, state, and local governments to improve the reach, usability, and social utility of their public information assets, essentially making public information available and useful to their constituents. They are used by the World Bank, the United Nations, the World Economic Forum, the US Data.Gov, Health & Human Services, the Centers for Disease Control, several major cities including NYC, Seattle, Chicago, San Francisco, and Austin, and many county and state governments. Even foreign governments like Kenya’s have adopted Socrata.

 

I first met Kevin Merritt, the founder and CEO of Socrata, back in 2005 when I was doing technical diligence for the Microsoft acquisition of the LA-based Frontbridge Technologies. I love doing diligence on startups because it’s an opportunity to dive in and spend a day or more digging deeply and understanding what smart people have produced, where things worked really well, and where things didn’t pan out as well as they could have. I’ve learned a lot in these roles and I’m lucky to have been able to do many of them, first at IBM, later at Microsoft, and now at Amazon.

 

What made this one a bit different is that I got a call shortly after the deal closed asking if I wanted to be the General Manager of the Microsoft subsidiary that was formed in the acquisition. It was an opportunity to run a mid-sized business in its entirety: development, test, operations, and customer support. Absolutely! I’ve never learned so much as I did in the first year or so at what would become Microsoft Exchange Hosted Services.

 

It was a great experience and I’ve been 100% focused on cloud services since that time. And, as a consequence of leading Frontbridge, I got to know Kevin Merritt well. He is an excellent strategic thinker and an even better operator. Whenever Kevin was involved, customers were happy and the service was rapidly improving and expanding.  Kevin eventually left to form Socrata and he and I have stayed in touch since then. He knows I’m a sucker for a beer and some wings :-).

 

Based in Seattle, Socrata is venture-backed with a small and talented engineering team. They are enjoying strong customer demand and their market success is fueling growth in the engineering team. They are currently looking for a CTO and, if I didn’t already have one of the best jobs out there, I would seriously consider joining Kevin and the team. If you are a technology leader interested in big data, cloud computing, architecture of distributed systems, ops automation, and the user experience of making data easy to find and use, you should send Kevin, their founder and CEO, a note at kevin.merritt@socrata.com.

 

                                                                --jrh

 


Thursday, January 26, 2012 5:13:24 PM (Pacific Standard Time, UTC-08:00)
Ramblings
 Monday, November 14, 2011

Last week I got to participate in one of my favorite days each year, serving on the judging panel for the AWS Startup Challenge. The event is a fairly intense day where our first meeting starts at 6:45am and the event closes at 9pm that evening. But it is an easy day to love in that the entire day is spent with innovative startups who have built their companies on cloud computing.

 

I’m a huge believer in the way cloud computing is changing the computing landscape and that’s all I’ve worked on for many years now. But I have still not tired of hearing “Without AWS, we wouldn’t have even been able to think about launching this business.”

 

Cloud computing is allowing significant businesses to be conceived and delivered at scale with only tiny amounts of seed funding or completely bootstrapped. Many of the finalists we looked at during last week’s event had taken less than $200k of seed funding and yet already had thousands of users. That simply wouldn’t have been possible 10 years ago and I just love to see it.

 

The finalists for this year’s AWS Startup Challenge were:

Booshaka - United States (Sunnyvale, California)

Booshaka simplifies advocacy marketing for brands and businesses by making sense of large amounts of social data and providing an easy to use software-as-a-service solution. In an era where people are bombarded by media, advertisers face significant challenges in reaching and engaging their customers. Booshaka combines the social graph and big data technology to help advertisers turn customers into their best marketers.

Deputy.com - Australia (Sydney)

Deputy.com is an online business management solution specifically addressing the HR department. The powerful online and mobile platform engages all staff across an enterprise, builds positive culture and drives business growth.

Fantasy Shopper - UK (Exeter)

Fantasy Shopper is a social shopping game. The shopping platform centralizes, socializes and “gamifies” online shopping to provide a real-world experience.

Flixlab - United States (Palo Alto, California)

With Flixlab, people can instantly and automatically transform raw videos and photos from their smartphone or their friends’ smartphones, into fun, compelling stories with just a few taps and immediately share them online. After creation, viewers can then interact with these movies by remixing them and creating personally relevant movies from the shared pictures and videos.

Getaround - United States (San Francisco, California)

Getaround is a peer-to-peer car sharing marketplace that enables car owners to rent their cars to qualified drivers by the hour, day, or week. Getaround facilitates payment, provides 24/7 roadside assistance, and provides complete insurance backed by Berkshire Hathaway with each rental.

Intervention Insights - United States (Grand Rapids, Michigan)

Intervention Insights provides a medical information service that combines cutting edge bioinformatics tools with disease information to deliver molecular insights to oncologists describing an individual’s unique tumor at a genomic level. The company then provides a report with an evidence-based list of therapies that target the unique molecular basis of the cancer.

Localytics - United States (Cambridge, Massachusetts)

Localytics is a real-time mobile application analytics service that provides app developers with tools to measure usage, engagement and custom events in their applications. All data is stored at a per-event level instead of as aggregated counts. This allows app publishers, for example, to create more accurately targeted advertising and promotional campaigns from detailed segmentation of their dedicated customers.

Judging this year’s competition was even more difficult than last year because of the high quality of the field. Rather than a clear winner just jumping out, nearly all the finalists were viable winners and each clearly led in some dimensions.

 

As I write this and reflect on the field of finalists, some aspects of the list are notable. 1) It is truly international: there are several very strong entrants from outside the US, and more than ½ of the finalists come from outside of Silicon Valley. The combination of two trends is powerful: first, the economics of cloud computing support successful startups without venture funding and, second, venture and angel funding is spreading throughout the world. Both trends make for a very strong field. 2) Very early stage startups are getting traction incredibly quickly. Cloud computing allows companies to go to beta without having to grow a huge company. 3) Diversity: there were consumer offerings, developer offerings, and services aimed at highly skilled professionals.

The winner of the AWS Startup Challenge this year was Fantasy Shopper from Exeter, United Kingdom. Fantasy Shopper is a small, mostly bootstrapped startup led by CEO Chris Prescott and CTO Dan Noz with two other engineers. Fantasy Shopper is a social shopping game. They just went into beta on October 18th and already have thousands of incredibly engaged users. My favorite example is this video blog posted to YouTube on November 6th: http://www.youtube.com/watch?v=h_sKDgdEexk. Watch the first 60 to 90 seconds and you’ll see what I mean.

 

Congratulations to Chris, Dan, Brendan, and Findlay at Fantasy Shopper and keep up the great work.

 

                                                                --jrh

 


 

Monday, November 14, 2011 7:24:52 AM (Pacific Standard Time, UTC-08:00)
Ramblings
 Monday, October 31, 2011

I’m not sure why it all happens at once but it often does. Last Monday I kicked off HPTS 2011 in Asilomar, California and then flew to New York City to present at the Open Compute Summit.

 

I love HPTS. It’s an invitational workshop held once every 2 years that I’ve been attending since 1989. The workshop attracts a great set of presenters and attendees: HPTS 2011 agenda. I blogged a couple of the sessions if you are interested:

·         Microsoft COSMOS at HPTS

·         Storage Infrastructure Behind Facebook Message

 

The Open Compute Summit was kicked off by Frank Frankovsky of Facebook followed by the legendary Andy Bechtolsheim of Arista Networks. I did a talk after Andy that was a subset of the talk I had done earlier in the week at HPTS.

·         HPTS Slides: Internet-Scale Data Center Economics: Costs and Opportunities

·         OCP Summit: Internet Scale Infrastructure Innovation

 

Tomorrow I’ll be at the University of Washington presenting Internet Scale Storage at the University of Washington Computer Science and Engineering Distinguished Lecturer Series (agenda). It’s open to the public so, if you are in the Seattle area and interested, feel free to drop by EEB-105 at 3:30pm (map and directions).

 

                                                                --jrh

 


Monday, October 31, 2011 6:07:27 PM (Pacific Standard Time, UTC-08:00)
Ramblings
 Saturday, October 29, 2011

Sometimes the most educational lessons are on what not to do rather than what to do. Failure and disaster can be extraordinarily educational as long as the reason behind the failure is well understood. I study large system outages and infrastructure failures, love reading post mortems (when they actually have content), and always watch carefully how companies communicate with their customers during and right after large scale customer impacting events. I don’t do it because I enjoy failure. These things all scare me. But in each there are lessons to be learned.

 

 

Sprint advertising from: http://unlimited.sprint.com/?pid=10 (2011/10/29).

 

I typically point out the best example rather than the worst but every once in a while you see a blunder so big it just can’t be ignored. Sprint is the 3rd place wireless company in an industry where backbreaking infrastructure costs strongly point towards there only being a small number of surviving companies unless services are well differentiated. All the big wireless players work hard on differentiation but it’s a challenge and, over time, the biggest revenue supports the biggest infrastructure investment, and it gets harder and harder to be successful as a #3 player.

 

Sprint markets that they are better than the #1 and #2 carrier because they really have unlimited data rather than merely using the word “unlimited” in the marketing literature. They say “at Sprint you get unlimited data, no overage charges, and no slowing you down” (http://unlimited.sprint.com/?pid=10).

 

We live on a boat and so 4G cellular is about as close as we can get to broadband. I like to do all I can to encourage broad competition because it is good for the industry and good for customers.  That is one of the reasons we are Sprint customers today.  We use Sprint because they offer unlimited 4G and I really would like there to be more than 2 surviving North American wireless providers.

 

Unfortunately, Sprint seems less committed to my goal of keeping the #3 wireless player healthy and solvent. Sprint plans to shut off its primary differentiating feature, unlimited data, this month. That’s a tough decision but presumably it was made with care and there exists a plan to grow the company with some other feature or features making them a better choice than Verizon or AT&T. Just being a 3rd choice with a less well developed network and with less capital to invest into that network doesn’t feel like a good long term strategy for Sprint.

 

What makes Sprint’s decision notable is the way the plan was rolled out. Sprint has many customers under 2 year, unlimited data contracts. Rather than risk the negative repercussions and customer churn from communicating the change, they went the stealth route.  The only notification was buried in the fine print of the October bill:

 

Mobile Broadband Data Allowance Change
Eff. on your next bill, Mobile Broadband Data Plan 4G usage will be combined with your current 3G monthly data allowance and no longer be unlimited. On-network data overage rate for 3G/4G is $.05/MB. Monitor combined data usage at sprint.com. Please visit sprint.com/servicechange for details.

In November, many of us are going to get charged an overage fee of $0.05/MB on what has been advertised heavily as the only “real” unlimited plan. For many customers, the only reason they have a Sprint contract is that the data plan was uncapped. Both my phone and Jennifer’s are with AT&T. The only reason we are using Sprint for connectivity from the boat WiFi system is that Sprint offered unbounded data. Attempting a stealth change of the primary advertised characteristic of a service shows very little respect for customers, even when compared with other telcos, an industry not generally known for customer empathy.
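To get a feel for what $0.05/MB means in practice, here is a quick sketch of the overage arithmetic. The rate comes from the bill notice above. The monthly allowance is not stated in the notice, so the 5 GB figure below is purely a hypothetical placeholder, as are the assumptions that 1 GB is counted as 1,024 MB and that overage is billed per whole MB.

```python
OVERAGE_CENTS_PER_MB = 5  # $.05/MB, from the bill notice
MB_PER_GB = 1024          # assumption: how the carrier counts a GB


def overage_charge(used_mb, allowance_mb):
    """Return the overage charge in dollars for one billing period.

    Assumes every whole MB beyond the allowance is billed at the flat
    per-MB rate (an assumption; the notice does not spell this out).
    """
    over_mb = max(0, used_mb - allowance_mb)
    return over_mb * OVERAGE_CENTS_PER_MB / 100


# Hypothetical allowance: the real number is not given in the notice.
HYPOTHETICAL_ALLOWANCE_MB = 5 * MB_PER_GB

# Going 1 GB over the hypothetical allowance.
one_gb_over = overage_charge(6 * MB_PER_GB, HYPOTHETICAL_ALLOWANCE_MB)
print(one_gb_over)  # 51.2
```

Whatever the actual allowance turns out to be, each GB beyond it costs a little over $50 at this rate, which adds up quickly when the connection is serving as a boat’s broadband link.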

 

I agree that almost nobody is going to read the bill and I suppose some won’t notice when subsequent bills are higher. But many eventually will. And, even for those that don’t notice and are silently getting charged more, when they do notice, they are going to be unhappy. No matter how you cut it, the experience is going to be hard on customer trust. And, at the same time as they are showing little respect for customers, they are releasing them all from contract. Any Sprint customer is now welcome to leave without termination charge.

 

Some analysts have speculated that Sprint doesn’t have the bandwidth to support their launch of iPhone.  This billing structure change strongly suggests that Sprint really does have a bandwidth problem. I’ve still not yet figured out why an iPhone is more desirable at Sprint than it is at Verizon or AT&T. And I still can’t figure out why the #3 provider with the same data caps is more desirable than the big 2 but it’s not important that I understand. That’s a Sprint leadership decision.

 

Let’s assume that the Sprint network is in capacity trouble and they have no choice but to cap the data plans even though they are changing the very terms they advertised as their primary advantage. Even if that is necessary, I’m 100% convinced the right way to do it is to support the existing contract terms for the duration of those contracts. If the company really is teetering on failure and is unable to honor the commitments they agreed to, then they need to be upfront with customers. You can’t quietly slip new contract terms into the statement and hope nobody notices. Showing that little respect for customers is usually rewarded by high churn rates and a continually shrinking market share. Poor approach.

 

I called Sprint and pointed out they were kind of missing the original contract terms. They said “there was nothing they could do.” However, they would be willing to offer a $100 credit if we would agree to another 2 year contract term. Paying only $100 to get a customer signed up for another 2 years would be an incredible bargain for Sprint. Most North American carriers spend at least that on device subsidies when getting customers committed to an additional 2 year term. This would be cheap for Sprint and would get customers back under contract after this term change effectively released them. The Sprint customer service representative did correctly offer to waive early cancellation fees since they were changing the terms of the original contract.

 

Sprint customers are now all able to walk away today from the remaining months in their wireless contracts without any cost. They are all free to leave. From my perspective, it is just plain nutty for Sprint to give their entire subscription base the freedom to walk away from contracts without charge while, at the same time, treating them poorly. It’s a recipe for industry leading churn.

 

                                                --jrh

 

James Hamilton

e: jrh@mvdirona.com

w: http://www.mvdirona.com

b: http://blog.mvdirona.com / http://perspectives.mvdirona.com

 

Saturday, October 29, 2011 6:40:18 AM (Pacific Standard Time, UTC-08:00)  #    Comments [9] - Trackback
Ramblings
 Thursday, June 09, 2011

The Amazon Technology Open House was held Tuesday night at the Amazon South Lake Union Campus. I did a short presentation on the following:

 

       Quickening pace of infrastructure innovation

       Where does the money go?

       Power distribution infrastructure

       Mechanical systems

       Modular & Advanced Building Designs

       Sea Change in Networking

 

Slides and notes:

 

                                                 --jrh

 


 

Thursday, June 09, 2011 5:40:21 AM (Pacific Standard Time, UTC-08:00)  #    Comments [4] - Trackback
Ramblings
 Friday, June 03, 2011

Earlier today Alex Mallet reminded me of the excellent writing of Atul Gawande by sending me a pointer to the New Yorker coverage of Gawande’s commencement address at the Harvard Medical School: Cowboys and Pit Crews.

 

Four years ago I wrote a couple of blog entries on Gawande’s work but, at the time, my blog was company internal so I’ve not posted these notes here in the past:

 

As a follow-on to the posting I made on professional engineering (also posted externally http://perspectives.mvdirona.com/2007/11/07/ProfessionalEngineering.aspx) Edwin Young sent me a link to the following talk by Atul Gawande: Outcomes are very Personal. It’s from another domain, medicine, but is a phenomenally good presentation by a surgeon and his core premise applies equally to software: practitioners’ work and the outcomes of that work are spread on a bell curve. The truly great are much better than the average and often an order of magnitude better than the lowest performing. His book and the presentation are about the personal attributes and approaches of those at the very top. It’s well worth a view: http://www.youtube.com/watch?v=MbNu6LY5sMY.

 

In my view, it’s an insightful presentation by a surgeon who loves data, loves understanding why we do well and how we can do better, and is himself relentless in the pursuit of doing everything better. Subsequent to watching the presentation, I read a book by the same author, “Better: A Surgeon's Notes on Performance” (http://www.amazon.com/Better-Surgeons-Performance-Atul-Gawande/dp/0805082115).

 

Software, like surgery, is part art and part science and there is tremendous variability between the average and the best. Gawande studies the best in different specializations to understand why the performance of some practitioners is way out there at the positive end of the bell curve and, through a series of essays, makes observations on how to improve performance of the population overall. Understanding that human performance is distributed on the bell curve means that, for whatever it is you are doing, there are average performers, terrible performers, and truly gifted performers. Gawande looks for what he calls positive deviance – it’s always there in a bell curve distributed phenomenon – and tries to understand what those performers do differently. Worth reading.

 

A sampling of Gawande’s work:

·         Book: http://www.amazon.com/Better-Surgeons-Performance-Atul-Gawande/dp/0805082115

·         Video: http://www.youtube.com/watch?v=MbNu6LY5sMY

·         Commencement Speech Notes: http://www.newyorker.com/online/blogs/newsdesk/2011/05/atul-gawande-harvard-medical-school-commencement-address.html

 

                                                --jrh

 


 

Friday, June 03, 2011 7:24:06 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback
Ramblings
 Tuesday, May 31, 2011

As a boater, there are times when I know our survival is 100% dependent upon the weather conditions, the boat, and the state of its equipment. As a consequence, I think hard about human or equipment failure modes and how to mitigate them. I love reading the excellent reporting by the UK Marine Accident Investigation Branch. This publication covers human and equipment related failures on commercial shipping, fishing, and recreational boats. I read it carefully and I’ve learned considerably from it.

 

I treat my work in much the same way. At work, human life is not typically at risk but large service failures can be very damaging and require the same care to avoid. As a consequence, at work I also think hard about possible human or equipment failure modes and how to mitigate them.

 

Wanting to deeply understand unusual failure modes and especially wanting to understand the errors that humans make when managing systems under stress, I spend time reading about system failures. Considerable learning can be drawn from reading about the failures of engineered systems and people under stress. All disasters or near disasters yield some unique lessons and reinforce some old ones.

 

The hard part for me is getting enough detail to really learn from the situation. The press reports are often light on details, partly because general audiences are not necessarily that interested, but there also may be legal or competitive constraints preventing broad publication. NASA, FAA, Coast Guard and some other government reports do get to excellent detail. One analysis of system failure I learned greatly from was Feynman’s analysis of the space shuttle Challenger disaster as part of the Rogers Commission Report.

 

I just came across another report that is not quite a Feynman classic but it is an excellent, just-the-facts description of a large scale failure. This report, from IEEE Spectrum, titled What Went Wrong in Japan’s Nuclear Reactors, outlines what happened in the eventually catastrophic disaster at Japan’s Fukushima Dai-1 nuclear facility following the Tohoku earthquake and subsequent tsunami. In this report, the terminal failures of 4 of the 6 reactors at the facility are described in more detail than in other accounts of that event I’ve come across.

All disasters are unique in some dimensions. What makes Fukushima particularly unusual is that these failures occurred over multiple weeks rather than the seconds to hours of many events. This one was relatively slow to develop and even slower to be brought under control. Looking forward, I suspect Fukushima will share some characteristics with Chernobyl where mitigating the environmental damage is still nowhere close to complete a quarter century later. In 1998 the Ukrainian government obtained economic aid from the European Bank for Reconstruction and Development to rebuild the failing Chernobyl sarcophagus. It is expected that yet more work will need to be done to keep dangerous radioactive substances from escaping. Similarly, I expect the environmental impact of the Fukushima disaster will be fought for decades at great cost both economic and human.

 

In many ways Fukushima was a classic disaster where a not particularly surprising event, in this case an earthquake near Japan, started the failure and then cascading natural disaster, equipment failure, and human decisions followed to yield an outcome that every aspect of the system design sought to avoid.

 

I recommend reading the IEEE report linked below. My rough notes from the write-up follow:

·         On March 11 an earthquake registering 9.0 magnitude was experienced off the coast of Japan

·         The tsunami hit the plant, destroying power distribution gear and cutting off power to the Fukushima facility

·         Backup generators and switch gear were also disabled by the tsunami

·         Reactor building integrity was maintained through the earthquake and tsunami, and the three reactors that were active at that point were all shut down properly

·         Due to the power failure and the damage to distribution gear and generators, plant cooling systems were not operating at any of the reactors or at the spent fuel rod storage pools

·         Even though the nuclear reaction had been stopped in the three reactors that were operational when the tsunami hit (reactors 1, 2, & 3), considerable heat was still being created putting the reactors at risk of meltdown. Meltdown is a condition where reactor core over temperature occurs, the coolant is boiled off, the fuel rods melt and form a pool of very hot, highly radioactive fuel in the bottom of the reactor. This hot, radioactive fluid then rapidly breaks down steel and concrete in the containment vessel and possibly escapes to the environment.

·         Another area of risk from the failed cooling systems is the spent nuclear fuel rod storage pools. These pools are also housed inside the reactor buildings near the primary containment vessel where the active nuclear reaction actually takes place. Although the fuel rods are no longer contributing to a nuclear reaction, they are both highly radioactive and still producing sufficient heat that active cooling is required. Without cooling, these rods can heat the storage pool to the point that the cooling water boils off, presenting a risk similar to that of the active rods inside the primary containment vessel.

·         I find it surprising that both the spent rod storage and the shut down reactor cores don’t appear to fail safe and self-stabilize when cooling water is removed given the considerably higher than zero probability of power failure and the seriously negative impact of radioactive release to the environment.

·         Events at Reactor #1:

o   March 12, a day after the power failure, heat in the recently shutdown reactor built up until the (not circulating) cooling water began to be boiled off.

o   As the water level fell, the now exposed fuel rods reacted with the steam in the primary containment vessel, and began producing hydrogen gas

o   The pressure rose to dangerous levels in the primary containment vessel and operators decided to vent the primary containment vessel into the reactor building.

o   The vented hydrogen gas, when exposed to the relatively oxygen-rich environment in the reactor building, exploded, blowing the top off the reactor building

o   The explosion may have also damaged the primary containment vessel and definitely released radioactive material

o   The operators chose to pump seawater into the building in an effort to control the escalating temperature inside the reactor and to avoid core meltdown

o   March 29, radioactive water was found outside the reactor building

o   April 5, reactor core temperatures have begun to fall indicating the system is coming back into control

o   Radioactivity levels in the building are very high and operators are injecting nitrogen to reduce the likelihood of subsequent hydrogen explosions.

o   May 12, TEPCO officials confirmed that the reactor had suffered a core meltdown and the bottom of the reactor building may be leaking highly radioactive water into the environment.

·         Events at Reactor #3:

o   March 14, 3 days after the tsunami and 2 days after the roof was blown off the Reactor #1 containment building, the same thing happened on Reactor #3

o   This explosion occurred despite plant operators pumping large quantities of cooling sea water into the reactor building

o   March 17, steam begins billowing from the reactor building confirming that the primary containment vessel was damaged and releasing radioactive compounds.

o   Helicopters dumped water on the building and police water cannons were used to pour water down onto the building.

o   Water was sprayed on the building for days with some interruptions as radiation levels rose sufficiently high that work had to be stopped.

o   March 24, workers laying power cables attempting to restore power to Reactor #3 waded into highly radioactive water requiring hospitalization.

o   March 28, dangerous plutonium was detected in the environment near Reactor #3.

·         Events at Reactor #2:

o   March 15, 4 days after the tsunami, 3 days after the roof was blown off Reactor #1, and a day after the roof was blown off Reactor #3, a serious explosion occurred at Reactor #2.

o   Reactor #2 was later confirmed to have experienced at least a partial core meltdown

o   March 27, highly radioactive water discovered outside of reactor building #2.

·         Subsequently, large quantities of uncontained radioactive water have been found throughout the multi-reactor plant, and the turbine facilities are flooded, as are the cabling tunnels between the buildings. Serious radioactive water leaks into the ocean have been detected and subsequently corrected, in one case by injecting 6,000 liters of liquid glass into the ground near the leak.

·         April 4th, 11,500 tons of radioactive water was pumped into the ocean. This water was 100x above the legal safety limit but was pumped into the environment so that the storage facilities could be used to contain waste water that is 10,000x the radioactivity limit for environmental release.

·         The spent fuel pools at the inactive reactors 4, 5, & 6 were all slowly overheating as a consequence of there being no cooling water. The Reactor #4 cooling pool either boiled off its water or it leaked off as a result of earthquake damage. The spent fuel rods exposed to atmosphere without cooling led to fires inside Reactor building #4

·         Outcome:

o   Fukushima is now rated as serious as Chernobyl, having been classified as a level 7 event, the worst on the International Nuclear Event Scale. However, it is still considered to have released only 5 to 10% of the radiation released by Chernobyl.

o   All residents within 20 km evacuated

o   Voluntary evacuation of all residents between 20 and 30 km.

o   Agricultural products including milk and vegetables from the region contaminated

o   Tokyo’s tap water declared unfit for infants for 1 day

o   Decades of cleanup and containment remain
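The point in the notes above that a shut-down reactor still produces considerable heat can be made concrete with a rough calculation. This is a minimal sketch using a textbook rule of thumb (the Way-Wigner approximation), not anything taken from the IEEE report. The coefficient of 0.0622 is one commonly quoted value (sources differ somewhat), the one year of prior operation is an assumption for illustration, and the approximation is only intended for times more than a few seconds after shutdown.

```python
# Rough decay heat estimate using the Way-Wigner rule of thumb.
# Assumptions (not from the IEEE report): the coefficient, the one-year
# operating period, and the function and constant names used here.
SECONDS_PER_DAY = 86_400


def decay_heat_fraction(t_after_shutdown_s: float,
                        operating_time_s: float,
                        coefficient: float = 0.0622) -> float:
    """Approximate decay heat as a fraction of pre-shutdown thermal power.

    fraction = c * (t^-0.2 - (t + T)^-0.2), where t is the time since
    shutdown and T is how long the reactor ran before shutdown (seconds).
    """
    if t_after_shutdown_s <= 0 or operating_time_s <= 0:
        raise ValueError("times must be positive")
    return coefficient * (
        t_after_shutdown_s ** -0.2
        - (t_after_shutdown_s + operating_time_s) ** -0.2
    )


if __name__ == "__main__":
    one_year = 365 * SECONDS_PER_DAY  # assumed operating period before shutdown
    for label, t in [("1 hour", 3_600),
                     ("1 day", SECONDS_PER_DAY),
                     ("1 week", 7 * SECONDS_PER_DAY)]:
        print(f"{label} after shutdown: {decay_heat_fraction(t, one_year):.2%}")
```

Under these assumptions the estimate is on the order of 1% of full thermal power an hour after shutdown and still a few tenths of a percent a day later. For a large power reactor that is still a lot of heat, which is consistent with the notes above: without working cooling, the water boils off even though the chain reaction has stopped.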

 

The report: What Went Wrong in Japan's Nuclear Reactors: http://spectrum.ieee.org/tech-talk/energy/nuclear/explainer-what-went-wrong-in-japans-nuclear-reactors.

 

We all wish the situation had been avoided and, those of us involved in engineering projects whether they be life critical systems or not, need to ensure that the lessons from this one are learned well and applied faithfully to new designs. I won’t speculate on human risk in the efforts spent to mitigate this disaster but, clearly, the workers that brought these systems back under control and continue to manage the environmental impact are heroes and deserve our collective thanks. 

 

                                                                --jrh

 


 

 

Tuesday, May 31, 2011 5:39:33 AM (Pacific Standard Time, UTC-08:00)  #    Comments [8] - Trackback
Ramblings
 Tuesday, September 14, 2010

For those of you writing about your work on high scale cloud computing (and for those interested in a great excuse to visit Anchorage, Alaska), consider submitting a paper to the Workshop on Data Intensive Computing in the Clouds (DataCloud 2011). The call for papers is below.

 

                                                                --jrh

 

-------------------------------------------------------------------------------------------

                                           *** Call for Papers ***
      WORKSHOP ON DATA INTENSIVE COMPUTING IN THE CLOUDS (DATACLOUD 2011)
                  In conjunction with IPDPS 2011, May 16, Anchorage, Alaska
                                
http://www.cct.lsu.edu/~kosar/datacloud2011
-------------------------------------------------------------------------------------------

The First International Workshop on Data Intensive Computing in the Clouds (DataCloud2011) will be held in conjunction with the 25th IEEE International Parallel and Distributed Computing Symposium (IPDPS 2011), in Anchorage, Alaska.

Applications and experiments in all areas of science are becoming increasingly complex and more demanding in terms of their computational and data requirements. Some applications generate data volumes reaching hundreds of terabytes and even petabytes. As scientific applications become more data intensive, the management of data resources and dataflow between the storage and compute resources is becoming the main bottleneck. Analyzing, visualizing, and disseminating these large data sets has become a major challenge and data intensive computing is now considered as the “fourth paradigm” in scientific discovery after theoretical, experimental, and computational science.

DataCloud2011 will provide the scientific community a dedicated forum for discussing new research, development, and deployment efforts in running data-intensive computing workloads on Cloud Computing infrastructures. The DataCloud2011 workshop will focus on the use of cloud-based technologies to meet the new data intensive scientific challenges that are not well served by the current supercomputers, grids or compute-intensive clouds. We believe the workshop will be an excellent place to help the community define the current state, determine future goals, and present architectures and services for future clouds supporting data intensive computing.

Topics of interest include, but are not limited to:

- Data-intensive cloud computing applications, characteristics, challenges
- Case studies of data intensive computing in the clouds
- Performance evaluation of data clouds, data grids, and data centers
- Energy-efficient data cloud design and management
- Data placement, scheduling, and interoperability in the clouds
- Accountability, QoS, and SLAs
- Data privacy and protection in a public cloud environment
- Distributed file systems for clouds
- Data streaming and parallelization
- New programming models for data-intensive cloud computing
- Scalability issues in clouds
- Social computing and massively social gaming
- 3D Internet and implications
- Future research challenges in data-intensive cloud computing

IMPORTANT DATES:
Abstract submission: December 1, 2010
Paper submission: December 8, 2010
Acceptance notification: January 7, 2011
Final papers due: February 1, 2011

PAPER SUBMISSION:
DataCloud2011 invites authors to submit original and unpublished technical papers. All submissions will be peer-reviewed and judged on correctness, originality, technical strength, significance, quality of presentation, and relevance to the workshop topics of interest. Submitted papers may not have appeared in or be under consideration for another workshop, conference or a journal, nor may they be under review or submitted to another forum during the DataCloud2011 review process. Submitted papers may not exceed 10 single-spaced double-column pages using 10-point size font on 8.5x11 inch pages (IEEE conference style), including figures, tables, and references. DataCloud2011 also requires submission of a one-page (~250 words) abstract one week before the paper submission deadline.

WORKSHOP and PROGRAM CHAIRS:
Tevfik Kosar, Louisiana State University
Ioan Raicu, Illinois Institute of Technology

STEERING COMMITTEE:
Ian Foster, Univ of Chicago & Argonne National Lab
Geoffrey Fox, Indiana University
James Hamilton, Amazon Web Services
Manish Parashar, Rutgers University & NSF
Dan Reed, Microsoft Research
Rich Wolski, University of California, Santa Barbara
Liang-Jie Zhang, IBM Research

PROGRAM COMMITTEE:
David Abramson, Monash University, Australia
Roger Barga, Microsoft Research
John Bent, Los Alamos National Laboratory
Umit Catalyurek, Ohio State University
Abhishek Chandra, University of Minnesota
Rong N. Chang, IBM Research
Alok Choudhary, Northwestern University
Brian Cooper, Google
Ewa Deelman, University of Southern California
Murat Demirbas, University at Buffalo
Adriana Iamnitchi, University of South Florida
Maria Indrawan, Monash University, Australia
Alexandru Iosup, Delft University of Technology, Netherlands
Peter Kacsuk, Hungarian Academy of Sciences, Hungary
Dan Katz, University of Chicago
Steven Ko, University at Buffalo
Gregor von Laszewski, Rochester Institute of Technology
Erwin Laure, CERN, Switzerland
Ignacio Llorente, Universidad Complutense de Madrid, Spain
Reagan Moore, University of North Carolina
Lavanya Ramakrishnan, Lawrence Berkeley National Laboratory
Ian Taylor, Cardiff University, UK
Douglas Thain, University of Notre Dame
Bernard Traversat, Oracle
Yong Zhao, Univ of Electronic Science & Tech of China

 

 


 

Tuesday, September 14, 2010 12:45:05 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback
Ramblings

I’m dragging myself off the floor as I write this having watched this short video: MongoDB is Web Scale.

 

It won’t improve your datacenter PUE, your servers won’t get more efficient, and it won’t help you scale your databases but, still, you just have to check out that video.

 

Thanks to Andrew Certain of Amazon for sending it my way.

 

                                                                --jrh

 


 

Tuesday, September 14, 2010 5:55:41 AM (Pacific Standard Time, UTC-08:00)  #    Comments [2] - Trackback
Ramblings

Disclaimer: The opinions expressed here are my own and do not necessarily represent those of current or past employers.

All Content © 2012, James Hamilton