James Hamilton's Blog
 Wednesday, April 09, 2008

What’s commonly referred to as the Great Firewall of China isn’t really a firewall at all.  I recently came across an Atlantic Monthly article investigating how the Great Firewall works and what it does (see The Connection has been Reset).

 

The official name of what is often called the Great Firewall of China is the Golden Shield project. Rather than acting as a firewall, it’s actually mirroring content and manipulating DNS, connection management, and URL redirection to implement its goal of restricting what internet users inside China can access.

 

This project has been widely criticized on political and social fronts – I won’t repeat them here.  It’s also been widely criticized on technical grounds as ineffective, weak, and easy to thwart.  Again, not my focus.  This article simply caught my interest technically as content filtering at this scale is an incredibly difficult task. What techniques are employed?

 

As with many software security problems, no single technique solves the problem fully, and the main goal of the Golden Shield project is to add friction.  If it’s painful enough to reach the restricted content, few will bother.  It’s friction rather than prevention that ensures few Chinese internet users see restricted content in any quantity.  The four levels of protection/restriction are:

 

1.       DNS Block: sites on the current blacklist get a DNS resolution failure or are redirected to other content.  This was the technique employed against google.cn to force them to add filtering to their web index. For some time, all access to google.cn was redirected to their larger Chinese competitor, Baidu.  The other application of this technique is to return a DNS lookup failure so that, for example, requests for http://www.illegalsite.com return “not found”.

2.       Connect: Connection requests leaving China are inspected in parallel as they pass.  If the destination IP address is on the current IP blacklist, a connection reset is sent, which causes the connection to fail.

3.       URL Block: If the URL contains words on the illegal word blacklist, the connection is redirected endlessly.  I’m not sure if they are only sniffing the URL or also doing reverse DNS to get the site name but, if unacceptable words are found in the URL, they redirect the connection repeatedly. Some browsers hang while others return an error message.

4.       Content Block: At this level the DNS lookup has been successful and the connection has been made and content is being returned to the user. As the content is returned to the requesting user inside China, it’s being scanned in parallel for unapproved keywords and phrases. If any are found, the connection is broken immediately. As well as breaking the connection mid-way, subsequent requests from that client IP to that destination IP are blocked. The first block is short, but consecutive attempts drive up the length of the IP-to-IP connect block period and may eventually draw official scrutiny.
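
To make the layering concrete, here is a minimal Python sketch of the four checks applied in order. Everything in it is an assumption for illustration: the blacklist contents, the `Filter` class, the 60 second base block, and the doubling rule are all hypothetical, since the article only says that the first block is short and that repeated attempts lengthen it. It is not a description of how the Golden Shield is actually implemented.

```python
from dataclasses import dataclass, field

# Hypothetical blacklists, for illustration only.
DNS_BLACKLIST = {"www.illegalsite.com"}
IP_BLACKLIST = {"203.0.113.7"}
WORD_BLACKLIST = {"forbidden"}


@dataclass
class Filter:
    # (client_ip, dest_ip) -> number of content blocks seen so far
    strikes: dict = field(default_factory=dict)
    base_block_seconds: int = 60  # assumed value; the article only says "short"

    def block_seconds(self, client_ip, dest_ip):
        """Current IP-to-IP block length; doubling per strike is an assumption."""
        n = self.strikes.get((client_ip, dest_ip), 0)
        return 0 if n == 0 else self.base_block_seconds * 2 ** (n - 1)

    def check(self, client_ip, host, dest_ip, url, content):
        """Return the first level that blocks the request, or 'allowed'."""
        if host in DNS_BLACKLIST:
            return "dns_block"       # level 1: lookup fails or is redirected
        if dest_ip in IP_BLACKLIST:
            return "connect_reset"   # level 2: connection reset is sent
        if any(w in url.lower() for w in WORD_BLACKLIST):
            return "url_redirect"    # level 3: connection redirected repeatedly
        if any(w in content.lower() for w in WORD_BLACKLIST):
            key = (client_ip, dest_ip)
            self.strikes[key] = self.strikes.get(key, 0) + 1
            return "content_reset"   # level 4: connection broken mid-transfer
        return "allowed"
```

Each level is cheap to evade on its own, which is the point made above: the stack adds friction rather than guaranteeing prevention.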

 

In addition to these techniques for blocking access to content outside of China, an estimated 30,000 censors scan for and remove unapproved content posted within China (see http://en.wikipedia.org/wiki/Internet_censorship_in_the_People%27s_Republic_of_China).

 

The Golden Shield project is reportedly also being used in the opposite direction to prevent access to some content inside of China from outside the country.

 

There are many means of subverting the Golden Shield, including using a proxy server outside of China or setting up a VPN connection to a server outside of the country.  Encrypted connections will also get through, as will encrypted email.  However, all these techniques are non-default and require some work on the part of the user.  Most users don’t bother so, for the most part, the goals of the Golden Shield are attained even though it’s technically not that strong.

 

The Atlantic Monthly article: http://www.theatlantic.com/doc/200803/chinese-firewall?reddit

Wired Article: http://www.wired.com/politics/security/magazine/15-11/ff_chinafirewall

Wikipedia article: http://en.wikipedia.org/wiki/Internet_censorship_in_the_People%27s_Republic_of_China

 

                                                                --jrh

 

Thanks to Jennifer Hamilton and Mitch Wyle for pointing out the Atlantic Monthly article.

 

James Hamilton, Windows Live Platform Services
Bldg RedW-D/2072, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
JamesRH@microsoft.com

H:mvdirona.com | W:research.microsoft.com/~jamesrh

Tuesday, April 08, 2008 11:13:57 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback
Ramblings
 Saturday, April 05, 2008

The services world is one built upon economies of scale.  For example, networking costs for small and medium sized services can run nearly an order of magnitude more than what large bandwidth consumers such as Google, Amazon, Microsoft and Yahoo pay. These economies of scale make it possible for services such as Amazon S3 to pass on some of the savings they get on networking, for example, to those writing against their service platform while at the same time profiting (S3 is currently pricing storage under their cost but that’s a business decision rather than a business model problem). These economies of scale enjoyed by large service providers extend beyond networking to server purchases, power costs, networking equipment, etc.

 

Ironically, even with these large economies of scale, it’s cheaper to compute at home than in the cloud. Let’s look at the details.

 

Infrastructure costs are incredibly high in the services world, with a new 13.5 megawatt data center costing over $200M before the upwards of 50,000 servers that fill the data center are purchased.  Data centers are about the furthest thing from commodity parts and I have been arguing for years that we should be moving to modular data centers (there has been progress on that front as well: First Containerized Data Center Announcement).  Modular designs take some of the power and mechanical system design from an upfront investment with a 15 year life to a design that comes with each module and is on a three year or less amortization cycle, and this helps increase the speed of innovation.

 

Modular data centers help but they still require central power, mechanical systems, and networking systems, and these systems remain expensive, non-commodity components. How do we move the entire datacenter to commodity components?  Ken Church (http://research.microsoft.com/users/church/) makes a radical suggestion: rather than design and develop massive data centers with 15 year lives, let’s incrementally purchase condominiums (just-in-time) and place a small number of systems in each.  Radical to be sure, but condos are a commodity and, if this mechanism really was cheaper, it would be a wake-up call to all of us to start looking much more closely at current industry-wide costs and what’s driving them. That’s our point here.

 

Ken and I did a quick back-of-envelope comparison of this approach below.   Both configurations are designed for 54k servers and roughly 13.5MW.  Condos appear notably cheaper, particularly in terms of capital.

 

 

 

| | | Large Tier II+ Data Center | Condo Farm (1125 Condos) |
|---|---|---|---|
| Specs | Servers | 54k | 54k (= 48 servers/condo * 1125 Condos) |
| | Power (peak) | 13.5 MW (= 250 Watts/server * 54k servers) | 13.5 MW (= 250 Watts/server * 54k servers = 12 kW/condo * 1125 Condos) |
| Capital | Building | over $200M | $112.5M (= $100k/condo * 1125 Condos) |
| Annual Expense | Power | $3.5M/year (= $0.03 per kWh * 24*365 hours/year * 13.5 MW) | $10.6M/year (= $0.09 per kWh * 24*365 hours/year * 13.5 MW) |
| Annual Income | Rental Income | None | $8.1M/year (= $1000/condo per month * 12 months/year * 1125 Condos less $200/condo per month condo fees. We conservatively assume 80% occupancy) |

 

 

In the quick calculation above, we have the condos at $100k each and all 1,125 of them at $112.5M, whereas the purpose built data center would price in at over $200M.  We have assumed an unusually low cost for power on the purpose built center with a 66% reduction over standard power rates. Deals this good are getting harder to negotiate but they still do exist.  The condos must pay full residential power costs without discount, which is far higher at $10.6M/year.  However, offsetting this increased power cost, we rent the condos out at a low cost of $1,000/month and conservatively only account for 80% occupancy.

 

Looking at the totals, the condos are at 56% of the capital cost and annually they run $2.5M in net operational costs ($10.6M in power less $8.1M in rental income), whereas the data center power costs are higher at $3.5M.  The condos’ operational costs are 71% of the purpose built design.  Summarizing, the condos run just about ½ the cost of the purpose built data center both in capital and in annual operating costs.
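
For anyone who wants to check the arithmetic, the comparison can be reproduced in a few lines of Python. This is only a sketch using the figures from the table; the variable names are mine, and I assume condo fees are paid on all 1,125 condos whether occupied or not, which is the reading that reproduces the $8.1M rental figure.

```python
HOURS_PER_YEAR = 24 * 365
PEAK_KW = 13_500            # 13.5 MW for either configuration
CONDOS = 1125


def annual_power_cost(rate_per_kwh, kw=PEAK_KW):
    """Yearly power bill in dollars, assuming peak draw all year."""
    return rate_per_kwh * HOURS_PER_YEAR * kw


# Capital
dc_capital = 200e6                      # "over $200M" taken as $200M
condo_capital = 100_000 * CONDOS        # $112.5M

# Annual power expense
dc_power = annual_power_cost(0.03)      # about $3.5M
condo_power = annual_power_cost(0.09)   # about $10.6M

# Annual rental income: 80% occupancy, fees assumed due on every condo
rent = 1000 * 12 * CONDOS * 0.80
fees = 200 * 12 * CONDOS
rental_income = rent - fees             # about $8.1M

condo_net = condo_power - rental_income  # about $2.5M

print(f"capital ratio:   {condo_capital / dc_capital:.1%}")
print(f"operating ratio: {condo_net / dc_power:.1%}")
```

Run on the unrounded figures, the operating ratio comes out slightly under 72%; the 71% quoted above is what you get by dividing the rounded $2.5M by the rounded $3.5M.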

 

Condos offer the option to buy/sell just-in-time.  The power bill depends more on average usage than worst-case peak forecast.  These options are valuable under a number of not-implausible scenarios:

·         Long-Term demand is far from flat and certain; demand will probably increase, but anything could happen over the next 15 years

·         Short-Term demand is far from flat and certain; power usage depends on many factors including time of day, day of week, seasonality, economic booms and busts.  In all data centers we’ve looked at average power consumption is well below worst-case peak forecast.

 

How could condos compete or even approach the cost of a purpose built facility built where land is cheap and power is cheaper?  One factor is that condos are built in large numbers and are effectively “commodity parts”.  Another factor is that most data centers are over-engineered.  They include redundancy such as uninterruptible power supplies that the condo solution doesn’t include.  The condo solution gets its redundancy via many micro-data centers and being able to endure failures across the fabric. When some of the non-redundantly powered micro-centers are down, the others carry the load. (Clearly achieving this application-level redundancy requires additional application investment).

 

One particularly interesting factor is that when you buy large quantities of power for a data center, it is delivered by the utility in high voltage form. These high voltage sources (usually in the 10 to 20 thousand volt range) need to be stepped down to lower working voltages, which brings efficiency losses; distributed throughout the data center, which again brings energy losses; and eventually delivered to the critical load at the working voltage (240VAC is common in North America with some devices using 120VAC). The power distribution system represents approximately 40% of the total cost of the data center. Included in that number are the backup generators, step-down transformers, power distribution units, and uninterruptible power supplies. Ignore the UPS and generators since we’re comparing non-redundant power, and two interesting factors jump out: 1) the cost of the power distribution system ignoring power redundancy is 10 to 20% of the cost of the data center and 2) the power losses through distribution run 10 to 12% of the power brought into the center.

 

This is somewhat ironic in that a single family dwelling gets split-phase 120/240VAC service (240VAC between the two hot legs or 120VAC between either leg and neutral) delivered directly to the home.  All the power losses experienced through step down transformers (usually in the 92 to 96% efficiency range) and all the power lost through distribution (which depends upon the size and length of the conductor) are paid for by the power company. But, if you buy huge quantities of power as we do in large data centers, the power company delivers high voltage lines to the property and you need to pay the substantial capital cost of step down transformers and, in addition, pay for the power distribution losses.  Ironically, if you don’t buy much power, the infrastructure is free. If you buy huge amounts, you need to pay for the infrastructure.  In the case of condos, the owners need to pay for the inside-the-building distribution, so they are somewhere between single family dwellings and data centers: they pay for part of the infrastructure but not as much as a DC.
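
To get a feel for what those distribution losses cost, here is a rough Python sketch. It combines the 10 to 12% loss range with the 13.5MW load and $0.03/kWh data center rate from the comparison earlier; applying those numbers together, and assuming the load runs at peak all year, are my simplifications rather than measured figures.

```python
HOURS_PER_YEAR = 24 * 365


def power_purchased_kw(load_kw, loss_fraction):
    """Power that must be brought in so that load_kw reaches the critical load,
    when loss_fraction of the incoming power is lost in distribution."""
    return load_kw / (1.0 - loss_fraction)


def annual_loss_cost(load_kw, loss_fraction, rate_per_kwh):
    """Yearly dollars spent on power that never reaches the load."""
    lost_kw = power_purchased_kw(load_kw, loss_fraction) - load_kw
    return lost_kw * HOURS_PER_YEAR * rate_per_kwh


for loss in (0.10, 0.12):
    bought = power_purchased_kw(13_500, loss)
    cost = annual_loss_cost(13_500, loss, 0.03)
    print(f"loss {loss:.0%}: buy {bought:,.0f} kW, losses cost ${cost:,.0f}/year")
```

At a 10% loss, delivering 13.5MW means buying 15MW, so the 1.5MW difference is paid for around the clock.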

 

Perhaps, the power companies have found a way to segment the market into consumer v. business.  Businesses pay more because they are willing to pay more.  Just as businesses pay more for telephone service and airplane travel, businesses also pay more for power.  Despite great deals we’ve been reading about, data centers are actually paying more for power than consumers after factoring in the capital costs.    Thus, it is a mistake to move computation from the home to the cloud because doing so moves the cost structure from consumer rates to business rates.

 

The condo solution might be pushing the limit a bit but whenever we see a crazy idea even within a factor of two of what we are doing today, something is wrong.  Let’s go pick some low hanging fruit.

 

Ken Church & James Hamilton

{Church, JamesRH} @microsoft.com

 

Saturday, April 05, 2008 11:15:44 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2] - Trackback
Services
 Thursday, April 03, 2008

A couple of interesting directions brought together: 1) Oracle compatible DB startup, and 2) a cloud-based implementation.

 

The Oracle compatible offering is EnterpriseDB. They use the PostgreSQL code base and implement Oracle compatibility to make it easy for the huge Oracle install base to support them.  An interesting approach.  I used to lead the SQL Server Migration Assistant team so I know that true Oracle compatibility is tough but, even failing to be 100% compatible makes it easier for Oracle apps to port over to them. The pricing model is free for a developer license and $6k/socket for their Advanced Server edition.

 

The second interesting direction is an offering from Elastra.  It’s a management and administration system that automates deploying and managing dynamically scalable services. Part of the Elastra offering is support for Amazon AWS EC2 deployments.

 

Bring together EnterpriseDB and Elastra and you have an Oracle compatible database, hosted in EC2, with deployment and management support: ELASTRA Propels EnterpriseDB into the Cloud. I couldn’t find any customer usage examples so this may be more press release than a fully exercised, ready for prime-time solution, but it’s a cool general direction and I expect to see more offerings along these lines over the next few months.  Good to see.

 

                                --jrh

 


Thursday, April 03, 2008 11:17:15 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback
Services
 Wednesday, April 02, 2008

I’m a big believer in auto-installable client software but I also want a quality user experience.  For data intensive applications, I want a caching client. I use and love many browser-hosted clients but, for development work, email clients, and photo editing, I still use installable software. I want a snappy user experience, I need to be able to run disconnected or weakly connected, and I want to fully use my local resources.  Speed and richness are king for these apps; it’s the casual apps that are getting replaced well by browser based software in my world.

 

However, I’ve been blown away by how fast the set of applications I’m willing to run in the browser has been expanding. For example, Yahoo Mail impressed me when it came out. Both Google and Live maps are impressive (how can anyone understand and maintain that much JavaScript?).  In fact, in the ultimate compliment, these mapping services are good enough that, even though I have local mapping software installed, I seldom bother to start it.

 

Here’s another one that was announced last week that is truly impressive: https://www.photoshop.com/express/landing.html.  The Adobe online implementation of Photoshop is an eye opener. Predictably, it’s Flash and Flex based and, wow, it’s amazing for a within-the-browser experience.  I’m personally still editing my pictures locally but Photoshop Express shows a bit of what’s possible.

 

                                --jrh

 


Wednesday, April 02, 2008 11:18:16 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback
Services | Software
 Tuesday, April 01, 2008

Microsoft has been investigating and testing containers and modular data centers for some time now.  I wrote about them some time back in Architecture for Modular Data Centers (presentation) at the 2007 Conference on Innovative Data Systems Research. Around that time Rackable Systems and Sun Microsystems announced shipping container based solutions and Rackable shipped the first production container.  That first unit had more than 1,000 servers.  Rackable and Sun helped get this started, as early on most of the industry was somewhere between skeptical and actively resistant.

 

Over the last couple of years, the modular datacenter approach has gained momentum.  Now nearly all data center equipment providers have started offering container based solutions:

·         IBM Scalable modular data center

·         Rackable ICE Cube™ Modular Data Center

·         Sun Modular Datacenter S20 (project Blackbox)

·         Dell Insight

·         Verari Forest Container Solution

 

It’s great to see all the major systems providers investing in modular data centers. I expect the pace of innovation to pick up, and over the last two weeks I’ve seen three new designs.  Things are moving.

 

Yesterday Mike Manos, who leads the Microsoft Global Foundations Data Center team, made the first public announcement of a containerized production data center at Data Center World. The Microsoft Chicago facility is a two floor design where the first floor is a containerized design housing 150 to 220 40’ containers, each holding 1,000 to 2,000 servers.   Chicago is a large facility, with the low end of the ranges Mike quoted yielding 150k servers and the high end running to 440k servers.  If you assume 200W/server, the critical load would run between 30MW and 88MW for the half of the data center that is containerized.  If you conservatively assume a PUE of 1.5, we can estimate the containerized portion of the data center at between 45MW and 132MW total load.  It’s a substantial facility.
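
The estimate is simple enough to write down as a short Python sketch. The function name is mine; the 200W/server and PUE of 1.5 are the assumptions stated above, not figures from Mike’s talk. PUE is total facility power divided by the power delivered to the IT equipment, so total load is critical load times PUE.

```python
def facility_load_mw(containers, servers_per_container, watts_per_server, pue):
    """Return (server count, critical load in MW, total facility load in MW)."""
    servers = containers * servers_per_container
    critical_mw = servers * watts_per_server / 1e6
    return servers, critical_mw, critical_mw * pue


# Low and high ends of the quoted ranges
low = facility_load_mw(150, 1_000, 200, 1.5)
high = facility_load_mw(220, 2_000, 200, 1.5)

for label, (servers, critical, total) in (("low", low), ("high", high)):
    print(f"{label}: {servers:,} servers, {critical:.0f} MW critical, {total:.0f} MW total")
```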

 

John Rath posted great notes on Mike’s entire talk: http://datacenterlinks.blogspot.com/2008/04/miichael-manos-keynote-at-data-center.html.  I’m excited that this news is now public, so when Mike gets back into the office at Redmond I’ll pester him to see if he can release the slides he used.  If so, I’ll post them here.

 

Thanks to Rackable Systems and Sun Microsystems for getting the industry started on commodity-based containerized designs.  We now have modular components from most major server vendors, and Mike’s talk yesterday at Data Center World marked the first publicly announced modular facility.

 

                                    --jrh

 


Tuesday, April 01, 2008 11:19:54 PM (Pacific Standard Time, UTC-08:00)  #    Comments [3] - Trackback
Services
 Monday, March 31, 2008

Tom Kleinpeter was one of the founders of Foldershare (acquired by Microsoft in 2006) and before that he was a part of the original team at Audiogalaxy. I worked with Tom while he was at Microsoft working on Mesh. Tom recently decided to take some time off to relax and be a father, and it looks like he’s also finding time to write up some of his experiences. I particularly like the Audiogalaxy Chronicles, where he writes up his experiences with Audiogalaxy, which grew as only a successful startup can, shooting to 80 million page views a day from 35 million unique users.

 

I found this post particularly interesting where Tom describes scaling the Audiogalaxy design and some of the challenges they had in scaling to 80 million page views a day: http://www.spiteful.com/2008/02/27/scaling-audiogalaxy-to-80-million-daily-page-views/.

 

Read them all: http://www.spiteful.com/the-audiogalaxy-chronicles/.

 

                                    --jrh

 


Monday, March 31, 2008 11:21:05 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback
Services
 Sunday, March 30, 2008

There is (again) a rumor out there that Google will soon offer a third party service platform: http://www.scripting.com/stories/2008/03/29/pigs.html.  I mostly ignore the rumors but this is one I find hard to ignore. Why?  Mostly because it makes too much sense.  The Google infrastructure investment combined with phenomenal scale yields some of the lowest cost compute and storage in the industry.  They can sell compute and storage at considerably above their costs and yet still be offering substantial cost reductions to smaller services.  That’s if they choose to charge for it.  Google also has the highest scale advertising platform in the world, offering the opportunity to monetize even that for which they don’t directly charge.  When something looks like it makes sense economically and fits in strategically, it just about has to happen.

 

We all know that these rumors often have nothing at all behind them.  Some are simply excited fabrications. But, even knowing that, on this one it’s a matter of when rather than if.

 

Thanks to Dare Obasanjo for pointing me to the blog posting above.

 

                                    --jrh

 


Sunday, March 30, 2008 9:23:04 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0] - Trackback
Services
 Thursday, March 27, 2008

Yahoo! hosted the Hadoop Summit Tuesday of this week.  I posted my rough notes on the conference over the course of the day; this posting summarizes some of what caught my interest and consolidates my notes.

 

Yahoo expected 100 attendees and ended up having to change venues to get closer to fitting the more than 400 who wanted to attend.  For me the most striking thing is that Hadoop is now clearly in broad use and at scale. Doug Cutting did a quick survey at the start and roughly ½ the crowd were running Hadoop in production and around 1/5 have over 100 node clusters. Yahoo remains the biggest with 2,000 nodes in their cluster.

 

Christian Kunz of Yahoo! gave a bit of a window into how Yahoo! is using Hadoop to process their Webmap data store. The Webmap is a structured storage representation of all Yahoo! crawled pages and all the metadata they extract or compute on those pages.  There are over 100 Webmap applications used in managing the Yahoo! indexing engine. Christian talked about why they moved to Hadoop from the legacy system and summarized the magnitude of the workload they are running. These are almost certainly the largest Hadoop jobs in the world. The longest map/reduce jobs run for over three days and have 100k maps and 10k reduces. This job reads 300 TB and produces 200 TB.

 

Another informative talk was given by the Facebook team. They described Hive, the data warehouse at Facebook.  Joydeep Sen Sarma and Ashish Thusoo presented this work. I liked this talk as it was 100% customer driven. They implemented what the analysts and programmers inside Facebook needed and I found their observations credible and interesting.  They reported that analysts are used to SQL and found a SQL-like language most productive, but that programmers like to have direct access to map/reduce primitives.  As a consequence, they provide both (so do we).  The Facebook team reports that roughly 25% of the development team uses Hive and that they process 3,500 map/reduce jobs a week.
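
To show the contrast between the two styles, here is a toy Python sketch of the same aggregation written against map/reduce primitives, with the SQL-like form in a comment. This is a single-process illustration of the programming model only; the function names, the sample records, and the `views` table are hypothetical, and none of it is the Hive or Hadoop API.

```python
from collections import defaultdict

# Hypothetical page-view records: (user, page)
records = [("ann", "/home"), ("bob", "/home"), ("ann", "/photos")]

# What an analyst might write in a SQL-like language (illustrative only):
#   SELECT page, COUNT(*) FROM views GROUP BY page


def map_fn(record):
    """Emit one (page, 1) pair per page view."""
    _user, page = record
    yield page, 1


def reduce_fn(key, values):
    """Sum the counts emitted for one page."""
    return key, sum(values)


def run_map_reduce(records, map_fn, reduce_fn):
    """Run map, group the emitted pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())


counts = run_map_reduce(records, map_fn, reduce_fn)
print(counts)
```

The SQL-like form is one line; the map/reduce form is longer but lets the programmer put arbitrary logic in `map_fn` and `reduce_fn`, which is the trade-off the Facebook team described.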

 

Google is heavily invested in Hadoop, using it as a teaching vehicle even though it’s not used internally.  The Google interest in Hadoop is to get graduating students more familiar with the map/reduce programming model. Several schools have agreed to teach map/reduce programming using Hadoop. For example Berkeley, CMU, MIT, Stanford, UW, and UMD all plan courses.

 

The agenda for the day:

| Time | Topic | Speaker(s) |
|---|---|---|
| 8:00-8:55 | Breakfast/Registration | |
| 8:55-9:00 | Welcome & Logistics | Ajay Anand, Yahoo! |
| 9:00-9:30 | Hadoop Overview | Doug Cutting / Eric Baldeschwieler, Yahoo! |
| 9:30-10:00 | Pig | Chris Olston, Yahoo! |
| 10:00-10:30 | JAQL | Kevin Beyer, IBM |
| 10:30-10:45 | Break | |
| 10:45-11:15 | DryadLINQ | Michael Isard, Microsoft |
| 11:15-11:45 | Monitoring Hadoop using X-Trace | Andy Konwinski and Matei Zaharia, UC Berkeley |
| 11:45-12:15 | Zookeeper | Ben Reed, Yahoo! |
| 12:15-1:15 | Lunch | |
| 1:15-1:45 | Hbase | Michael Stack, Powerset |
| 1:45-2:15 | Hbase at Rapleaf | Bryan Duxbury, Rapleaf |
| 2:15-2:45 | Hive | Joydeep Sen Sarma / Ashish Thusoo, Facebook |
| 2:45-3:05 | GrepTheWeb - Hadoop on AWS | Jinesh Varia, Amazon.com |
| 3:05-3:20 | Break | |
| 3:20-3:40 | Building Ground Models of Southern California | Steve Schlosser, David O'Hallaron, Intel / CMU |
| 3:40-4:00 | Online search for engineering design content | Mike Haley, Autodesk |
| 4:00-4:20 | Yahoo - Webmap | Arnab Bhattacharjee, Yahoo! |
| 4:20-4:45 | Natural language Processing | Jimmy Lin, U of Maryland / Christophe Bisciglia, Google |