I did a talk at the Usenix Tech conference last year, “Where does the Power Go in High Scale Data Centers”. After the talk I got into a more detailed discussion with many folks from Netflix and Canada’s Research in Motion, the maker of the Blackberry. The discussion ended up in a long lunch over a big table with folks from both teams. The common theme of the discussion was, predictably given the companies and folks involved, innovation in high scale services and how to deal with incredible growth rates. Both RIM and Netflix are very successful and, until you have experienced and attempted to manage internet growth rates, you really just don’t know. I'm impressed with what they are doing. Growth brings super interesting problems, and I learned from both teams and really enjoyed spending time with them.
I recently came across an interesting talk by Santosh Rau, the Netflix Cloud Infrastructure Engineering Manager. The fact that Netflix actually has a Cloud Infrastructure Engineering Manager is what caught my attention. Netflix continues to innovate quickly and is moving fast with cloud computing.
My notes from Rau’s talk:
· Details on Netflix
o More than 10m subscribers
o Over 100,000 DVD titles
o 50 distribution centers
o Over 12,000 instant watch titles
· Why is Netflix going to the cloud
o Elastic infrastructure
o Pay for what you use
o Simple to deploy and maintain
o Leverage datacenter geo-diversity
o Leverage application services (queuing, persistence, security, etc.)
· Why did Netflix choose Amazon Web Services
o Massive scale
o More mature services
o Thriving, active developer community of over 400,000 developers with excellent support
· Netflix goals for move to the cloud:
o Improved availability
o Operational simplicity
o Architect to exploit the characteristics of the cloud
· Services in cloud:
o Streaming control service: stream movie content to customers
§ Architecture: Three Netflix services running in EC2 (replication, queueing, and streaming) with inter-service communication via SQS and persistent state in SimpleDB.
§ Good cloud workload in that usage can vary greatly, there is value in having regional data centers, and a better customer experience is possible by streaming content from locations near users
o Encoding Service: Encodes movies in the formats required by a diverse set of supported devices.
§ Good cloud workload in that it's very computationally intensive and, as new formats are introduced, massive encoding work needs to be done and there is value in doing it quickly (more servers for less time).
o AWS Services used by Netflix
§ Elastic Compute Cloud
§ Elastic Block Store
§ Simple Queue Service
§ SimpleDB
§ Simple Storage Service
§ Elastic Load Balancing
§ Elastic MapReduce
o Developer Challenges:
§ Reliability and capacity
§ Persistence strategy
· Oracle on EC2 over EBS vs MySQL vs SimpleDB
· SimpleDB: Highly available, replicating across zones
· Eventually consistent (now supports full consistency; I love eventual consistency but…)
§ Data encryption and key management
§ Data replication and consistency
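The "more servers for less time" point in the encoding notes above comes down to simple arithmetic, sketched here under loudly stated assumptions: the job is perfectly parallel, each server is billed in whole hours, and the workload size (10,000 server-hours) and the price are made-up illustrative numbers, not figures from the talk or from AWS. The function name encoding_plan is hypothetical as well.

```python
import math

def encoding_plan(total_server_hours, servers, price_per_server_hour):
    """Return (wall_clock_hours, cost) for a perfectly parallel job.

    All inputs are illustrative assumptions, not Netflix or AWS figures.
    """
    # Splitting the work evenly shrinks elapsed time in proportion to fleet size.
    wall_clock = total_server_hours / servers
    # Assume each server is billed for whole hours, rounded up.
    billed_hours = math.ceil(wall_clock) * servers
    return wall_clock, billed_hours * price_per_server_hour

if __name__ == "__main__":
    # Hypothetical re-encode of a catalog needing 10,000 server-hours.
    for servers in (10, 100, 1000):
        hours, cost = encoding_plan(10_000, servers, 0.10)
        print(f"{servers:>5} servers: {hours:>7.1f} h wall clock, ${cost:,.2f}")
```

Under these assumptions the bill is the same for every fleet size while the elapsed time drops from weeks to hours, which is the economic argument for running a bursty encoding workload on pay-for-what-you-use infrastructure. Real jobs are rarely perfectly parallel, so treat this as an upper bound on the benefit.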
Predictably, the talk ended with “Netflix is hiring” but, in this case, it is actually worth mentioning. They are doing very interesting work and moving lightning fast. RIM is hiring as well: http://www.rim.com/careers/index.shtml.
The slides for the talk are at: slideshare.
--jrh
James Hamilton
e: jrh@mvdirona.com
w: http://www.mvdirona.com
b: http://blog.mvdirona.com / http://perspectives.mvdirona.com
Disclaimer: The opinions expressed here are my own and do not necessarily represent those of current or past employers.