HPTS has always been one of my favorite workshops over the years. Margo Seltzer was the program chair this year and she and the program committee brought together one of the best programs ever. Earlier I posted my notes from Andy Bechtolsheim's session, Andy Bechtolsheim at HPTS 2009, and his slides, Technologies for Data Intensive Computing.
Two other sessions were particularly interesting and worth summarizing here. The first is a great talk from Randy Shoup on lessons learned running high-scale services, and the second is a talk by John Ousterhout on RAMCloud, a research project that aims to completely eliminate the storage hierarchy and store everything in DRAM.
My notes from Randy’s talk follow and his slides are at: eBay’s Challenges and Lessons from Growing an eCommerce Platform to Planet Scale.
· eBay Manages
1. Over 89 million active users worldwide
2. 190 million items for sale in 50,000 categories
3. Over 8 billion URL requests per day
4. Roughly 10% of the items are listed or ended each day
5. 70B read/write operations/day
· Architectural Lessons
1. Partition Everything
2. Asynchrony Everywhere
3. Automate Everything
4. Remember Everything Fails
5. Embrace Inconsistency
6. Expect Service Evolution
7. Dependencies Matter
8. Know which databases are Authoritative and which are caches
9. Never enough data (save everything)
10. Invest in custom infrastructure
My notes from John’s talk follow and his slides are at: RAMCloud: Scalable Data Center Storage Entirely in DRAM. I really enjoyed this talk despite the fact that I saw the same talk presented at the Stanford Clean Slate CTO Summit. This talk is sufficiently thought provoking to be just as interesting the second time through. My notes from John’s talk:
· Storage entirely in DRAM spread over 10s to 10s of thousands of servers
· Focus of project:
o Low latency and very large scale
· ~64GB server each supporting:
o 1M ops/second
o 5 to 10 us RPC
· Today commodity servers can easily stretch to 64GB. Expect to see 1TB in commodity servers in 5 to 10 years
· Current cost is roughly $60/GB. Expect this to fall to $4/GB in 5 to 10 years
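The server and cost figures above lend themselves to a quick back-of-envelope calculation. The sketch below uses the numbers from the talk (64GB per server, ~$60/GB today, ~$4/GB projected); the 10TB working-set size is purely an illustrative assumption of mine.

```python
# Back-of-envelope RAMCloud cluster sizing using the talk's figures.
# The 10 TB dataset size is a hypothetical assumption for illustration.
DATASET_GB = 10_000          # assumed 10 TB working set
GB_PER_SERVER = 64           # commodity server DRAM today (per the talk)
COST_PER_GB_TODAY = 60       # dollars/GB today (per the talk)
COST_PER_GB_FUTURE = 4       # projected dollars/GB in 5 to 10 years

servers = -(-DATASET_GB // GB_PER_SERVER)   # ceiling division
print(f"Servers needed: {servers}")                                # 157
print(f"DRAM cost today:  ${DATASET_GB * COST_PER_GB_TODAY:,}")    # $600,000
print(f"DRAM cost future: ${DATASET_GB * COST_PER_GB_FUTURE:,}")   # $40,000
```

At roughly 157 servers for 10TB, capacity is attainable today; the interesting shift is the 15x projected drop in DRAM cost per GB.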
· Motivation for RAMCloud project:
o Databases don’t scale
· Disk access rates not keeping up with capacity so disks must become archival:
o See Jim Gray’s excellent Disk is Tape
· Aggressive goal of achieving 5 to 10 us RPC
· Points out that very low latency applications are not built upon relational databases, argues that very low data access latency removes the need to optimize access plans, and concludes that the relational model will disappear.
o I see value in low latency but don’t agree that the relational model will disappear. See One Size Does Not Fit All.
· John makes an interesting observation “the cost of consistency increases with transaction over-lap”
§ O = # overlapping transactions
§ R = arrival rate for new transactions
§ D = duration of each transaction
§ O is proportional to R * D
§ R increases with system scale and, eventually, strong consistency becomes unaffordable
§ But, D decreases with lower latency
o The interesting question: can we afford higher levels of consistency with lower latency?
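The overlap relationship above (essentially Little's law) can be sketched numerically. The function and the workload numbers below are my own illustrative assumptions, not figures from the talk:

```python
# Sketch of the overlap observation O = R * D (Little's law):
# the expected number of concurrently in-flight transactions is
# arrival rate times transaction duration. Numbers are illustrative.
def overlap(arrival_rate_per_s: float, duration_s: float) -> float:
    """Expected number of overlapping (concurrent) transactions."""
    return arrival_rate_per_s * duration_s

# Disk-backed store: 10,000 txns/s at 10 ms each -> 100 overlapping.
print(overlap(10_000, 0.010))      # 100.0
# RAMCloud-style latency: same rate at 10 us each -> 0.1 overlapping.
print(overlap(10_000, 0.000_010))  # 0.1
```

Holding the arrival rate R fixed, cutting D from milliseconds to microseconds drops the expected overlap by three orders of magnitude, which is exactly why low latency could make stronger consistency affordable again.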
· John argues perhaps with very low latency, one size might fit all (a single data storage system could handle all workloads).
o The counter argument to this one is that the capital and power costs of an all-memory solution appear prohibitively expensive for cold, sequential workloads. It's perfect for OLTP, but I don't yet see the "one size fits all again" prediction coming true.
If you are interested in digging deeper, the slides for all sessions are posted at: http://www.hpts.ws/agenda.html.