Earlier this week Clustrix
announced a MySQL compatible, scalable database appliance that caught my interest. Key features supported by Clustrix:
· MySQL protocol emulation (MySQL protocol supported so MySQL apps written to the MySQL client libraries just work)
· Hardware appliance delivery package in a 1U package including both NVRAM and disk
· Infiniband interconnect
· Shared nothing, distributed database
· Online operations including alter table add column
I like the idea of adopting a MySQL programming model. But, it’s incredibly hard to be really MySQL compatible unless each node is actually based upon the MySQL execution engine. And it’s usually the case that a shared nothing, clustered DB will bring some programming model constraints. For example, if global secondary indexes aren’t implemented, it’s hard to support uniqueness constraints on non-partition key columns and it’s hard to enforce referential integrity. Global secondary indexes maintenance implies a single insert, update, or delete that would normally only require a single node change would require atomic updates across many nodes in the cluster making updates more expensive and susceptible to more failure modes. Essentially, making a cluster look exactly the same as a single very large machine with all the same characteristics isn’t possible. But, many jobs that can’t be done perfectly are still well worth doing. If Clustrix delivers all they are describing, it should be successful.
I also like the idea of delivering the product as a hardware appliance. It keep the support model simple, reduces install and initial setup complexity, and enables application specific hardware optimizations.
Using Infiniband as a cluster interconnect is a nice choice as well. I believe that 10GigE with RDMA support will provide better price performance than Infiniband but commodity 10GigE volumes and quality RDMA support is still 18 to 24 months away so Inifiband is a good choice for today.
Going with a shared nothing architecture avoids dependence on expensive shared storage area networks and the scaling bottleneck of distributed lock managers. Each node in the cluster is an independent database engine with its own physical (local) metadata, storage engine, lock manager, buffer manager, etc. Each node has full control of the table partitions that reside on that node. Any access to those partitions must go through that node. Essentially, bringing the query to the data rather than the data to the query. This is almost always the right answer and it scales beautifully.

In operation, a client connects to one of the nodes in the cluster and submits a SQL statement. The statement is parsed and compiled. During compilation, the cluster-wide (logical) metadata is accessed as needed and an execution plan is produced. The cluster-wide (logical) metadata is either replicated to all nodes or stored centrally with local caching. The execution plan produced by the query compilation will be run on as many nodes as needed with the constraint that table or index access be on the nodes that house those table or index partitions. Operators higher in the execution plan can run on any node in the cluster. Rows flow between operators that span node boundaries over the infiniband network. The root of the query plan runs on the node where the query was started and the results are returned to client program using the MySQL client protocol
As described, this is a very big engineering project. I’ve worked on teams that have taken exactly this approach and they took several years to get to the first release and even subsequent releases had programming model constraints. I don’t know how far along Clustrix is a this point but I like the approach and I’m looking forward to learning more about their offering.
White paper: Clustrix: A New Approach
Press Release: Clustrix Emerges from Stealth Mode with Industry’s First Clustered DB
James Hamilton
e: jrh@mvdirona.com
w: http://www.mvdirona.com
b: http://blog.mvdirona.com / http://perspectives.mvdirona.com
In this month’s Communications of the Association of Computing Machinery, a rematch of the MapReduce debate was staged. In the original debate, Dave Dewitt and Michael Stonebraker, both giants of the database community, complained that:
1. MapReduce is a step backwards in database access
2. MapReduce is a poor implementation
3. MapReduce is not novel
4. MapReduce is missing features
5. MapReduce is incompatible with the DBMS tools
Unfortunately, the original article appear to be no longer available but you will find the debate branching out from that original article by searching on the title Map Reduce: A Major Step Backwards. The debate was huge, occasionally entertaining, but not always factual. My contribution was MapReduce a Minor Step forward.
Update: In comments, csliu offered updated URLs for the original blog post and a follow-on article:
· MapReduce: A Major Step Backwards
· MapReduce II
I like MapReduce for a variety of reasons the most significant of which is that it allows non-systems programmers to write very high-scale, parallel programs with comparative ease. There have been many attempts to allow mortals to write parallel programs but there really have only been two widely adopted solutions that allow modestly skilled programmers to write highly concurrent executions: SQL and MapReduce. Ironically the two communities participating in the debate, Systems and Database, have each produced a great success by this measure.
More than 15 years ago, back when I worked on IBM DB2, we had DB2 Parallel Edition running well over a 512 server cluster. Even back then you could write a SQL Statement that would run over a ½ thousand servers. Similarly, programmers without special skills can run MapReduce programs that run over thousands of serves. The last I checked Yahoo, was running MapReduce jobs over a 4,000 node cluster: Scaling Hadoop to 4,000 nodes at Yahoo!.
The update on the MapReduce debate is worth reading but, unfortunately, the ACM has marked the first article as “premium content” so you can only read it if you are a CACM subscriber:
· MapReduce and Parallel DBMSs: Friend or Foe
· MapReduce: A Flexible Data Processing Tool
Update: Moshe Vardi, Editor in Chief of the Communications of the Association of Computing Machinery has kindly decided to make both the of the above articles freely available for all whether or not CACM member. Thank you Moshe.
Even more important to me than the MapReduce debate is seeing this sort of content made widely available. I hate seeing it classified as premium content restricted to members only. You really all should be members but, with the plunging cost of web publishing, why can’t the above content be made freely available? But, while complaining about the ACM publishing policies, I should hasten to point out that the CACM has returned to greatness. When I started in this industry, the CACM was an important read each month. Well, good news, the long boring hiatus is over. It’s now important reading again and has been for the last couple of years. I just wish the CACM would follow the lead of ACM Queue and make the content more broadly available outside of the membership community.
Returning to the MapReduce discussion, in the second CACM article above, MapReduce: A Flexible Data Processing Tool, Jeff Dean and Sanjay Ghemawat, do a thoughtful job of working through some of the recent criticism of MapReduce.
If you are interested in MapReduce, I recommend reading the original Operating Systems Design and Implementation MapReduce paper: MapReduce: Simplied Data Processing on Large Clusters and the detailed MapReduce vs database comparison paper: A Comparison of Approaches to Large-Scale Data Analysis.
--jrh
James Hamilton
e: jrh@mvdirona.com
w: http://www.mvdirona.com
b: http://blog.mvdirona.com / http://perspectives.mvdirona.com
HPTS has always been one of my favorite workshops over the years. Margo Seltzer was the program chair this year and she and the program committee brought together one of the best programs ever. Earlier I posted my notes from Andy Bectolsheim’s session Andy Bechtolsheim at HPTS 2009 and his slides Technologies for Data Intensive Computing.
Two other sessions were particularly interesting and worth summarizing here. The first is a great talk on high-scale services lessons learned from Randy Shoup and a talk by John Ousterhout on RAMCloud a research project to completely eliminate the storage hierarchy and store everything in DRAM.
Randy Shoup
My notes from Randy’s talk follow and his slides are at: eBay’s Challenges and Lessons from Growing an eCommerce Platform to Planet Scale.
· eBay Manages
1. Over 89 million active users worldwide
2. 190 million items for sale in 50,000 categories
3. Over 8 billion URL requests per day
4. Roughly 10% of the items are listed or ended each day
5. 70B read/write operations/day
· Architectural Lessons
1. Partition Everything
2. Asynchrony Everywhere
3. Automate Everything
4. Remember Everything Fails
5. Embrace Inconsistency
6. Expect Service Evolution
7. Dependencies Matter
8. Know which databases are Authoritativeand which are caches
9. Never enough data (save everything)
10. Invest in custom infrastructure
John Ousterhout
My notes from John’s talk follow and his slides are at: RAMCloud: Scalable Data Center Storage Entirely in DRAM. I really enjoyed this talk despite the fact that I saw the same talk presented at the Stanford Clean Slate CTO Summit. This talk is sufficiently thought provoking to be just as interesting the second time through. My notes from John’s talk:
· Storage entirely in DRAM spread over 10s to 10s of thousands of servers
· Focus of project:
o Low latency and very large scale
· ~64GB server each supporting:
o 1M ops/second
o 5 to 10 us RPC
· Today commodity servers can stretch easily to 64GB. Expect to see 1TB in commodity servers out 5 to 10 years
· Current cost is roughly $60/GB. Expect this to fall to $4/GB in 5 to 10 years
· Motivation for RAMCloud project:
o Databases don’t scale
· Disk access rates not keeping up with capacity so disks must become archival:
o See Jim Gray’s excellent Disk is Tape
· Aggressive goal of achieving 5 to 10 u sec RPC
· Points out that very low latency applications are not built upon relational databases and argues that very low data access latency removes the need for optimization of access plans and concludes the relational model will disappear.
o I see value in low latency but don’t agree that the relational model will disappear. See One Size Does Not Fit All.
· John makes an interesting observation “the cost of consistency increases with transaction over-lap”
o Let:
§ 0 = # overlapping transactions
§ R = arrival rate for new transactions
§ D = duration of each transaction
o Then:
§ 0 is proportional to R * D
§ R increases with system scale and, eventually, strong consistency becomes unaffordable
§ But, D decreases with lower latency
o The interesting question: can we afford higher levels of consistency with lower latency?
· John argues perhaps with very low latency, one size might fit all (a single data storage system could handle all workloads).
o The counter argument to this one is that capital cost and power cost of an all memory solution appears prohibitively expensive for cold sequential workloads. Its perfect for OLTP but I don’t yet see the “one size can fit again” prediction.
If you are interested in digging deeper, the slides for all sessions are posted at: http://www.hpts.ws/agenda.html.
--jrh
James Hamilton
e: jrh@mvdirona.com
w: http://www.mvdirona.com
b: http://blog.mvdirona.com / http://perspectives.mvdirona.com
Just about exactly one year ago, I posted a summary and the slides from an excellent Butler Lampson talk: The Uses of Computers: What’s Past is Merely Prologue. Its time for another installment. Butler was at SOSP 2009 a few weeks back and Marvin Theimer caught up with him for a wide ranging discussion on distributed systems.
With Butler's permission, what follows are Marvin’s notes from the discussion.
Contrast cars with airplanes: when the fancy electronic systems fail you (most-of-the-time) can pull a car over to the side of the road and safely get out whereas an airplane will crash-and-burn.
Systems that behave like cars vs. airplanes:
· It’s like a car if you can reboot it
· What’s the scope of the damage if you reboot it
Ensuring that critical sub-systems keep functioning:
· Layered approach with lower layers being simpler and able to cut off the higher layers and still keep functioning
· Bottom layers need to be simple enough to reason about or perhaps even formally verify
· Be skeptical about designing systems that gracefully degrade/approach their “melting points”. Nice in theory, but not likely to be feasible in practice in most cases.
· Have “firewalls” that partition your system into independent modules so that damage is contained.
· Firewalls have “blast doors” that automatically come down in case of alarms going off. Under normal circumstances the blast doors are up and you have efficient, optimized interaction between modules. When alarms go off the blast doors go down. The system must be able to work in degraded mode with the blast doors down.
· You need to continually test your system for how it behaves with the blast doors down to ensure that “critical functioning” is still achievable despite system evolution and environment evolution. Problem is that testing is expensive, so there is a trade-off between extensive testing and cost. Essentially you can’t test everything. This is part of the reason why the lowest levels of the system need to be simple enough to formally reason about their behavior.
o Dave Gifford’s story about bank that had diesel backup generators for when power utility failed. They religiously tested firing up the backup generators. However, when a prolonged power actually occurred they discovered that the generators failed after half an hour because their lubricating oil failed. No one had thought to test running on backup power for more than a few minutes.
Low-level multicast is bad because you can’t reason about the consequences of its use. Better to have application-level multicast where you can explicitly control what’s going on.
RPC conundrum:
· People have moved back from RPC to async messages because of the performance issues of sync RPC.
· By doing so they are reintroducing concurrency issues into their programs.
A possible solution:
· Constrain your system (if you can) to enable the definition of a small number of interaction patterns that hide the concurrency and asynchrony.
· Your experts employ async messages to implement those higher-level interaction patterns.
· The app developers only use the simper, higher-level abstractions.
· Be happy with 80% solution – which you might achieve – and don’t expect to be able to handle all interactions this way.
Partitioned, primary-key scale-out approach is essentially mimicking the OLTP model of transactional DBs. You are giving up certain kinds of join operators in exchange for scale-out and the app developer is essentially still programming the simple ACID DB model.
· Need appropriate patterns/framework for updating multiple objects in a non-transactional manner.
· Standard approach: update one object and transactionally write a message in a message queue for the other object. Transactional update to other object is done asynchronously to the first object update. Need compensation code for when things go wrong.
· An interesting approach for simplifying the compensation problem: try to turn it into a garbage collection problem. Background tasks look for to-do messages that haven’t been executed and figure out how to bring the system back into “compliance”. You need this code for your fsck case anyway.
WARNING: don’t over-engineer your system. Lots of interesting ideas here; you’ll be tempted to over-generalize and make things too complicated. “Ordinary designers get it wrong 99% of the time; really smart designers get it wrong 50% of the time.”
Thanks to Butler Lampson and Marvin Theimer for making this summary available.
--jrh
James Hamilton
e: jrh@mvdriona.com
w: http://www.mvdirona.com
b: http://blog.mvdirona.com / http://perspectives.mvdirona.com
Here’s another innovative application of commodity hardware and innovative software to the high-scale storage problem. MaxiScale focuses on 1) scalable storage, 2) distributed namespace, and 3) commodity hardware.
Today's announcement: http://www.maxiscale.com/news/newsrelease/092109.
They sell software designed to run on commodity servers with direct attached storage. They run N-way redundancy with a default of 3-way across storage servers to be able to survive disk and server failure. The storage can be accessed via HTTP or via Linux or Windows (2003 and XP) file system calls. The later approach requires a kernel installed device driver and uses a proprietary protocol to communicate back with the filer cluster but has the advantage of directly support local O/S read/write operations. MaxiScale architectural block diagram:

Overall I like the approach of using commodity systems with direct attached storage as the building block for very high scale storage clusters but that is hardly unique. Many companies have head down this path and, generally, it’s the right approach. What caught my interest when I spoke to the MaxiScale team last week was: 1) distributed metadata, 2) MapReduce support, and 3) small file support. Let’s look at each of these major features:
Distributed Metadata
File systems need to maintain a namespace. We need to maintain the directory hierarchy and we need to know where to find the storage blocks that make up the file. In addition, other attributes and security may need to be stored depending upon the implemented file system semantics. This metadata is often stored in large key/value store. The metadata requires at least some synchronization since, for example, you don’t want to create two different objects of the same name at roughly the same time. At high scale, storage servers will be joining and leaving the cluster all the time, so having a central metadata service is a easy approach to the problem. But, as easy as it is to implement a central metadata systems, they bring scaling limits. Eventually the metadata gets too hot and needs to be partitioned. In fairness, it’s amazing how far central metadata can be scaled but eventually hot spots develop and it needs to be partitioned. For example, Google GFS just went down this path: GFS: Evolution on Fast-forward. Partitioning metadata is a fairly well understood problem. What makes it a bit of a challenge is making the metadata system adaptive and able to re-partition when hot spots develop.
MaxiScale took an interesting approach to scaling the metadata. They distributed the metadata servers over the same servers that store the data rather than implement a cluster of dedicated metadata servers. They do this by hashing on the parent directory to find what they call a Peer Set and then, in that particular Peer Set, they look up the object name in the metadata store, find the file block location, and then apply the operation to the blocks in that same Peer Set.

Distributing the metadata over the same Peer Set as the stored data means that each peer set is independent and self-describing. The downside of having a fixed hash over the peer sets is that it’s difficult to cool down an excessively hot peer set by moving objects since the hash is known by all clients.
MapReduce Support
I love the MaxiScale approach to multi-server administration. They need to provide customers the ability to easily maintain multi-server clusters. They could have implemented a separate control plane to manage the all the servers that make up the cluster but, instead, they just use Hadoop and run MapReduce jobs.
All administrative operations are written as simple MapReduce jobs which certainly made the implementation task easier but it’s also a nice, extensible interface to allow customers to write custom administrative operations. And, since MapReduce is available over the cluster, its super easy to write data mining and data analysis jobs. Supporting MapReduce over the storage system is a nice extension of normal filer semantics.
Small File Support
The standard file system access model is to probe the metadata to get the block list and then access the blocks to perform the file operation. In the common case, this will require two I/Os which is fine for large files but the two I/Os can be expensive for small file access. And, since most filers use fixed size blocks and, for efficiency these block size tend to be larger than the average small file, some space is wasted. The common approach to this two problems is to pull small files “up” and rather than store the list of storage blocks in the file metadata, just store the small file. This works fine for small files and avoids both the block fragmentation and the multiple I/O problem on small files. This is what MaxiScale has done as well and they claim single I/O for any small file stored in the system.
More data on the MaxiScale filer: Small Files, Big Headaches: Ensuring Peak Performance
I love solutions based upon low-cost, commodity H/W and application maintained redundancy and what MaxiScale is doing has many of the features I would like to see in a remote filer.
--jrh
James Hamilton
e: jrh@mvdriona.com
w: http://www.mvdirona.com
b: http://blog.mvdirona.com / http://perspectives.mvdirona.com
AJAX applications are wonderful because they allow richer web applications with much of the data being brought down asynchronously. The rich and responsive user interfaces of applications like Google Maps and Google Docs are excellent but JavaScript developers need to walk a fine line. The more code they download, the richer the UI they can support and the less synchronous server interactions they need. But, the more code they download, the slower the application can be to start. This is particularly noticeable when the client cache is cold and in mobile applications with restricted bandwidth back to the server.
Years ago profile directed code reorganization (a sub-class of Basic Block Transforms) were implemented to solve what might appear to be an unrelated problem. The problem tackled by these profile directed basic block reorganizations is decreasing the number of last level cache misses in a server. They do this by organizing frequently accessed code segments together and moving rarely executed code segments. The biggest gain is that seldom executed error handling code can be moved away from frequently executed application code. I’ve seen reports of error handling code making up more than 40% of an application. Moving this code away from the commonly executed mainline code allows fewer processor cache lines to support program execution which demands fewer memory faults. Error handling code will execute more slowly but that is seldom an issue. Profile directed basic block transforms need to be trained on “typical” applications workloads and code that typically executes together will be placed together. Unfortunately, “typical” is often an important, industry standard benchmark like TPC-C so sometimes “typical” is replaced by “important” :-). Nonetheless, the tools are effective and greater than 20% improvement is common and we often see much more. All commercial database servers use or misuse profile directed basic block reorganizations.
The JavaScript download problem is actually very similar to the problem addressed by basic block transforms. Getting code from the server takes relatively long time just as getting code from memory takes a long time relative to executing code already in the processor cache. Much of the application doesn’t execute in the common case so it makes little sense to download it all unless needed in this execution. Most of the code isn’t needed to start the application so it’s a big win to download the code, start the application, and then download what is needed in the background.
Last week Ben Livshits and Emre Kiciman of the Microsoft Research team released an interesting tool that does exactly this for JavaScript applications. Doloto analyses client JavaScript systems and breaks them up into a series of independent modules. The primary module is downloaded first and includes just stubs for the other modules. This primary module is smaller, downloads faster, and dramatically improves time to live application. In the Doloto team measurements, the size of the initial download was only between 20% and 60% of the size of the standard download. In the case of Google docs, the initial download was less than 20% of the original size.

Once the initial module is downloaded, the application is live and running and the rest of the modules are brought down asynchronously or faulted in as needed. Many applications due these optimizations manually but this is a nice automated approach to the problem
I’ve seen 80,000 line JavaScript programs and there are many out there far larger. Getting the application running fast dramatically improves the user experience and this is a nice approach to achieving that goal. Doloto is available for download at: http://msdn.microsoft.com/en-us/devlabs/ee423534.aspx. And there is a more detailed Doloto paper at: http://research.microsoft.com/en-us/um/people/livshits/papers/pdf/fse08.pdf and summary information at: http://research.microsoft.com/en-us/projects/doloto/.
James Hamilton
e: jrh@mvdriona.com
w: http://www.mvdirona.com
b: http://blog.mvdirona.com / http://perspectives.mvdirona.com
MapReduce has created some excitement in the relational database community. Dave Dewitt and Michael Stonebraker’s MapReduce: A Major Step Backwards is perhaps the best example. In that posting they argued that map reduce is a poor structured storage technology, the execution engine doesn’t include many of the advances found in modern, parallel RDBMS execution engines, it’s not novel, and its missing features.
In Mapreduce: A Minor Step Forward I argued that MapReduce is an execution model rather than storage engine. It is true that it is typically run over a file system like GFS or HDFS or simple structured storage system like BigTable or Hbase. But, it could be run over a full relational database.
Why would we want to run Hadoop over a full relational database? Hadoop scales: Hadoop has been scaled to 4,000 nodes at Yahoo! Scaling Hadoop to 4000 nodes at Yahoo!. Scaling a clustered RDBMS too 4k nodes is certainly possible but the high scale single system image cluster I’ve seen was 512 nodes (what was then called DB2 Parallel Edition). Getting to 4k is big. Hadoop is simple: automatic parallelism has been an industry goal for decades but progress has been limited. There really hasn’t been success in allowing programmers of average skill to write massively parallel programs except for SQL and Hadoop. Programmers of bounded skill can easily write SQL that will be run in parallel over high scale clusters. Hadoop is the only other example I know where this is possible and happening regularily.
Hadoop makes the application of 100s or even 1000s of nodes of commodity computers easy so why not Hadoop over full RDBMS nodes? Daniel Abadi and team from Yale and Brown have done exactly that. In this case, Hadoop over PostgresSQL. From Daniel’s blog:
HadoopDB is:
1. A hybrid of DBMS and MapReduce technologies targeting analytical query workloads
2. Designed to run on a shared-nothing cluster of commodity machines, or in the cloud
3. An attempt to fill the gap in the market for a free and open source parallel DBMS
4. Much more scalable than currently available parallel database systems and DBMS/MapReduce hybrid systems (see longer blog post).
5. As scalable as Hadoop, while achieving superior performance on structured data analysis workloads
See: http://dbmsmusings.blogspot.com/2009/07/announcing-release-of-hadoopdb-longer.html for more detail and http://sourceforge.net/projects/hadoopdb/ for source code for HadoopDB.
A more detailed paper has been accepted for publication at VLDB: http://db.cs.yale.edu/hadoopdb/hadoopdb.pdf.
The development work for HadoopDB was done using AWS Elastic Compute Cluster. Nice work Daniel.
--jrh
James Hamilton, Amazon Web Services
1200, 12th Ave. S., Seattle, WA, 98144 W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 | james@amazon.com
H:mvdirona.com | W:mvdirona.com/jrh/work | blog:http://perspectives.mvdirona.com
Erasure coding provides redundancy for greater than single disk failure without 3x or higher redundancy. I still like full mirroring for hot data but the vast majority of the worlds data is cold and much of it never gets referenced after writing it: Measurement and Analysis of Large-Scale Network File System Workloads. For less-than-hot workloads, erasure coding is an excellent solution. Companies such as EMC, Data Domain, Maidsafe, Allmydata, Cleversafe, and Panasas are all building products based upon erasure coding.
At FAST 2009 in late February, A Performance Evaluation and Examination of Open-Source Erasure Coding Libraries For Storage will be presented. This paper looks at 5 open source erasure coding systems and compares there relative performance. The open source erasure coding packages implement Read-Solomon, Cauchy Read-Solomon, Even-Odd, Row-Diagonal Parity (RDP), and Minimal Density RAID-6 codes.
The authors found:
· The special-purpose RAID-6 codes vastly outperform their general-purpose counterparts. RDP performs the best of these by a narrow margin.
· Cauchy Reed-Solomon coding outperforms classic Reed-Solomon coding significantly, as long as attention is paid to generating good encoding matrices.
· An optimization called Code-Specific Hybrid Reconstruction is necessary to achieve good decoding speeds in many of the codes.
· Parameter selection can have a huge impact on how well an implementation performs. Not only must the number of computational operations be considered, but also how the code interacts with the memory hierarchy, especially the caches.
· There is a need to achieve the levels of improvement that the RAID-6 codes show for higher numbers of failures.
The paper also provides a good introduction of how erasure coding works. Recommended. I expect erasure codes to spring up in many more application in the near future.
--jrh
James Hamilton, Amazon Web Services
1200, 12th Ave. S., Seattle, WA, 98144 W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 | james@amazon.com
H:mvdirona.com | W:mvdirona.com/jrh/work | blog:http://perspectives.mvdirona.com
Richard Jones of Last.fm has compiled an excellent list of
key-value stores in Anti-RDBMS:
A list of key-value stores.
In this post, Richard looks at Project
Voldemort, Ringo, Scalaris, Kai,
Dynomite, MemcacheDB, ThruDB,
CouchDB,
Cassandra, HBase
and Hypertable.
His conclusion for Last.fm use is that Project Voldemort has the most promise
with Scalaris being a close second and Dynomite is also interesting.
--jrh
James Hamilton, Amazon Web Services
1200, 12th Ave. S., Seattle,
WA, 98144
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 | james@amazon.com
H:mvdirona.com | W:mvdirona.com/jrh/work | blog:http://perspectives.mvdirona.com
|