Earlier this week Clustrix announced a MySQL compatible, scalable database appliance that caught my interest. Key features supported by Clustrix:

· MySQL protocol emulation (MySQL protocol supported so MySQL apps written to the MySQL client libraries just work)

· Hardware appliance delivery package in a 1U package including both NVRAM and disk

· Infiniband interconnect

· Shared nothing, distributed database

· Online operations including alter table add column

I like the idea of adopting a MySQL programming model. But, it’s incredibly hard to be really MySQL compatible unless each node is actually based upon the MySQL execution engine. And it’s usually the case that a shared nothing, clustered DB will bring some programming model constraints. For example, if global secondary indexes aren’t implemented, it’s hard to support uniqueness constraints on non-partition key columns and it’s hard to enforce referential integrity. Global secondary indexes maintenance implies a single insert, update, or delete that would normally only require a single node change would require atomic updates across many nodes in the cluster making updates more expensive and susceptible to more failure modes. Essentially, making a cluster look exactly the same as a single very large machine with all the same characteristics isn’t possible. But, many jobs that can’t be done perfectly are still well worth doing. If Clustrix delivers all they are describing, it should be successful.

I also like the idea of delivering the product as a hardware appliance. It keep the support model simple, reduces install and initial setup complexity, and enables application specific hardware optimizations.

Using Infiniband as a cluster interconnect is a nice choice as well. I believe that 10GigE with RDMA support will provide better price performance than Infiniband but commodity 10GigE volumes and quality RDMA support is still 18 to 24 months away so Inifiband is a good choice for today.

Going with a shared nothing architecture avoids dependence on expensive shared storage area networks and the scaling bottleneck of distributed lock managers. Each node in the cluster is an independent database engine with its own physical (local) metadata, storage engine, lock manager, buffer manager, etc. Each node has full control of the table partitions that reside on that node. Any access to those partitions must go through that node. Essentially, bringing the query to the data rather than the data to the query. This is almost always the right answer and it scales beautifully.

In operation, a client connects to one of the nodes in the cluster and submits a SQL statement. The statement is parsed and compiled. During compilation, the cluster-wide (logical) metadata is accessed as needed and an execution plan is produced. The cluster-wide (logical) metadata is either replicated to all nodes or stored centrally with local caching. The execution plan produced by the query compilation will be run on as many nodes as needed with the constraint that table or index access be on the nodes that house those table or index partitions. Operators higher in the execution plan can run on any node in the cluster. Rows flow between operators that span node boundaries over the infiniband network. The root of the query plan runs on the node where the query was started and the results are returned to client program using the MySQL client protocol

As described, this is a very big engineering project. I’ve worked on teams that have taken exactly this approach and they took several years to get to the first release and even subsequent releases had programming model constraints. I don’t know how far along Clustrix is a this point but I like the approach and I’m looking forward to learning more about their offering.

White paper: Clustrix: A New Approach

Press Release: Clustrix Emerges from Stealth Mode with Industry’s First Clustered DB

James Hamilton

e: jrh@mvdirona.com

w: http://www.mvdirona.com

b: http://blog.mvdirona.com / http://perspectives.mvdirona.com