Friday, February 19, 2010

nosql - hypertable

I've been looking at some of the nosql databases and have been playing around with Hypertable a little. I had some difficulties getting my head around it, probably due more to my long experience with RDBMSs than Hypertable itself, so here are some notes about it.

Creating a table is analagous to creating a table in an RDBMS.

hypertable> create table foo ("bar", "bat");

Inserting data into the table is quite a bit different than what I am used to. First is that each 'row' inserted into the tab must have a row key, basically there is an implied row id column in every table. Second is that the insert is done as individual (row, column, value) tuples or cells. Third is that a single insert statement can insert into any number of cells for any number of rows for a single table. Fourth is that the columns are not really columns in the RDBMS sense but can hold data and multiple 'sub-columns' which can be dynamically defined.

hypertable> insert into foo values
         -> ("zap", "bar", "cat"),
         -> ("zap", "bat", "tap"),
         -> ("car", "bar:a", "nar"),
         -> ("car", "bar:b", "whal");
hypertable> select * from foo;
car    bar:a    nar
car    bar:b    whal
zap    bar    cat
zap    bat    tap
Each value inserted into the table is a tuple of '(' <row key>, <column name> [':'<column qualifier>], <value> ')'

There is no update statement, updates are simply inserts and Hypertable keeps history of all changes and timestamps each. Specifying an option of REVS=1 is analogous to a select in an RDBMS and gives the current revision of the row.

hypertable> insert into foo values
         -> ("zap", "bat", "lap"); 
hypertable> select * from foo where ROW="zap" DISPLAY_TIMESTAMPS;
2010-02-19 17:47:32.535534003    zap    bar    cat
2010-02-19 17:58:22.743904001    zap    bat    lap
2010-02-19 17:47:32.535534004    zap    bat    tap


hypertable> select * from foo where ROW="zap" DISPLAY_TIMESTAMPS REVS=1;
2010-02-19 17:47:32.535534003    zap    bar    cat
2010-02-19 17:58:22.743904001    zap    bat    lap
How the where clause is used  in Hypertable is quite different than in an RDBMS.  The value 'ROW' is the row key and can be used pretty much in the same way as a column in a boolean operation in SQL.
hypertable> select * from foo where ROW<="zap" DISPLAY_TIMESTAMPS REVS=1;
2010-02-19 17:47:32.535534001    car    bar:a    nar
2010-02-19 17:47:32.535534002    car    bar:b    whale
2010-02-19 17:47:32.535534003    zap    bar    cat
2010-02-19 17:58:22.743904001    zap    bat    lap
There is also a CELL option. I wanted to use this to select on column contents but it does not work that way. It allows one to select rows and columns by  row keys and column names. This probably has the most benefit if one uses column qualifiers ( dynamically created sub-columns).
hypertable> select * from foo where "a", "bar:a" <= CELL <= "zz", "bar:z";
car    bar:a    nar
car    bar:b    whale
zap    bar    cat
zap    bat    lap
zap    bat    tap
This basically sets up a range query where "a" <= ROW <= "zz" and there is a column such that "bar:a" <= column name <= "bar:z"

No comments: