Bridging the gap…from SQL to No SQL

For any DBA’s that have had someone call them and ask “What do you know about NoSQL?”, this post is specifically intended. Before you have a conniption fit and began quoting Tom Kyte-ism’s (an exceptionally smart guy, if you do want to quote him :)), realize that software is not a religion, and sometimes at least knowing who you are fighting can be a good thing 🙂

I am the type that must have an example from which to work to truly understand anything. Most examples on the internet talk about column names and values like ‘foobaz’, etc. I need something more concrete. I couldn’t find anything on the internet for a roll up your sleeves DBA trying to understand what NoSQL is, so I decided to start with this.

If you find yourself in a position where you have to at least know what NoSQL is, if not also support it, this will get you up and running with a very simple example. We will initially try to stay true to relational roots before you stray too far from your comfort zone. Since this post is example oriented, we will not describe all terms except when necessary. If you want more of an academic “book in your hands” type of overview, google NoSQL and you will get a ton of hits.

For this article, we will use the HBase flavor of NoSQL, on any version of Linux (I didn’t have a problem with CentOS). First, download a recent version of HBase from http://hbase.apache.org/

You can then extract it with the following (changing the file name to whatever you download)

tar zxvf hbase-0.90.3.tar.gz

You can start HBase out of the box simply running the $HBASE_HOME/bin/start-hbase.sh shell script

We start by creating a simple customers table (something you may find in a typical application RDBMS). Issue the following to get into the HBase shell (conceptually similar to Oracle’s SQL*Plus)

$HBASE_HOME/bin/hbase shell

Once you are in the HBase shell, create your table as follows.

create 'customers', 'cust_family'

What is this odd looking “DDL” statement? In NoSQL, it is a “schemaless” product…yeah, the hair stood up on the back of my neck the first time I heard it, also. Once again, google it for more information.

All the above statement does is create a table with a “column family” name “cust_family”. Column families allow us to create our tables with sets of columns dedicated to a specific purpose.

We can describe our table, similar to a describe in SQL*Plus, with the following:

hbase(main):022:0> describe 'customers'
DESCRIPTION                                                                                                        ENABLED
 {NAME => 'customers', FAMILIES => [{NAME => 'cust_family', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPR true
 ESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE =>
  'true'}]}
1 row(s) in 0.0270 seconds

hbase(main):023:0>

Most of what is above you don’t need to know…yet. You definitely will later, though.

We can then load our first customer with something like the following:

hbase(main):017:0> put 'customers', '1', 'cust_family:cust_id', '1'
0 row(s) in 1.0310 seconds

hbase(main):018:0> put 'customers', '1', 'cust_family:cust_name', 'Steve'
0 row(s) in 0.0080 seconds

We can then either read our whole table with a scan (think table scan in oracle)

hbase(main):019:0> scan 'customers'
ROW                                           COLUMN+CELL
 1                                            column=cust_family:cust_id, timestamp=1309892148720, value=1
 1                                            column=cust_family:cust_name, timestamp=1309892158669, value=Steve
1 row(s) in 0.0140 seconds

…or with a get, which is an index lookup.

hbase(main):020:0> get 'customers', '1'
COLUMN                                        CELL
 cust_family:cust_id                          timestamp=1309892148720, value=1
 cust_family:cust_name                        timestamp=1309892158669, value=Steve
2 row(s) in 0.0120 seconds

How did we insert a record with a column name of cust_name and cust_id without that being in the describe statement we issued earlier? Remember, NoSQL is “schemaless”. If we using our application, and decide we want to now have a cust_creation_date column, we simply start adding that going forward with new puts (inserts).

Our next post will provide a java framework for loading records in bulk.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.