Flashback ability with HBase

Oracle has a great feature called flashback query. It lets a user (try to) query a row as it existed at some point in the past. Oracle does this by looking for the UNDO needed to reconstruct the row as of the requested time. If the necessary UNDO no longer exists, an error is thrown. This feature is really cool.

HBase has something similar, only you configure the table to store a certain number of versions of each cell. In some ways this is better, since you know that number of versions will be kept and available to query, rather than hoping you have enough UNDO to find the data in which you are interested.

Below is a simple test case.

We start by creating a simple table with one column family…

hbase(main):058:0> create 'myxml', 'xmlfamily'
0 row(s) in 1.1440 seconds

We can see that, by default, HBase creates the table to store the three most recent versions of a cell (the value of one column in one row, which here is what you would normally think of as the row).

hbase(main):060:0> describe 'myxml'
DESCRIPTION                                                                                                        ENABLED
 {NAME => 'myxml', FAMILIES => [{NAME => 'xmlfamily', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION true
  => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true
 '}]}
1 row(s) in 0.0240 seconds

We insert our first row…

hbase(main):062:0> put 'myxml', '1', 'xmlfamily:mdata_xml', 'first version'
0 row(s) in 0.0140 seconds

…and see our row was successfully inserted…

hbase(main):063:0> get 'myxml', '1'
COLUMN                                        CELL
 xmlfamily:mdata_xml                          timestamp=1310063719710, value=first version
1 row(s) in 0.0200 seconds
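The timestamp in that output is the number of milliseconds since the Unix epoch, which is what HBase assigns when the client does not supply a timestamp of its own. For flashback-style reads it helps to be able to turn that number into a wall-clock time. A minimal Python sketch (the helper name `hbase_ts_to_utc` is made up for this illustration):

```python
from datetime import datetime, timezone


def hbase_ts_to_utc(ts_ms):
    """Convert an epoch-milliseconds timestamp to an aware UTC datetime."""
    seconds, millis = divmod(ts_ms, 1000)
    # Integer arithmetic avoids float rounding in the sub-second part.
    whole = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return whole.replace(microsecond=millis * 1000)


# Timestamp copied from the shell output above.
first_put = hbase_ts_to_utc(1310063719710)
print(first_put.isoformat())
```

Going the other way (wall-clock time to milliseconds) gives the value you would pass when asking for the data as of a given moment.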

We proceed to re-insert the row (actually similar to a MERGE in Oracle since the row id is the same) and show that it is now the “current” version…

hbase(main):064:0> put 'myxml', '1', 'xmlfamily:mdata_xml', 'second version'
0 row(s) in 0.0220 seconds

hbase(main):065:0> get 'myxml', '1'
COLUMN                                        CELL
 xmlfamily:mdata_xml                          timestamp=1310063765017, value=second version
1 row(s) in 0.0150 seconds

We print all versions we have…

hbase(main):070:0> get 'myxml', '1', {COLUMN => 'xmlfamily:mdata_xml', VERSIONS =>3}
COLUMN                                        CELL
 xmlfamily:mdata_xml                          timestamp=1310063765017, value=second version
 xmlfamily:mdata_xml                          timestamp=1310063719710, value=first version
2 row(s) in 0.0160 seconds

We then insert a third version of the row, and see that we can still see all of our previous versions.

hbase(main):071:0> put 'myxml', '1', 'xmlfamily:mdata_xml', 'third version'
0 row(s) in 0.0130 seconds

hbase(main):072:0> get 'myxml', '1', {COLUMN => 'xmlfamily:mdata_xml', VERSIONS =>3}
COLUMN                                        CELL
 xmlfamily:mdata_xml                          timestamp=1310063851412, value=third version
 xmlfamily:mdata_xml                          timestamp=1310063765017, value=second version
 xmlfamily:mdata_xml                          timestamp=1310063719710, value=first version
3 row(s) in 0.0250 seconds
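Having several timestamped versions is what makes a flashback-style read possible: pick a point in time and take the newest version written at or before it. The Python sketch below only models that lookup using the three timestamps from the output above. It is not HBase client code, and `value_as_of` is a hypothetical helper. In the shell itself, `get` is understood to accept `TIMESTAMP` and `TIMERANGE` options for this kind of read, which is worth checking against your HBase version.

```python
from bisect import bisect_right

# (timestamp_ms, value) pairs copied from the shell output, oldest first.
VERSIONS = [
    (1310063719710, "first version"),
    (1310063765017, "second version"),
    (1310063851412, "third version"),
]


def value_as_of(versions, as_of_ms):
    """Return the newest value written at or before as_of_ms, or None."""
    timestamps = [ts for ts, _ in versions]
    # Index of the first version that is strictly newer than as_of_ms.
    idx = bisect_right(timestamps, as_of_ms)
    if idx == 0:
        return None  # nothing had been written yet at that time
    return versions[idx - 1][1]


# A moment between the second and third put.
print(value_as_of(VERSIONS, 1310063800000))
```

Asking for a time before the first put returns `None`, which is the rough equivalent of Oracle failing to find enough UNDO.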

However, as soon as we insert a fourth version, our first version can no longer be seen, even when we ask for up to 10 versions. This makes sense, since our table is configured to keep only the three most recent versions.

hbase(main):073:0> put 'myxml', '1', 'xmlfamily:mdata_xml', 'fourth version'
0 row(s) in 0.0090 seconds

hbase(main):075:0> get 'myxml', '1', {COLUMN => 'xmlfamily:mdata_xml', VERSIONS =>10}
COLUMN                                        CELL
 xmlfamily:mdata_xml                          timestamp=1310063861477, value=fourth version
 xmlfamily:mdata_xml                          timestamp=1310063851412, value=third version
 xmlfamily:mdata_xml                          timestamp=1310063765017, value=second version
3 row(s) in 0.0170 seconds

We don’t like this, so we decide to keep more previous versions of a given cell, 20 to be specific. To do that, we first disable the table (required for alterations).

hbase(main):076:0> disable 'myxml'
0 row(s) in 2.0380 seconds

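Just for grins, we try to insert a row while the table is disabled, and find that a NotServingRegionException is thrown. Please note that, because a disabled table rejects writes like this one, changing VERSIONS this way means an outage for the table.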

hbase(main):078:0> put 'myxml', '1', 'xmlfamily:mdata_xml', 'fifth version'

ERROR: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: NotServingRegionException: 1 time, servers with issues: dell11gr1:9002,

We proceed to make our VERSIONS change and then re-enable our table…

hbase(main):080:0> alter 'myxml', {NAME => 'xmlfamily', VERSIONS=>20}
0 row(s) in 0.0260 seconds

hbase(main):081:0> enable 'myxml'
0 row(s) in 2.0300 seconds

We find we can now insert our rows…

hbase(main):082:0> put 'myxml', '1', 'xmlfamily:mdata_xml', 'fifth version'
0 row(s) in 0.0190 seconds

hbase(main):083:0> put 'myxml', '1', 'xmlfamily:mdata_xml', 'sixth version'
0 row(s) in 0.0110 seconds

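…but we can also now see our first version again. Versions beyond the configured limit are hidden from reads but not deleted right away. A major compaction is the only time “old” versions are physically removed, and none has run since the first version was written, which is why it is still visible once the limit is raised. The major compaction is documented to occur once per day.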

hbase(main):084:0> get 'myxml', '1', {COLUMN => 'xmlfamily:mdata_xml', VERSIONS =>10}
COLUMN                                        CELL
 xmlfamily:mdata_xml                          timestamp=1310063963540, value=sixth version
 xmlfamily:mdata_xml                          timestamp=1310063957685, value=fifth version
 xmlfamily:mdata_xml                          timestamp=1310063861477, value=fourth version
 xmlfamily:mdata_xml                          timestamp=1310063851412, value=third version
 xmlfamily:mdata_xml                          timestamp=1310063765017, value=second version
 xmlfamily:mdata_xml                          timestamp=1310063719710, value=first version
6 row(s) in 0.0190 seconds
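The behaviour above can be modelled in a few lines of Python. This is a toy model under the assumptions stated in this post (excess versions are hidden at read time and only dropped by a major compaction). It is not how HBase is actually implemented, and `ToyCell` and its methods are invented for the illustration.

```python
class ToyCell:
    """Toy model of one versioned cell; not HBase's real implementation."""

    def __init__(self, max_versions):
        self.max_versions = max_versions
        self.stored = []  # every value still physically present, oldest first

    def put(self, value):
        # Writes only append; nothing is discarded at write time.
        self.stored.append(value)

    def get(self, versions=1):
        # Newest first, capped by both the request and the configured limit.
        limit = min(versions, self.max_versions)
        return self.stored[::-1][:limit]

    def major_compact(self):
        # Physically drop everything beyond the configured limit.
        self.stored = self.stored[-self.max_versions:]


def four_puts(cell):
    for name in ("first", "second", "third", "fourth"):
        cell.put(name + " version")
    return cell


# Case 1: the limit is raised before any compaction has run.
cell = four_puts(ToyCell(max_versions=3))
before_alter = cell.get(versions=10)   # "first version" is hidden
cell.max_versions = 20                 # stands in for alter ... VERSIONS=>20
after_alter = cell.get(versions=10)    # "first version" is visible again

# Case 2: a major compaction runs before the limit is raised.
compacted = four_puts(ToyCell(max_versions=3))
compacted.major_compact()
compacted.max_versions = 20
after_compaction = compacted.get(versions=10)  # "first version" is gone

print(len(before_alter), len(after_alter), len(after_compaction))
```

The second case is the one to keep in mind: raising VERSIONS only brings back old data if the compaction has not yet run.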

hbase(main):085:0>
