{"id":1399,"date":"2011-07-27T10:36:19","date_gmt":"2011-07-27T15:36:19","guid":{"rendered":"http:\/\/appcrawler.com\/wordpress\/?p=1399"},"modified":"2011-07-27T10:36:19","modified_gmt":"2011-07-27T15:36:19","slug":"flashback-ability-with-hbase","status":"publish","type":"post","link":"http:\/\/appcrawler.com\/wordpress\/2011\/07\/27\/flashback-ability-with-hbase\/","title":{"rendered":"Flashback ability with HBase"},"content":{"rendered":"<p>Oracle has a great feature called flashback query.  This allows a user to (try to) query a row as it existed at some point in the past.  It provides this functionality by trying to find the necessary UNDO to reconstruct the row as it existed at the time the user requested.  If the necessary UNDO does not exist, an error is thrown.  This feature is really cool.<\/p>\n<p>HBase has something similar, except that you configure each column family to keep a certain number of versions of each cell.  In some ways this is better, since you know that up to that number of versions will be available to query, rather than hoping you have enough UNDO to find the data in which you are interested.<\/p>\n<p>Below is a simple test case.<\/p>\n<p>We start by creating a simple table with one column family&#8230;<\/p>\n<pre lang=\"text\">\r\nhbase(main):058:0> create 'myxml', 'xmlfamily'\r\n0 row(s) in 1.1440 seconds\r\n<\/pre>\n<p>We can see that by default, HBase creates the column family to keep the three most recent versions of a cell (the value stored at a given row and column)<\/p>\n<pre lang=\"text\">\r\nhbase(main):060:0> describe 'myxml'\r\nDESCRIPTION                                                                                                        ENABLED\r\n {NAME => 'myxml', FAMILIES => [{NAME => 'xmlfamily', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION true\r\n  => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true\r\n '}]}\r\n1 row(s) in 0.0240 seconds\r\n<\/pre>\n<p>We insert our first 
row&#8230;<\/p>\n<pre lang=\"text\">\r\nhbase(main):062:0> put 'myxml', '1', 'xmlfamily:mdata_xml', '<x>first version<\/x>'\r\n0 row(s) in 0.0140 seconds\r\n<\/pre>\n<p>&#8230;and see our row was successfully inserted&#8230;<\/p>\n<pre lang=\"text\">\r\nhbase(main):063:0> get 'myxml', '1'\r\nCOLUMN                                        CELL\r\n xmlfamily:mdata_xml                          timestamp=1310063719710, value=<x>first version<\/x>\r\n1 row(s) in 0.0200 seconds\r\n<\/pre>\n<p>We proceed to put the same row again (similar to a MERGE in Oracle, since the row id is the same) and show that the new value is now the &#8220;current&#8221; version&#8230;<\/p>\n<pre lang=\"text\">\r\nhbase(main):064:0> put 'myxml', '1', 'xmlfamily:mdata_xml', '<x>second version<\/x>'\r\n0 row(s) in 0.0220 seconds\r\n\r\nhbase(main):065:0> get 'myxml', '1'\r\nCOLUMN                                        CELL\r\n xmlfamily:mdata_xml                          timestamp=1310063765017, value=<x>second version<\/x>\r\n1 row(s) in 0.0150 seconds\r\n<\/pre>\n<p>We print all versions we have&#8230;<\/p>\n<pre lang=\"text\">\r\nhbase(main):070:0> get 'myxml', '1', {COLUMN => 'xmlfamily:mdata_xml', VERSIONS =>3}\r\nCOLUMN                                        CELL\r\n xmlfamily:mdata_xml                          timestamp=1310063765017, value=<x>second version<\/x>\r\n xmlfamily:mdata_xml                          timestamp=1310063719710, value=<x>first version<\/x>\r\n2 row(s) in 0.0160 seconds\r\n<\/pre>\n<p>We then insert a third version of the row, and find that we can still see all of our previous versions.<\/p>\n<pre lang=\"text\">\r\nhbase(main):071:0> put 'myxml', '1', 'xmlfamily:mdata_xml', '<x>third version<\/x>'\r\n0 row(s) in 0.0130 seconds\r\n\r\nhbase(main):072:0> get 'myxml', '1', {COLUMN => 'xmlfamily:mdata_xml', VERSIONS =>3}\r\nCOLUMN              
                          CELL\r\n xmlfamily:mdata_xml                          timestamp=1310063851412, value=<x>third version<\/x>\r\n xmlfamily:mdata_xml                          timestamp=1310063765017, value=<x>second version<\/x>\r\n xmlfamily:mdata_xml                          timestamp=1310063719710, value=<x>first version<\/x>\r\n3 row(s) in 0.0250 seconds\r\n<\/pre>\n<p>However, as soon as we insert a fourth version, our first version can no longer be seen.  This makes sense, since our column family is configured to keep only the three most recent versions.<\/p>\n<pre lang=\"text\">\r\nhbase(main):073:0> put 'myxml', '1', 'xmlfamily:mdata_xml', '<x>fourth version<\/x>'\r\n0 row(s) in 0.0090 seconds\r\n\r\nhbase(main):075:0> get 'myxml', '1', {COLUMN => 'xmlfamily:mdata_xml', VERSIONS =>10}\r\nCOLUMN                                        CELL\r\n xmlfamily:mdata_xml                          timestamp=1310063861477, value=<x>fourth version<\/x>\r\n xmlfamily:mdata_xml                          timestamp=1310063851412, value=<x>third version<\/x>\r\n xmlfamily:mdata_xml                          timestamp=1310063765017, value=<x>second version<\/x>\r\n3 row(s) in 0.0170 seconds\r\n<\/pre>\n<p>We don&#8217;t like this, so we decide to keep more previous versions of a given cell, 20 to be specific.  As such, we disable the table (required for alterations)<\/p>\n<pre lang=\"text\">\r\nhbase(main):076:0> disable 'myxml'\r\n0 row(s) in 2.0380 seconds\r\n<\/pre>\n<p>Just for grins, we try to insert a row while the table is disabled, and find a NotServingRegionException is thrown.  
Please note that a disabled table is unavailable to clients, so this change requires an outage.<\/p>\n<pre lang=\"text\">\r\nhbase(main):078:0> put 'myxml', '1', 'xmlfamily:mdata_xml', '<x>fifth version<\/x>'\r\n\r\nERROR: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: NotServingRegionException: 1 time, servers with issues: dell11gr1:9002,\r\n<\/pre>\n<p>We proceed to make our VERSIONS change and then re-enable our table&#8230;<\/p>\n<pre lang=\"text\">\r\nhbase(main):080:0> alter 'myxml', {NAME => 'xmlfamily', VERSIONS=>20}\r\n0 row(s) in 0.0260 seconds\r\n\r\nhbase(main):081:0> enable 'myxml'\r\n0 row(s) in 2.0300 seconds\r\n<\/pre>\n<p>We find we can now insert our rows&#8230;<\/p>\n<pre lang=\"text\">\r\nhbase(main):082:0> put 'myxml', '1', 'xmlfamily:mdata_xml', '<x>fifth version<\/x>'\r\n0 row(s) in 0.0190 seconds\r\n\r\nhbase(main):083:0> put 'myxml', '1', 'xmlfamily:mdata_xml', '<x>sixth version<\/x>'\r\n0 row(s) in 0.0110 seconds\r\n<\/pre>\n<p>&#8230;but we can also now see our first version again.  A major compaction is the only time &#8220;old&#8221; versions are physically removed, which is why the first version is still visible.  
The compaction is documented to occur once per day.<\/p>\n<pre lang=\"text\">\r\nhbase(main):084:0> get 'myxml', '1', {COLUMN => 'xmlfamily:mdata_xml', VERSIONS =>10}\r\nCOLUMN                                        CELL\r\n xmlfamily:mdata_xml                          timestamp=1310063963540, value=<x>sixth version<\/x>\r\n xmlfamily:mdata_xml                          timestamp=1310063957685, value=<x>fifth version<\/x>\r\n xmlfamily:mdata_xml                          timestamp=1310063861477, value=<x>fourth version<\/x>\r\n xmlfamily:mdata_xml                          timestamp=1310063851412, value=<x>third version<\/x>\r\n xmlfamily:mdata_xml                          timestamp=1310063765017, value=<x>second version<\/x>\r\n xmlfamily:mdata_xml                          timestamp=1310063719710, value=<x>first version<\/x>\r\n6 row(s) in 0.0190 seconds\r\n\r\nhbase(main):085:0>\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Oracle has a great feature called flashback query. This allows a user to (try to) query a row as it existed at some point in the past. 
It provides this functionality by trying to find the necessary UNDO to reconstruct&hellip;<\/p>\n<p class=\"more-link-p\"><a class=\"more-link\" href=\"http:\/\/appcrawler.com\/wordpress\/2011\/07\/27\/flashback-ability-with-hbase\/\">Read more &rarr;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_mi_skip_tracking":false,"footnotes":""},"categories":[19,20],"tags":[],"_links":{"self":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/1399"}],"collection":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/comments?post=1399"}],"version-history":[{"count":25,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/1399\/revisions"}],"predecessor-version":[{"id":1473,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/1399\/revisions\/1473"}],"wp:attachment":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/media?parent=1399"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/categories?post=1399"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/tags?post=1399"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}