{"id":2971,"date":"2013-05-30T10:42:46","date_gmt":"2013-05-30T15:42:46","guid":{"rendered":"http:\/\/appcrawler.com\/wordpress\/?p=2971"},"modified":"2013-05-31T07:01:34","modified_gmt":"2013-05-31T12:01:34","slug":"querying-hadoop-from-tomcat","status":"publish","type":"post","link":"http:\/\/appcrawler.com\/wordpress\/2013\/05\/30\/querying-hadoop-from-tomcat\/","title":{"rendered":"Querying Hadoop from Tomcat"},"content":{"rendered":"<p>Below is a very simple example of how to print the contents of a file stored in HDFS from a JSP running in Tomcat.<\/p>\n<p>I am not entirely sure where this would be useful, for the following reasons:<\/p>\n<p>*  Unless you have a list of the output files from a set of MapReduce jobs, and can parse the output in a JSP, the raw files may not be relevant<br \/>\n*  HUE (the Hadoop UI from Cloudera) already has a file browser component<\/p>\n<p>However, the exercise was very useful in understanding the Hadoop filesystem API, and how it can be used in a web front end.<\/p>\n<p>Use it as a learning tool.<\/p>\n<pre lang=\"java\" line=\"1\">\r\n<%@page import=\"java.io.*, java.net.*, java.util.*, org.apache.hadoop.conf.*, org.apache.hadoop.fs.*\"%>\r\n<%\r\n  \/\/ Print the HDFS file named by the \"fname\" request parameter, one line at a time.\r\n  try {\r\n    Configuration conf = new Configuration();\r\n    FileSystem fs = FileSystem.get(new URI(\"hdfs:\/\/expressdb1:9000\"), conf);\r\n    Path file = new Path(request.getParameter(\"fname\"));\r\n    FSDataInputStream getIt = fs.open(file);\r\n    BufferedReader d = new BufferedReader(new InputStreamReader(getIt));\r\n    String s = \"\";\r\n    while ((s = d.readLine()) != null) {\r\n      out.println(s + \"<br>\");\r\n    }\r\n    d.close();\r\n    \/\/ FileSystem.get() may return a cached, shared instance; closing it per request is fine for a demo only.\r\n    fs.close();\r\n  }\r\n  catch (Exception e) {\r\n    \/\/ The stack trace goes to the Tomcat log, not to the browser.\r\n    e.printStackTrace();\r\n  }\r\n%>\r\n<\/pre>\n<p>To take this to the next level, you could list every file in HDFS and construct a hyperlink for each file name.
The page would either:<\/p>\n<p>present a list of hyperlinks, one per HDFS file, if the fname query string parameter is absent<\/p>\n<p>-or-<\/p>\n<p>fetch the file named by the fname query string parameter if it is present<\/p>\n<p>The code below does exactly that.<\/p>\n<pre lang=\"java\" line=\"1\">\r\n<%@page import=\"java.io.*, java.net.*, java.util.*, org.apache.hadoop.conf.*, org.apache.hadoop.fs.*\"%>\r\n<%\r\n  try {\r\n    Configuration conf = new Configuration();\r\n    FileSystem fs = FileSystem.get(new URI(\"hdfs:\/\/expressdb1:9000\"), conf);\r\n    if (request.getParameter(\"fname\") == null) {\r\n      \/\/ No fname parameter: recursively list every file under the HDFS root as a link back to this page (test.jsp).\r\n      RemoteIterator files = fs.listFiles(new Path(\"\/\"),true);\r\n      while (files.hasNext()) {\r\n        LocatedFileStatus lfs = (LocatedFileStatus)files.next();\r\n        String path = lfs.getPath().toString();\r\n        \/\/ URL-encode the path so that special characters survive the query string.\r\n        out.println(\"<a href='test.jsp?fname=\" + URLEncoder.encode(path, \"UTF-8\") + \"'>\" + path + \"<\/a><br>\");\r\n      }\r\n    }\r\n    else {\r\n      \/\/ fname parameter present: print that file one line at a time.\r\n      Path file = new Path(request.getParameter(\"fname\"));\r\n      FSDataInputStream fileIn = fs.open(file);\r\n      BufferedReader d = new BufferedReader(new InputStreamReader(fileIn));\r\n      String s = \"\";\r\n      while ((s = d.readLine()) != null) {\r\n        out.println(s + \"<br>\");\r\n      }\r\n      d.close();\r\n    }\r\n    fs.close();\r\n  }\r\n  catch (Exception e) {\r\n    \/\/ The stack trace goes to the Tomcat log, not to the browser.\r\n    e.printStackTrace();\r\n  }\r\n%>\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Below is a very simple example of how to print the contents of a file stored in HDFS from a JSP running in Tomcat.
I am not entirely sure where this would be useful, for the following reasons: * Unless you have a list&hellip;<\/p>\n<p class=\"more-link-p\"><a class=\"more-link\" href=\"http:\/\/appcrawler.com\/wordpress\/2013\/05\/30\/querying-hadoop-from-tomcat\/\">Read more &rarr;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_mi_skip_tracking":false,"footnotes":""},"categories":[19,24,21,25],"tags":[],"_links":{"self":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/2971"}],"collection":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/comments?post=2971"}],"version-history":[{"count":17,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/2971\/revisions"}],"predecessor-version":[{"id":3002,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/2971\/revisions\/3002"}],"wp:attachment":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/media?parent=2971"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/categories?post=2971"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/tags?post=2971"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}