{"id":5051,"date":"2015-08-25T09:34:39","date_gmt":"2015-08-25T14:34:39","guid":{"rendered":"http:\/\/appcrawler.com\/wordpress\/?p=5051"},"modified":"2015-08-25T09:34:39","modified_gmt":"2015-08-25T14:34:39","slug":"unable-to-download-to-any-storage-directory","status":"publish","type":"post","link":"http:\/\/appcrawler.com\/wordpress\/2015\/08\/25\/unable-to-download-to-any-storage-directory\/","title":{"rendered":"Unable to download to any storage directory"},"content":{"rendered":"<p>We had an issue in which the secondary namenode was not checkpointing the edits file with updates from the primary namenode.  In case you are unaware, the secondary namenode (not in an HA configuration) regularly transfers the edits file from the primary name node, reads them, and applies the updates to the fsimage file.  In our case, we received an alert from Ambari that the checkpoint had not occurred for several hours.  This article is simply a record of how we troubleshot the issue.<\/p>\n<p>From the secondary namenode, try to force a checkpoint.  Note that this doesn&#8217;t work as it attempts to bind to the port on which the secondary namenode is running (assuming it is, as it was in our case).<\/p>\n<pre>\r\n[hdfs@cmhlddlkeedg02 ~]$ hadoop secondarynamenode -checkpoint force\r\nDEPRECATED: Use of this script to execute hdfs command is deprecated.\r\nInstead use the hdfs command for it.\r\n\r\n15\/08\/12 13:33:59 INFO namenode.SecondaryNameNode: STARTUP_MSG:\r\n\/************************************************************\r\nSTARTUP_MSG: Starting SecondaryNameNode\r\nSTARTUP_MSG:   host = cmhlddlkeedg02.expdev.local\/172.27.2.196\r\nSTARTUP_MSG:   args = [-checkpoint, force]\r\nSTARTUP_MSG:   version = 2.6.0.2.2.4.2-2\r\n\r\n<snip>\r\n\r\n15\/08\/12 13:34:01 INFO http.HttpServer2: HttpServer.start() threw a non Bind IOException\r\njava.net.BindException: Port in use: cmhlddlkeedg02.expdev.local:50090\r\n        at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:891)\r\n        at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:827)\r\n        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:276)\r\n        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:192)\r\n        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:671)\r\nCaused by: java.net.BindException: Address already in use\r\n        at sun.nio.ch.Net.bind0(Native Method)\r\n        at sun.nio.ch.Net.bind(Net.java:444)\r\n        at sun.nio.ch.Net.bind(Net.java:436)\r\n        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)\r\n        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)\r\n        at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)\r\n        at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:886)\r\n        ... 4 more\r\n<\/pre>\n<p>We then see the process that has it open is the secondary namenode, as expected&#8230;<\/p>\n<pre>\r\n[hdfs@cmhlddlkeedg02 ~]$ netstat -anp | grep 50090\r\n(Not all processes could be identified, non-owned process info\r\n will not be shown, you would have to be root to see it all.)\r\ntcp        0      0 172.27.2.196:50090          0.0.0.0:*                   LISTEN      22115\/java\r\ntcp        0      0 172.27.2.196:24296          172.27.2.196:50090          TIME_WAIT   -\r\n[hdfs@cmhlddlkeedg02 ~]$ ps -ef | grep 22115\r\nhdfs     20009 19291  0 13:34 pts\/1    00:00:00 grep 22115\r\nhdfs     22115     1  0 Jul22 ?        00:22:48 \/usr\/jdk64\/jdk1.7.0_67\/bin\/java -Dproc_secondarynamenode <snip> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode\r\n<\/pre>\n<p>We then identify and check the logs for secondary name node, and see the following exception&#8230;<\/p>\n<pre>\r\n[hdfs@cmhlddlkeedg02 ~]$ ls -lrt \/proc\/22115\/fd | grep log$\r\nl-wx------. 1 hdfs hadoop 64 Aug 11 15:54 339 -> \/var\/log\/hadoop\/hdfs\/hadoop-hdfs-secondarynamenode-cmhlddlkeedg02.expdev.local.log\r\n\r\n2015-08-12 13:32:38,871 ERROR namenode.SecondaryNameNode (SecondaryNameNode.java:doWork(399)) - Exception in doCheckpoint\r\njava.io.IOException: Unable to download to any storage directory\r\n        at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.receiveFile(TransferFsImage.java:505)\r\n        at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.doGetUrl(TransferFsImage.java:431)\r\n        at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:395)\r\n<\/pre>\n<p>We  next turn to our trusty friend strace() to see what the secondary namenode is doing.  We see it opens a socket to the primary namenode, and attempts via a REST call to get an edits file (edits_0000000000007192672-0000000000007202362 in the output below)&#8230;<\/p>\n<pre>\r\n22780 13:37:39.444699 connect(355, {sa_family=AF_INET, sin_port=htons(50070), sin_addr=inet_addr(\"172.27.2.195\")}, 16) = -1 EINPROGRESS (Operation now in progress)\r\n22780 13:37:39.445739 sendto(355, \"GET \/imagetransfer?getedit=1&startTxId=7192672&endTxId=7202362&storageInfo=-60:554790117:0:CID-18f28bd7-c752-4cde-9888-14b4c5c2a70d HTTP\/1.1\\r\\nUser-Agent: Java\/1.7.0_67\\r\\nHost: cmhlddlkeedg01.expdev.local:50070\\r\\nAccept: text\/html, image\/gif, image\/jpeg, *; q=.2, *\/*; q=.2\\r\\nConnection: keep-alive\\r\\n\\r\\n\", 298, 0, NULL, 0) = 298\r\n22780 13:37:39.447019 recvfrom(355, \"HTTP\/1.1 200 OK\\r\\nCache-Control: no-cache\\r\\nExpires: Wed, 12 Aug 2015 17:37:39 GMT\\r\\nDate: Wed, 12 Aug 2015 17:37:39 GMT\\r\\nPragma: no-cache\\r\\nExpires: Wed, 12 Aug 2015 17:37:39 GMT\\r\\nDate: Wed, 12 Aug 2015 17:37:39 GMT\\r\\nPragma: no-cache\\r\\nContent-Type: text\/plain; charset=utf-8\\r\\nContent-Length: 1643554\\r\\nContent-Disposition: attachment; filename=edits_0000000000007192672-0000000000007202362\\r\\nX-Image-Edits-Name: edits_0000000000007192672-0000000000007202362\\r\\nServer: Jetty(6.1.26.hwx)\\r\\n\\r\\n\\377\\377\\377\\304\\0\\0\\0\\0\\30\\0\\0\\0\\f\\0\\0\\0\\0\\0m\\300`\\301\\3.D\\37\\0\\0\\0\"..., 8192, 0, NULL, NULL) = 8192\r\n22780 13:37:39.448322 write(2, \"java.io.IOException: Unable to download to any storage directory\", 64) = 64\r\n<\/pre>\n<p>We check for the existence of this file on the primary namenode, and find it is there.  We also get its checksum&#8230;<\/p>\n<pre>\r\n[root@cmhlddlkeedg01 hadoop]# find . | grep edits_0000000000007192672-0000000000007202362 | xargs cksum\r\n4274517808 1643554 .\/hdfs\/namenode\/current\/edits_0000000000007192672-0000000000007202362\r\n[root@cmhlddlkeedg01 hadoop]#\r\n<\/pre>\n<p>We next try to get the edits file using wget (non-secure cluster).  We can successfully get the file, so we also check its checksum and see it is the same as it is on the primary namenode.<\/p>\n<pre>\r\n[hdfs@cmhlddlkeedg02 ~]$ wget \"http:\/\/cmhlddlkeedg01:50070\/imagetransfer?getedit=1&startTxId=7192672&endTxId=7202362&storageInfo=-60:554790117:0:CID-18f28bd7-c752-4cde-9888-14b4c5c2a70d\"\r\n--2015-08-12 13:44:08--  http:\/\/cmhlddlkeedg01:50070\/imagetransfer?getedit=1&startTxId=7192672&endTxId=7202362&storageInfo=-60:554790117:0:CID-18f28bd7-c752-4cde-9888-14b4c5c2a70d\r\nResolving cmhlddlkeedg01... 172.27.2.195\r\nConnecting to cmhlddlkeedg01|172.27.2.195|:50070... connected.\r\nHTTP request sent, awaiting response... 200 OK\r\nLength: 1643554 (1.6M) [text\/plain]\r\nSaving to: \u201cimagetransfer?getedit=1&startTxId=7192672&endTxId=7202362&storageInfo=-60:554790117:0:CID-18f28bd7-c752-4cde-9888-14b4c5c2a70d\u201d\r\n\r\n100%[=======================================================================================================================================>] 1,643,554   --.-K\/s   in 0.02s\r\n\r\n2015-08-12 13:44:08 (101 MB\/s) - \u201cimagetransfer?getedit=1&startTxId=7192672&endTxId=7202362&storageInfo=-60:554790117:0:CID-18f28bd7-c752-4cde-9888-14b4c5c2a70d\u201d saved [1643554\/1643554]\r\n\r\n[hdfs@cmhlddlkeedg02 ~]$ cksum imagetransfer\\?getedit\\=1\\&startTxId\\=7192672\\&endTxId\\=7202362\\&storageInfo\\=-60\\:554790117\\:0\\:CID-18f28bd7-c752-4cde-9888-14b4c5c2a70d\r\n4274517808 1643554 imagetransfer?getedit=1&startTxId=7192672&endTxId=7202362&storageInfo=-60:554790117:0:CID-18f28bd7-c752-4cde-9888-14b4c5c2a70d\r\n<\/pre>\n<p>Other steps we undertook were to check for any user limits such as open files, but found no issues.<\/p>\n<p>We finally opted to restart the secondary namenode, which resolved the issue.  What we should have done but did not think of until later is to take a thread dump of the secondary namenode.  My guess is we may have seen a BLOCKED thread somewhere, which would have warranted further investigation.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We had an issue in which the secondary namenode was not checkpointing the edits file with updates from the primary namenode. In case you are unaware, the secondary namenode (not in an HA configuration) regularly transfers the edits file from&hellip;<\/p>\n<p class=\"more-link-p\"><a class=\"more-link\" href=\"http:\/\/appcrawler.com\/wordpress\/2015\/08\/25\/unable-to-download-to-any-storage-directory\/\">Read more &rarr;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_mi_skip_tracking":false,"footnotes":""},"categories":[21],"tags":[],"_links":{"self":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/5051"}],"collection":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/comments?post=5051"}],"version-history":[{"count":8,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/5051\/revisions"}],"predecessor-version":[{"id":5065,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/5051\/revisions\/5065"}],"wp:attachment":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/media?parent=5051"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/categories?post=5051"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/tags?post=5051"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}