{"id":3930,"date":"2014-06-27T08:23:29","date_gmt":"2014-06-27T13:23:29","guid":{"rendered":"http:\/\/appcrawler.com\/wordpress\/?p=3930"},"modified":"2014-06-27T08:23:29","modified_gmt":"2014-06-27T13:23:29","slug":"net-ipv4-ip_no_pmtu_disc-and-dont-fragment","status":"publish","type":"post","link":"http:\/\/appcrawler.com\/wordpress\/2014\/06\/27\/net-ipv4-ip_no_pmtu_disc-and-dont-fragment\/","title":{"rendered":"net.ipv4.ip_no_pmtu_disc and don&#8217;t fragment"},"content":{"rendered":"<p>We received an email asking that we restart a given server as the ATG lock manager had a required lock and a scheduled job could not move forward.  I don&#8217;t like to do that without some triage.<\/p>\n<p>We were stuck waiting on a call from our order management vendor (recvfrom() system call).<\/p>\n<p>We find the native thread ID (nid) doing the work by taking Java thread dumps, and see it is the same across dumps&#8230;<\/p>\n<pre>\r\n-bash-4.1$ jstack 32384 | awk '{if ($0 ~ \"nid=\") {thread=$5} else if ($0 ~ \"vendorname\") {print thread}}'\r\nnid=0x8f1\r\n-bash-4.1$ jstack 32384 | awk '{if ($0 ~ \"nid=\") {thread=$5} else if ($0 ~ \"vendorname\") {print thread}}'\r\nnid=0x8f1\r\n-bash-4.1$ jstack 32384 | awk '{if ($0 ~ \"nid=\") {thread=$5} else if ($0 ~ \"vendorname\") {print thread}}'\r\nnid=0x8f1\r\n-bash-4.1$\r\n<\/pre>\n<p>\u2026so we convert the hex nid to the decimal ID of this thread\u2026<\/p>\n<pre>\r\n-bash-4.1$ printf \"%i\\n\" \"0x8f1\"\r\n2289\r\n<\/pre>\n<p>\u2026then trace that thread ID and see we are waiting in perpetuity on a recvfrom() call on a network socket\u2026<\/p>\n<pre>\r\n-bash-4.1$ strace -p 2289\r\nProcess 2289 attached - interrupt to quit\r\nrecvfrom(1040, ^C &lt;unfinished ...&gt;\r\nProcess 2289 detached\r\n<\/pre>\n<p>\u2026so we get the file descriptor on which it is waiting\u2026<\/p>\n<pre>\r\n-bash-4.1$ ls -lrt \/proc\/2289\/fd | grep 1040\r\nlrwx------. 
1 sa-jboss domain users 64 Jun 25 11:32 1040 -> socket:[849616076]\r\n<\/pre>\n<p>\u2026and we then see it is a socket, so we need to see where it is connecting\u2026<\/p>\n<pre>\r\n-bash-4.1$ netstat -aenp | grep 849616076\r\n(Not all processes could be identified, non-owned process info\r\nwill not be shown, you would have to be root to see it all.)\r\ntcp        0      0 1.2.2.6:57779           10.0.8.2:5555          ESTABLISHED 11006      849616076  32384\/java\r\n<\/pre>\n<p>\u2026and we see the IP is our order management vendor.<\/p>\n<p>We immediately assumed the sender was hung.  However, when we dug deeper using tcpdump and Wireshark, we saw we had enabled path MTU discovery in the Linux host OS.  <\/p>\n<p>tcpdump capture&#8230;<\/p>\n<pre>\r\n[~] # tcpdump host 54.208.67.100 -tttt -s 65535 -w hung.pcap\r\ntcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes\r\n^C1458 packets captured\r\n1470 packets received by filter\r\n0 packets dropped by kernel\r\n<\/pre>\n<p>Wireshark screen of the above pcap file&#8230;<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/appcrawler.com\/wordpress\/wp-content\/uploads\/2014\/06\/171.png\"><\/p>\n<p>With net.ipv4.ip_no_pmtu_disc at its default value of 0, path MTU discovery is enabled: the host sets the Don&#8217;t Fragment bit on outgoing packets and relies on ICMP replies to learn the largest MTU the path supports.  It wasn&#8217;t getting these back, so our conjecture is that this was the issue.<\/p>\n<p>As soon as we set net.ipv4.ip_no_pmtu_disc = 1, disabling path MTU discovery, our problems went away.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We received an email asking that we restart a given server as the ATG lock manager had a required lock and a scheduled job could not move forward. I don&#8217;t like to do that without some triage. 
We were stuck&hellip;<\/p>\n<p class=\"more-link-p\"><a class=\"more-link\" href=\"http:\/\/appcrawler.com\/wordpress\/2014\/06\/27\/net-ipv4-ip_no_pmtu_disc-and-dont-fragment\/\">Read more &rarr;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_mi_skip_tracking":false,"footnotes":""},"categories":[28,56],"tags":[],"_links":{"self":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/3930"}],"collection":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/comments?post=3930"}],"version-history":[{"count":14,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/3930\/revisions"}],"predecessor-version":[{"id":4466,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/3930\/revisions\/4466"}],"wp:attachment":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/media?parent=3930"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/categories?post=3930"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/tags?post=3930"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}