net.ipv4.ip_no_pmtu_disc and don’t fragment

We received an email asking us to restart a server because the ATG lock manager was holding a lock that a scheduled job needed, so the job could not move forward. I don’t like to do that without some triage.

Triage showed we were stuck waiting on a response from our order management vendor, blocked in a recvfrom() system call.

We find the thread doing the work by taking Java thread dumps and see that its native thread ID (nid) is the same across dumps…

-bash-4.1$ jstack 32384 | awk '{if ($0 ~ "nid=") {thread=$5} else if ($0 ~ "vendorname") {print thread}}'
nid=0x8f1
-bash-4.1$ jstack 32384 | awk '{if ($0 ~ "nid=") {thread=$5} else if ($0 ~ "vendorname") {print thread}}'
nid=0x8f1
-bash-4.1$ jstack 32384 | awk '{if ($0 ~ "nid=") {thread=$5} else if ($0 ~ "vendorname") {print thread}}'
nid=0x8f1
-bash-4.1$
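The one-liner above depends on the nid being the fifth whitespace-separated field, which only holds when the thread name contains no spaces and the header has that exact layout. A minimal sketch that looks for the `nid=` token wherever it appears on the header line is shown here; `threads_matching` is a hypothetical helper name, and unlike the original it reports each matching thread only once.

```shell
# Hypothetical helper: print the nid= token of every thread whose stack
# mentions PATTERN. Reads a jstack thread dump on stdin.
threads_matching() {
    awk -v pat="$1" '
        /nid=0x/ {
            # Thread header line: remember its nid= token wherever it sits.
            for (i = 1; i <= NF; i++) if ($i ~ /^nid=0x/) thread = $i
            printed = 0
            next
        }
        index($0, pat) && thread != "" && !printed {
            print thread
            printed = 1      # report each thread at most once
        }
    '
}

# Usage (PID and pattern taken from the session above):
#   jstack 32384 | threads_matching vendorname
```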

…so we convert the hexadecimal nid to the decimal thread ID, which strace accepts as a PID…

-bash-4.1$ printf "%i\n" "0x8f1"
2289

…then trace the PID and see we are waiting in perpetuity on a recvfrom() call on a network socket…

-bash-4.1$ strace -p 2289
Process 2289 attached - interrupt to quit
recvfrom(1040, ^C 
Process 2289 detached

…so we get the file descriptor on which it is waiting…

-bash-4.1$ ls -lrt /proc/2289/fd | grep 1040
lrwx------. 1 sa-jboss domain users 64 Jun 25 11:32 1040 -> socket:[849616076]
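Grepping the listing for 1040 would also match descriptors such as 10400, so reading the one symlink directly is a little more precise. The following is only a sketch; `socket_inode` is a hypothetical helper name, and its optional third argument exists purely so it can be pointed at a directory other than /proc.

```shell
# Hypothetical helper: print the socket inode behind a file descriptor,
# or fail if the descriptor is not a socket.
# $1 = PID or thread ID, $2 = fd number, $3 = proc root (defaults to /proc).
socket_inode() {
    target=$(readlink "${3:-/proc}/$1/fd/$2") || return 1
    case $target in
        'socket:['*']')
            inode=${target#'socket:['}    # strip the leading "socket:["
            printf '%s\n' "${inode%']'}"  # strip the trailing "]"
            ;;
        *)
            return 1                      # a file, pipe, or other non-socket
            ;;
    esac
}

# Usage with the values from the session above:
#   socket_inode 2289 1040
```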

…and we then see it is a socket, so we need to see where it is connecting…

-bash-4.1$ netstat -aenp | grep 849616076
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp        0      0 1.2.2.6:57779           10.0.8.2:5555          ESTABLISHED 11006      849616076  32384/java

…and we see the IP is our order management vendor.

We immediately assumed the sender was hung. However, when we dug deeper with tcpdump and Wireshark, we saw that path MTU discovery was enabled on the Linux host.

tcpdump capture…

[~] # tcpdump host 54.208.67.100 -tttt -s 65535 -w hung.pcap
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
^C1458 packets captured
1470 packets received by filter
0 packets dropped by kernel

Wireshark screenshot of the above pcap file (image not included)…

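With net.ipv4.ip_no_pmtu_disc at its default value of 0, path MTU discovery is enabled: the kernel sets the don’t fragment (DF) flag on outgoing packets and relies on ICMP “fragmentation needed” replies to learn the largest MTU the path supports. We were not getting those replies back, so our conjecture is that this was the issue.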
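One way to check a conjecture like this is to send echo requests with DF set and watch whether large ones disappear without any ICMP error coming back. The sketch below assumes IPv4 with a 20-byte IP header (no options) and an 8-byte ICMP header, and the `-M do` option of the Linux iputils ping; `icmp_payload_for_mtu`, `probe_path_mtu`, and the host name in the usage comment are hypothetical.

```shell
# Hypothetical helper: largest ICMP echo payload that fits in one IPv4
# packet of the given MTU (20-byte IP header without options + 8-byte
# ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

# Hypothetical helper: send three echo requests with DF set ("-M do" in
# iputils ping), sized to fill a packet of the given MTU.
# $1 = remote host, $2 = MTU to test (defaults to 1500).
probe_path_mtu() {
    ping -c 3 -M do -s "$(icmp_payload_for_mtu "${2:-1500}")" "$1"
}

# Usage (placeholder host; run the capture in a second terminal to see
# whether any "fragmentation needed" ICMP, type 3 code 4, comes back):
#   probe_path_mtu vendor.example.com 1500
#   tcpdump -ni eth1 'icmp[icmptype] == icmp-unreach and icmp[icmpcode] == 4'
```

If full-size probes time out silently while smaller ones succeed, and the capture shows no type 3 code 4 messages, that would be consistent with the ICMP replies being dropped somewhere on the path.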

As soon as we set net.ipv4.ip_no_pmtu_disc = 1, disabling path MTU discovery, our problems went away.
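For reference, a sketch of how the setting can be inspected and applied. This is an assumption about the mechanics, not a record of the exact commands used on this host; `pmtu_disc_state` is a hypothetical helper, while `sysctl -w` and /etc/sysctl.conf are the standard Linux mechanisms.

```shell
# Hypothetical helper: report how the kernel is currently configured.
# $1 = path to the setting (defaults to the live /proc entry).
pmtu_disc_state() {
    value=$(cat "${1:-/proc/sys/net/ipv4/ip_no_pmtu_disc}") || return 1
    if [ "$value" = "0" ]; then
        echo "path MTU discovery enabled (ip_no_pmtu_disc=0)"
    else
        echo "path MTU discovery disabled or restricted (ip_no_pmtu_disc=$value)"
    fi
}

# Applying the change (as root); shown as comments so nothing is modified here:
#   sysctl -w net.ipv4.ip_no_pmtu_disc=1                     # takes effect immediately
#   echo 'net.ipv4.ip_no_pmtu_disc = 1' >> /etc/sysctl.conf  # survives a reboot
```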
