Below are the details of the application and environment where the problem occurs:
- Java web application deployed on Tomcat 9.0.35 with JRE version 1.8.0_231-b11
- The application runs in a Docker container deployed on the OpenShift Kubernetes Distribution platform.
I see that many threads in the application sometimes get into a BLOCKED state for a few minutes. Thread dump analysis showed that the java.net.InetAddress.getLocalHost call is taking too much time, and many threads are stuck there. The host name is fetched for every log statement the application prints.
The issue is intermittent, but when it occurs, the application/Tomcat goes into a paused state and many threads accumulate. After some time (a few seconds), all the blocked threads are unblocked simultaneously. Because of the resulting burst of concurrent requests, the application runs out of the DB connections it keeps in its pool, which leads to slowness and service-availability problems. As a fix, I made sure to fetch the host name only once into a static variable and reuse it throughout the logging process. I would like to know the detailed root cause of this issue:
- Why is this issue occurring intermittently?
- Is there a problem with DNS lookup in this Kubernetes environment?
- We are using the IPv4 protocol/addresses.
- Are there any better approaches/fixes to handle this issue?
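The static-variable fix described in the question can be sketched as follows. This is a minimal illustration, not the application's actual code: the class name HostNameHolder and the "unknown-host" fallback are hypothetical. The host name is resolved once, during class initialization, and every log call reuses the same string.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical holder that resolves the local host name once and caches it. */
public final class HostNameHolder {

    // Computed once, when the class is initialized, so the potentially slow
    // InetAddress.getLocalHost() call runs at most one time per class loader.
    private static final String HOST_NAME = resolveHostName();

    private HostNameHolder() {
    }

    private static String resolveHostName() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            // Hypothetical fallback so logging never fails on a lookup error.
            return "unknown-host";
        }
    }

    /** Returns the cached host name; never triggers a lookup itself. */
    public static String get() {
        return HOST_NAME;
    }

    public static void main(String[] args) {
        System.out.println(get());
        // true: get() always returns the same cached String instance
        System.out.println(get() == get());
    }
}
```

Initializing a `static final` field this way relies on the JVM's class-initialization guarantees, so no explicit locking is needed after the first access.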
Sample from the thread dump:
"https-jsse-nio-8443-exec-13" #95 daemon prio=5 os_prio=0 tid=0x00007fccadbba800 nid=0xaf5 waiting for monitor entry [0x00007fcb912d1000]
    java.lang.Thread.State: BLOCKED (on object monitor)
    at java.net.InetAddress.getLocalHost(InetAddress.java:1486)
    - waiting to lock <0x00000005e71878a0> (a java.lang.Object)
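One way to check whether the stalls line up with slow local host lookups is to time InetAddress.getLocalHost() directly inside the same container. The sketch below is a hypothetical diagnostic: the class name LocalHostLatencyProbe, the 1-second reporting threshold, and the 6-second pause between calls are assumptions chosen for illustration. The pause is meant to outlast JDK 8's short-lived cache of the local host address, so that most calls perform a fresh lookup.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical probe that reports slow InetAddress.getLocalHost() calls. */
public final class LocalHostLatencyProbe {

    // Assumed threshold: calls at or above this duration are reported.
    private static final long SLOW_MILLIS = 1000;

    /** Times a single getLocalHost() call and returns its duration in ms. */
    public static long timeOneCallMillis() {
        long start = System.nanoTime();
        try {
            InetAddress.getLocalHost();
        } catch (UnknownHostException e) {
            System.err.println("lookup failed: " + e.getMessage());
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        int iterations = args.length > 0 ? Integer.parseInt(args[0]) : 60;
        for (int i = 0; i < iterations; i++) {
            long elapsed = timeOneCallMillis();
            if (elapsed >= SLOW_MILLIS) {
                System.out.println("call " + i + " took " + elapsed + " ms");
            }
            // Wait longer than the assumed 5-second local host cache so the
            // next call is likely to go to the name service again.
            Thread.sleep(6000);
        }
    }
}
```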
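In JDK 8, InetAddress.getLocalHost() works as follows:
1. Get the local host name (the same name the hostname command prints). If it is "localhost", return the loopback address.
2. If the local host address was cached less than 5 seconds ago, return the cached value.
3. Otherwise, resolve the host name to an address by querying the name service (InetAddress.getAddressesFromNameService), which may involve a DNS request.
4. Store the first resolved address in the cache together with the current time.

Steps 2-4 are performed under the global cacheLock. If something goes wrong during this process, all threads calling InetAddress.getLocalHost() will block on this lock, which is exactly what you observe.

Usually local host name resolution does not end up in a network call, as long as the host address is hard-coded in /etc/hosts. But in your case it seems that real network requests are involved whenever the 5-second cache entry expires. When the first DNS request times out (UDP is not a reliable protocol, after all), the resolver retries only after its timeout (5 seconds by default, see the timeout option in resolv.conf), and meanwhile every other caller waits on cacheLock. This also explains the intermittency: a network lookup happens at most once every 5 seconds, and only a lookup whose request gets lost causes the delay.

The solution is to configure /etc/hosts to contain the name and the address of the local host, i.e. an entry mapping the local IP address to myhost.mydomain, where myhost.mydomain is the same string as returned by the hostname command.

Finally, if the host name is not expected to change while the application is running, caching it once and for all at the application level looks like a good fix.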
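The locking behaviour described above can be modelled in a few lines. This is a simplified sketch, not the JDK source: the class LocalHostCacheModel and its resolve() method are hypothetical, resolve() simulates a slow DNS lookup with Thread.sleep, and the returned address 10.0.0.1 is made up. It shows why, when one lookup stalls, every other caller waits on the same monitor and is then released almost at once.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Simplified, hypothetical model of a 5-second cache guarded by one lock. */
public final class LocalHostCacheModel {

    private static final long MAX_CACHE_MILLIS = 5000; // mirrors the 5 s cache
    private final Object cacheLock = new Object();      // single global lock
    private final long lookupDelayMillis;                // simulated DNS delay
    private String cachedAddress;
    private long cacheTime;

    public LocalHostCacheModel(long lookupDelayMillis) {
        this.lookupDelayMillis = lookupDelayMillis;
    }

    public String getLocalHost() throws InterruptedException {
        synchronized (cacheLock) {
            long now = System.currentTimeMillis();
            if (cachedAddress != null && now - cacheTime < MAX_CACHE_MILLIS) {
                return cachedAddress; // fast path: cache entry still fresh
            }
            // Slow path: all other callers stay BLOCKED on cacheLock
            // until this simulated lookup returns.
            cachedAddress = resolve();
            cacheTime = now;
            return cachedAddress;
        }
    }

    private String resolve() throws InterruptedException {
        Thread.sleep(lookupDelayMillis); // stands in for a DNS timeout + retry
        return "10.0.0.1";               // hypothetical address
    }

    public static void main(String[] args) throws Exception {
        LocalHostCacheModel model = new LocalHostCacheModel(2000);
        int threads = 8;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        long t0 = System.currentTimeMillis();
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                start.await();        // release all threads together
                model.getLocalHost();
                return null;
            });
        }
        start.countDown();
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("elapsed ms: " + (System.currentTimeMillis() - t0));
    }
}
```

With eight threads and a 2-second simulated lookup, the elapsed time should be roughly 2 seconds rather than 16: only the first thread performs the lookup, and the others return the freshly cached value as soon as the lock is released.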