HDFS DataNode: large number of TCP connections in CLOSE_WAIT state


I'm using Apache Druid with a containerized deployment of HDFS in my testbed. After running stably for 5 days, one of the HDFS workers is reported as dead on the HDFS UI. Inside the container of this 'dead' worker, the process is still alive, but there are thousands of TCP connections in the CLOSE_WAIT state. I see that quite a few issues have been filed on the HDFS JIRA page against different versions of HDFS.
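For anyone trying to reproduce the observation, here is a minimal sketch of how the CLOSE_WAIT sockets could be counted per local port from inside the container. It assumes a Linux container where /proc/net/tcp is readable; the function name `count_close_wait` is only illustrative and is not part of HDFS or Druid.

```python
from collections import Counter

# Kernel state code for CLOSE_WAIT in the "st" column of /proc/net/tcp.
CLOSE_WAIT = "08"


def count_close_wait(proc_net_tcp_text):
    """Return a Counter mapping local port -> number of CLOSE_WAIT sockets."""
    counts = Counter()
    lines = proc_net_tcp_text.splitlines()
    for line in lines[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        if state == CLOSE_WAIT:
            # local_address looks like "0100007F:C35A" (hex IP : hex port)
            port = int(local_address.rsplit(":", 1)[1], 16)
            counts[port] += 1
    return counts


if __name__ == "__main__":
    total = Counter()
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as f:
                total += count_close_wait(f.read())
        except OSError:
            # File missing or unreadable (e.g. not Linux, or IPv6 disabled).
            pass
    for port, n in total.most_common():
        print(port, n)
```

Grouping by local port is meant to show whether the stuck sockets are concentrated on a single listening port of the worker or spread across many.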

HDFS version: 2.7.5.

Container ulimit: maximum of 1048576 open files.

Druid is the only component interfacing with HDFS. No custom code has been written that could be failing to call close().

Has anyone seen a similar issue and worked around it?
