ParquetFileReader leading to too many TCP connections in CLOSE_WAIT state


I am trying to read the metadata from a Parquet file:

metaData = ParquetFileReader.readFooter(fs.getConf(), file);

Each time this line runs, it leaves a TCP connection in the CLOSE_WAIT state (checked with lsof -p <pid>):

TCP rack162-hdp26-dev:36608->rack162-hdp26-dev:1019 (CLOSE_WAIT)
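To watch the descriptor count grow from inside the application instead of running lsof by hand, something like the following might help. It is only a minimal sketch: FdProbe and openFileDescriptors are hypothetical names, it relies on the JDK's com.sun.management.UnixOperatingSystemMXBean (available on Unix-like HotSpot/OpenJDK JVMs), and it only counts open descriptors. It does not show their TCP state, so lsof is still needed to confirm CLOSE_WAIT.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper, not part of Parquet or Hadoop.
final class FdProbe {
    private FdProbe() {}

    /**
     * Returns the number of file descriptors (files and sockets) currently
     * open in this JVM process, or -1 if the platform does not expose it.
     */
    static long openFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os)
                    .getOpenFileDescriptorCount();
        }
        return -1L; // assumption: non-Unix JVMs are simply reported as unknown
    }
}
```

Logging FdProbe.openFileDescriptors() before and after each footer read should show whether the count rises by one per file, which would match the lsof output above.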

After more than 65,536 files, it fails with a "too many open files" error, and I have to restart my application. I tried replacing it with:

try (ParquetFileReader r = ParquetFileReader.open(fs.getConf(), file)) {
     logger.info("Getting metadata for: " + file.toString());
     metaData = r.getFooter();

     // other code
}

but the issue persists. I have already tried parquet-hadoop versions 1.8.1, 1.10.1, and 1.11.1, and the problem occurs with each of them.
