IPC Server handler: "File does not exist" when running Hadoop in a Singularity container


I don't have any experience with Hadoop and am running into issues while attempting to run a Singularity container that uses it. It seems like Hadoop isn't really getting started, and I'm trying to figure out why. After sifting through all the output, this seems to be the first indication of the problem:

2024-03-10 17:25:12 INFO  Server:3102 - IPC Server handler 7 on default port 51139, call Call#41 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.complete from localhost:60086 / 127.0.0.1:60086: java.io.FileNotFoundException: File does not exist: /user/ark19/cloudgene-cli/job-20240310-170017/temp/outputimputation/18/temp/_temporary/1/_temporary/attempt_1710089589002_0004_m_000000_0/part-m-00000 (inode 16888) [Lease.  Holder: DFSClient_attempt_1710089589002_0004_m_000000_0_-1772981706_1, pending creates: 1]
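To pull every missing path of this kind out of a long log, standard grep and sed are enough. This is a minimal sketch, assuming the log has been saved to a file and that every occurrence uses the same "File does not exist: <path> (inode ..." wording as the line above; the helper name missing_hdfs_paths is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each path reported as "File does not exist" in a Hadoop log.
# Reads the file named in $1, or standard input when no argument is given.
# Assumes the path is followed by a space, as in "... File does not exist: <path> (inode ...".
missing_hdfs_paths() {
  grep -o 'File does not exist: [^ ]*' ${1:+"$1"} \
    | sed 's/^File does not exist: //'
}
```

For example, missing_hdfs_paths hadoop.log | sort -u (hadoop.log is a hypothetical file name) prints each distinct missing path once, which makes it easier to see whether only one task attempt is affected.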

Since I have very little understanding of how Hadoop works, I'm not even sure where to start troubleshooting. The first thing I notice is that I don't even see a /user directory:

Singularity> ls /
anaconda-post.log  bin  data  dev  environment  etc  home  hpc  lib  lib64  localdata  media  mnt  opt  proc  root  run  sbin  singularity  srv  sys  tmp  usr  var
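The path in the error arrives through org.apache.hadoop.hdfs.protocol.ClientProtocol, so it names a location in the HDFS namespace rather than in the container's local filesystem, and it would not be expected to show up under ls /. A minimal sketch for listing it with the HDFS client instead, assuming the hdfs command is on PATH inside the container and is configured for the NameNode the container starts, and that the job's directories have not already been cleaned up; the helper name list_hdfs_tree is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recursively list a directory in HDFS (not the local filesystem).
# Assumes the hdfs client on PATH talks to the NameNode started inside the container.
list_hdfs_tree() {
  local dir="${1:-/user}"   # default to the root of the HDFS home directories
  hdfs dfs -ls -R "$dir"
}
```

For example, list_hdfs_tree /user/ark19/cloudgene-cli would show what, if anything, is left of the job directory named in the log.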

I am working on my institution's computer cluster, which runs AlmaLinux 9.3 and Slurm, and I run everything from inside the container, which I start like this:

srun -c 32 --x11 --mem=128G --pty bash -i
singularity shell --hostname localhost --bind ${workingDirectory}:/data ${containerDirectory}/imputation-protocol-latest.sif

Perhaps I'm missing something obvious, or need to further troubleshoot my configuration. I thought I was potentially on to something when I discovered this comment in hadoop/config/hdfs-site.xml:

Immediately exit safemode as soon as one DataNode checks in. On a multi-node cluster, these configurations must be removed.

But the lines following that comment didn't change anything.
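Whether the NameNode has actually left safe mode, and whether a DataNode has checked in at all, can be asked of the NameNode directly: the standard hdfs dfsadmin subcommands -safemode get and -report print the current safe-mode status and the DataNode report, respectively. This is a minimal sketch, assuming the hdfs client inside the container points at the single-node cluster the container starts; the helper name check_namenode_state is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the NameNode's safe-mode status and its DataNode report.
# Assumes the hdfs client on PATH talks to the NameNode started inside the container.
check_namenode_state() {
  # Reports whether the NameNode is currently in safe mode.
  hdfs dfsadmin -safemode get || return 1
  # Reports filesystem statistics and the live/dead DataNodes, with capacity and usage.
  hdfs dfsadmin -report
}
```

Running it while the job is in progress, for example check_namenode_state 2>&1 | tee namenode-state.txt (a hypothetical output file), captures both reports alongside the rest of the output.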

Happy to provide further information on my set up but don't yet know what is most relevant. Thanks!

