I'm attempting to run the GATK (v4.4.0.0) pipeline PathSeqPipelineSpark for metagenomic analysis, but I'm encountering an issue when running it in an AWS ParallelCluster environment. The specific error I'm receiving is:
12:56:43.768 INFO PathSeqPipelineSpark - Spark verbosity set to INFO (see --spark-verbosity argument)
12:56:43.808 INFO AbstractConnector - Stopped Spark@75ed636c{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
12:56:43.810 INFO SparkUI - Stopped Spark web UI at http://compute-128-dy-r6a4xlarge-1.hpc-ireland.pcluster:4040
12:56:43.819 INFO MapOutputTrackerMasterEndpoint - MapOutputTrackerMasterEndpoint stopped!
12:56:43.837 INFO MemoryStore - MemoryStore cleared
12:56:43.837 INFO BlockManager - BlockManager stopped
12:56:43.840 INFO BlockManagerMaster - BlockManagerMaster stopped
12:56:43.842 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint - OutputCommitCoordinator stopped!
12:56:43.853 INFO SparkContext - Successfully stopped SparkContext
12:56:43.853 INFO PathSeqPipelineSpark - Shutting down engine
[October 27, 2023 at 12:56:43 PM UTC] org.broadinstitute.hellbender.tools.spark.pathseq.PathSeqPipelineSpark done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=103079215104
***********************************************************************
A USER ERROR has occurred: Failed to read bam header from output/BAM/Pt0-GSM3454529-unaligned.bam
Caused by: failure to login: javax.security.auth.login.LoginException: **java.lang.NullPointerException: invalid null input: name**
at jdk.security.auth/com.sun.security.auth.UnixPrincipal.<init>(UnixPrincipal.java:67)
at jdk.security.auth/com.sun.security.auth.module.UnixLoginModule.login(UnixLoginModule.java:134)
at java.base/javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
at java.base/javax.security.auth.login.LoginContext$4.run(LoginContext.java:679)
at java.base/javax.security.auth.login.LoginContext$4.run(LoginContext.java:677)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
at java.base/javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:677)
at java.base/javax.security.auth.login.LoginContext.login(LoginContext.java:587)
at org.apache.hadoop.security.UserGroupInformation$HadoopLoginContext.login(UserGroupInformation.java:2065)
at org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1975)
at org.apache.hadoop.security.UserGroupInformation.createLoginUser(UserGroupInformation.java:719)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:669)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:579)
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:3746)
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:3736)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3520)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:288)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at org.disq_bio.disq.impl.file.HadoopFileSystemWrapper.isDirectory(HadoopFileSystemWrapper.java:101)
at org.disq_bio.disq.HtsjdkReadsRddStorage.read(HtsjdkReadsRddStorage.java:148)
at org.disq_bio.disq.HtsjdkReadsRddStorage.read(HtsjdkReadsRddStorage.java:127)
at org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSource.getHeader(ReadsSparkSource.java:188)
at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.initializeReads(GATKSparkTool.java:575)
at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.initializeToolInputs(GATKSparkTool.java:554)
at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:544)
at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:31)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
12:56:43.856 INFO ShutdownHookManager - Shutdown hook called
12:56:43.857 INFO ShutdownHookManager - Deleting directory /mnt/efs/clusterfcs/apps/CSI-microbies/CSI-Microbes-identification/test-10x-v2/lscratch/4219/tmp/spark-7eff556e-f790-4a8e-868f-5e0161da833c
Using GATK jar /mnt/efs/clusterfcs/apps/GATK/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar
It seems that Hadoop (invoked by Spark) is attempting to look up the current user's name, and that lookup is returning null: the trace fails inside `UnixLoginModule`, where `UnixPrincipal` rejects the null name returned by the OS-level lookup.
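As far as I can tell from the stack trace, the null originates in the OS-level username lookup rather than in Hadoop configuration. A minimal check I can run on a compute node (a sketch, e.g. wrapped in srun or sbatch; it assumes a standard glibc userland where getent is available):

```shell
# If the job's UID has no entry in the node's passwd database (NSS),
# UnixLoginModule receives a null name and fails exactly as in the trace.
id -u                      # numeric UID the job runs under
id -un                     # errors if the UID has no name mapping
getent passwd "$(id -u)"   # empty output = no passwd entry on this node
```

If `getent` prints nothing on the compute node but works on the head node, the compute nodes are not resolving the cluster's users (e.g. a shared-user/NSS configuration issue), which would explain the exception.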
Spark is used in the pipeline for parallel processing, as running it serially is not feasible. Since Spark is self-contained within GATK, further configuration theoretically shouldn't be necessary.
Currently, I'm working on AWS in a ParallelCluster environment with several Slurm partitions backed by different EC2 instance types, and I launch the jobs using sbatch.
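For context, the submission looks roughly like this (a sketch, not my exact script: the partition name, resources, and the elided PathSeq reference arguments are placeholders):

```shell
#!/bin/bash
#SBATCH --partition=compute   # placeholder partition name
#SBATCH --cpus-per-task=16    # placeholder resources
#SBATCH --mem=100G

# Tool-specific reference arguments elided; the GATK jar in use is the
# gatk-package-4.4.0.0-local.jar reported at the end of the log above.
gatk PathSeqPipelineSpark \
    ... \
    -- --spark-runner LOCAL --spark-master 'local[*]'
```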
In an attempt to address the issue, I've tried passing the following as environment variables:
export HADOOP_USER_NAME=user
export SPARK_USER=user
This was an attempt to provide "dummy" usernames and hopefully avoid the null error, but it hasn't helped.
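To be sure the overrides at least reach the job (sbatch normally exports the submission environment, but that behavior can be disabled, e.g. with --export=NONE), I can verify them from inside the job. Note, though, that as far as I can tell from the trace, the failure happens inside `UnixLoginModule`, which asks the OS for the username directly, so these variables may never be consulted on that code path:

```shell
export HADOOP_USER_NAME=user
export SPARK_USER=user
# Inside the sbatch job, confirm the variables survived submission
printenv HADOOP_USER_NAME SPARK_USER
```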
Is there anything else I can try to resolve this issue? Thank you!