As per our requirement, the output of one job is used as the input of the next job.
Using MultipleOutputs, we create a new folder inside the output path and write certain records into it. The output looks like this:
OPFolder1/MultipleOP/SplRecords-m-0000*
OPFolder1/part-m-0000* files
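For reference, the split is produced along these lines. This is a minimal sketch, assuming a map-only job with Text keys and values; SplitMapper and isSpecial() are hypothetical names standing in for the real class and routing rule.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class SplitMapper extends Mapper<LongWritable, Text, Text, Text> {

    private MultipleOutputs<Text, Text> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<>(context);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        if (isSpecial(value)) {
            // A "/" in baseOutputPath creates the MultipleOP subfolder under the
            // job output dir, producing OPFolder1/MultipleOP/SplRecords-m-0000*.
            mos.write(new Text(value), new Text(""), "MultipleOP/SplRecords");
        } else {
            // Regular records go to the default OPFolder1/part-m-0000* files.
            context.write(new Text(value), new Text(""));
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close();  // flush the extra output files
    }

    private boolean isSpecial(Text value) {
        // Hypothetical placeholder for the real record-routing condition.
        return value.toString().startsWith("SPL");
    }
}
```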
When the new job uses OPFolder1 as its input, I get the error below:
org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:298)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766)
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:85)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:548)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:786)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): Path is not a file: /user/abhime01/OPFolder1/MultiplOP/
Is there any way or property to make Hadoop read only the files rather than the folders?
Set mapreduce.input.fileinputformat.input.dir.recursive to true. See FileInputFormat doesn't read files recursively in the input path dir.
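For example, you can set it on the Configuration in the driver of the second job. This is a minimal driver sketch, assuming the second job reads OPFolder1; the class name SecondJobDriver and the OPFolder2 output path are placeholders. Newer Hadoop releases also expose FileInputFormat.setInputDirRecursive(job, true), which sets the same property.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SecondJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Tell FileInputFormat to descend into subdirectories such as
        // OPFolder1/MultipleOP instead of failing with "Path is not a file".
        conf.setBoolean("mapreduce.input.fileinputformat.input.dir.recursive", true);

        Job job = Job.getInstance(conf, "second job");
        job.setJarByClass(SecondJobDriver.class);
        // Mapper/reducer and output classes omitted for brevity.
        FileInputFormat.addInputPath(job, new Path("OPFolder1"));
        FileOutputFormat.setOutputPath(job, new Path("OPFolder2"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```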