I'm running Hadoop in Colab with two scripts that I wrote in PyCharm: one contains the mapper and the other the reducer.
This is the code:
!apt-get install -y openjdk-11-jdk-headless -qq > /dev/null
!wget https://downloads.apache.org/hadoop/common/hadoop-3.3.3/hadoop-3.3.3.tar.gz
!tar -xzf hadoop-3.3.3.tar.gz
!mv hadoop-3.3.3/ /usr/local/
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-11-openjdk-amd64"
os.environ["HADOOP_HOME"] = "/usr/local/hadoop-3.3.3"
os.environ["PATH"] += os.pathsep + "/usr/local/hadoop-3.3.3/bin"
!chmod u+x ./mapperModaGastoPorPersona.py
!chmod u+x ./reducerModaGastoPorPersona.py
!hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.3.3.jar -file ./mapperModaGastoPorPersona.py -mapper ./mapperModaGastoPorPersona.py -file ./reducerModaGastoPorPersona.py -reducer ./reducerModaGastoPorPersona.py -input Datos_actividad_1.txt -output ./salidaModaGastoPersona1
The last command (the hadoop jar call) fails with this output:
2022-12-14 12:18:36,116 ERROR streaming.PipeMapRed: configuration exception
java.io.IOException: Cannot run program "/content/./reducerModaGastoPorPersona.py": error=2, No such file or directory
(...)
Caused by: java.io.IOException: error=2, No such file or directory
(...)
2022-12-14 12:18:36,119 INFO mapred.LocalJobRunner: reduce task executor complete.
2022-12-14 12:18:36,122 WARN mapred.LocalJobRunner: job_local520054471_0001
java.lang.Exception: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:559)
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:115)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:81)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:139)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:411)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:393)
at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:347)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
... 10 more
Caused by: java.lang.RuntimeException: configuration exception
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
at org.apache.hadoop.streaming.PipeReducer.configure(PipeReducer.java:67)
... 15 more
Caused by: java.io.IOException: Cannot run program "/content/./reducerModaGastoPorPersona.py": error=2, No such file or directory
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1128)
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1071)
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
... 16 more
Caused by: java.io.IOException: error=2, No such file or directory
at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:340)
at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:271)
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1107)
... 18 more
2022-12-14 12:18:36,777 INFO mapreduce.Job: Job job_local520054471_0001 failed with state FAILED due to: NA
2022-12-14 12:18:36,794 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=4175
FILE: Number of bytes written=644002
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=102
Map output records=102
Map output bytes=869
Map output materialized bytes=1079
Input split bytes=87
Combine input records=0
Combine output records=0
Reduce input groups=0
Reduce shuffle bytes=1079
Reduce input records=0
Reduce output records=0
Spilled Records=102
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=0
Total committed heap usage (bytes)=350224384
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=969
File Output Format Counters
Bytes Written=0
2022-12-14 12:18:36,801 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!
I've uploaded the files "mapperModaGastoPorPersona.py", "reducerModaGastoPorPersona.py" and "Datos_actividad_1.txt" to Colab.
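The log shows that the mapper ran (Map output records=102) and only the reducer failed to start, so the reducer file itself is worth inspecting. On Linux, error=2 for a script that does exist can come from its shebang line naming an interpreter that cannot be found, for example when the file was saved with Windows (CRLF) line endings and the interpreter name ends in a carriage return. Whether that applies here is an assumption; the log does not confirm it. A minimal sketch of a check that could be run in a Colab cell, where diagnose_script is a hypothetical helper name:

```python
import os


def diagnose_script(path):
    """List possible reasons why executing an uploaded script could fail.

    This is a hypothetical helper for illustration, not part of Hadoop.
    """
    if not os.path.isfile(path):
        return [f"{path} does not exist"]

    problems = []
    with open(path, "rb") as f:
        first_line = f.readline()

    if not first_line.startswith(b"#!"):
        problems.append("no shebang line (e.g. #!/usr/bin/env python3)")
    else:
        # The first token after '#!' is the interpreter the kernel will run.
        parts = first_line[2:].split()
        if parts and not os.path.exists(parts[0].decode(errors="replace")):
            problems.append(f"shebang interpreter not found: {parts[0]!r}")

    if first_line.endswith(b"\r\n"):
        # With CRLF endings the shebang ends in a carriage return,
        # so the kernel looks for a name such as 'python3\r'.
        problems.append("CRLF line endings in the shebang line")

    if not os.access(path, os.X_OK):
        problems.append("file is not marked executable")

    return problems


if __name__ == "__main__":
    for script in ("mapperModaGastoPorPersona.py",
                   "reducerModaGastoPorPersona.py"):
        print(script, diagnose_script(script) or "no obvious problem")
```

If CRLF line endings are reported for the reducer, saving it with LF line endings (PyCharm lets you change the line separator per file) and uploading it again would be the next thing to try.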