I'm using the following code to run the Java MapReduce application in an AWS EMR. Now, I need to run this in the TEZ engine, not the MapReduce engine. When I changed the mapreduce.framework.name
property to yarn-tez
and tested it against a smaller data set it worked. But when I run the application against a much higher data set it executes the map jobs 100%, but reduce jobs are stuck at 0% when I check the application status using yarn application -status
command, it's showing as the progress 0%. Will you be able to help me to understand the issue here?
public class MyFirstMapReduceMain {
public static void main(String[] args) {
try {
String inputSource = args[0];
String outputResult = args[1];
String hostName = args.length > 2 ? args[2] : "localhost";
String port = args.length > 3 ? args[3] : "10000";
String database = args.length > 4 ? args[4] : "hive_test_db";
String username = args.length > 5 ? args[5] : "";
String password = args.length > 6 ? args[6] : "";
String appType = args.length > 7 ? args[7] : "";
Configuration conf = new Configuration();
conf.set("reducer-hive-host", hostName);
conf.set("reducer-hive-port", port);
conf.set("reducer-hive-db", database);
conf.set("reducer-hive-un", username);
conf.set("reducer-hive-pw", password);
conf.set("mapreduce.framework.name", "yarn-tez");
conf.set("mapreduce.input.fileinputformat.inputformat", "org.apache.hadoop.ql.io.RCFileInputFormat");
conf.setInt("mapreduce.task.timeout", 1800000);
conf.setInt("mapred.task.timeout", 1800000);
Job job = Job.getInstance(conf, "MyFirst MapReduce Job");
setTextOutputFormatSeparator(job, "|");
job.setNumReduceTasks(Integer.valueOf(reduceCount));
job.setJarByClass(MyFirstMapReduceMain.class);
job.setMapperClass(MyMapper.class);
job.setReducerClass(MyReducer.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.setInputPaths(job, new Path(inputSource));
FileOutputFormat.setOutputPath(job, new Path(outputResult));
if (job.waitForCompletion(true)) {
HiveHelper.loadOutputToHiveTable(new Path(outputResult), hostName, port, database, username, password, "output_hive_table");
} else {
System.exit(1);
}
} catch (Exception e) {
log.error("Exception occurred when executing MyFirstMapReduceMain", e);
}
}