I need to make my Java Spark application work both locally (on a Windows machine) and on an AWS EMR cluster.
This is my code:
readHDFSConfig.loadHDFslibrary(conf.get("velocity.template.path"), conf, sparkSession, conf.get("velocity.template.path"));
VelocityEngine ve = new VelocityEngine();
VelocityContext context = null;
// The three lines below were added to make this work in local mode (on a Windows machine), where velocityPath is a Windows directory.
Properties p = new Properties();
p.setProperty("file.resource.loader.path", velocityPath);
ve.init(p);
context = new VelocityContext();
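Before debugging Velocity itself, one thing worth checking is whether velocityPath is actually visible on the local filesystem of the JVM that initializes the engine — in cluster deploy mode on EMR the driver runs on a core node, not the master, so a path that exists on the master may resolve to nothing where the code runs. A minimal stdlib check (class and method names here are hypothetical, and the default path is just the one from the question):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TemplatePathCheck {
    // Returns how many .vm templates are visible under dir on the *local*
    // filesystem of the calling JVM; -1 if the directory is not there.
    static int countTemplates(String dir) {
        Path p = Paths.get(dir);
        if (!Files.isDirectory(p)) {
            return -1; // Velocity's file resource loader would see nothing here
        }
        int count = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(p, "*.vm")) {
            for (Path ignored : stream) {
                count++;
            }
        } catch (IOException e) {
            return -1;
        }
        return count;
    }

    public static void main(String[] args) {
        String velocityPath = args.length > 0
                ? args[0]
                : "/usr/share/MPR_RESOURCES/velocityTemplate/";
        System.out.println(velocityPath + " -> " + countTemplates(velocityPath) + " template(s)");
    }
}
```

Logging this value right before `ve.init(p)` on EMR would confirm whether the file loader can ever find the templates.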
This is how I invoke the Velocity templates:
String alum = rows.get(0).getAs(DeltaMembers.ALUM.toString());
context.put("alum", alum);
artifacts.append(getSnippet(ve, context, "alum"));
public String getSnippet(VelocityEngine ve, VelocityContext context, String type) throws DataProcessException {
    StringWriter indexSnippet = new StringWriter();
    try {
        Template t = ve.getTemplate(type + ".vm");
        t.merge(context, indexSnippet);
    } catch (ResourceNotFoundException e) {
        e.printStackTrace();
        throw new DataProcessException("ResourceNotFoundException occurred !!!! " + e.getMessage(), e);
    } catch (ParseErrorException e) {
        e.printStackTrace();
    } catch (Exception e) {
        e.printStackTrace();
    }
    return indexSnippet.toString();
}
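Note that getSnippet only rethrows ResourceNotFoundException; ParseErrorException and any other Exception are printed to stderr and then swallowed, so a failed merge still returns an empty string and the job reports success. That matches the symptom of sections being silently omitted. A stripped-down illustration of this catch-and-continue pattern (the class and exception here are made up for the demo):

```java
import java.io.StringWriter;

public class SilentFailureDemo {
    // Mimics getSnippet: the exception is printed but not rethrown,
    // so the caller just receives an empty string.
    static String renderOrEmpty(String templateName) {
        StringWriter out = new StringWriter();
        try {
            // Stand-in for ve.getTemplate(...) / t.merge(...) failing
            throw new IllegalStateException("could not render: " + templateName);
        } catch (Exception e) {
            e.printStackTrace(); // goes to stderr, easy to miss in YARN container logs
        }
        return out.toString(); // empty -> the corresponding XML section is omitted
    }

    public static void main(String[] args) {
        System.out.println("snippet length = " + renderOrEmpty("alum.vm").length());
    }
}
```

On EMR the stack traces from printStackTrace land in the container's stderr, not in the application logger output, which would explain seeing no errors even when rendering fails.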
The readHDFSConfig.loadHDFslibrary function:
public void loadHDFslibrary(String hdfsPath, Configuration conf, SparkSession sparkSession, String defaultPath) {
    boolean statusFlag = true;
    try {
        FileSystem fs = FileSystem.get(conf);
        LOGGER.info("HDFS Path is ..........." + hdfsPath);
        for (FileStatus eachPath : fs.listStatus(new Path(conf.get(hdfsPath, defaultPath)))) {
            LOGGER.info("FileName is ..........." + eachPath.getPath());
            sparkSession.sparkContext().addJar(eachPath.getPath().toString());
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IllegalArgumentException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
        statusFlag = false;
    }
    LOGGER.info("File read flag value is -> ..........." + statusFlag);
}
Issue:
This works on my local machine when velocityPath is a local Windows path (the resources folder). But on AWS EMR, when velocityPath is /usr/share/MPR_RESOURCES/velocityTemplate/ (which exists both on the local filesystem and in HDFS on EMR), the sections that should be populated from the .vm files are omitted, even though the job succeeds and I see no errors in the log.
I tried hardcoding the path in the getSnippet function, e.g. Template t = ve.getTemplate(velocityPath + type + ".vm");, but the sections in the XML are still omitted.
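One option worth trying, assuming the .vm files can be bundled into the application jar (e.g. under src/main/resources): switch from the file resource loader to Velocity's classpath resource loader, so template resolution no longer depends on a filesystem path that differs between Windows and EMR. For Velocity 1.7 the engine properties would be:

```properties
resource.loader = classpath
classpath.resource.loader.class = org.apache.velocity.runtime.resource.loader.ClasspathResourceLoader
```

With this configuration, ve.getTemplate("alum.vm") loads the template from the jar's classpath in both environments. (Velocity 2.x renamed these keys to resource.loaders and resource.loader.classpath.class, so the property names depend on which version is on the classpath.)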