Where to access EMR counters for a terminated or running cluster


I'm running a job flow on Elastic MapReduce that terminates after completing all steps.

  1. How can I access the custom counters of each mapper or reducer after the cluster has been terminated? (Maybe somewhere on S3 with the logs, if at all.)

  2. How can I access them programmatically (say from Python boto, a Java client, or by SSH to the machine) while the cluster is still running?

Answer by vlahmot

1) The counters will be in the job history logs found at:

$LOG_PATH/$CLUSTER_ID/hadoop-mapreduce/history/$YEAR/$MONTH/$DAY/$JOB_ID.jhist.gz

The files are gzip-compressed and hold the events as JSON, so you will need to do some processing to pull the counters out.

2) I would use the aws or s3cmd CLI tools to download the history files and then process them.
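As a rough sketch of the download-and-process route, the Java below shells out to `aws s3 cp` to fetch only the `.jhist.gz` files and then keeps the lines that mention a marker string. The class name `HistoryLogs`, its method names, and the command-line arguments are hypothetical. The `JOB_FINISHED` marker assumes each history event sits on its own line and that the job-finished event carries the final counters, so check it against one of your own files before relying on it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.GZIPInputStream;

class HistoryLogs {

    /** Builds an "aws s3 cp" command that copies only *.jhist.gz files under the prefix. */
    static List<String> downloadCommand(String s3Prefix, String localDir) {
        return Arrays.asList(
                "aws", "s3", "cp", s3Prefix, localDir,
                "--recursive",
                "--exclude", "*",
                "--include", "*.jhist.gz");
    }

    /** Returns every line of a gzipped stream that contains the marker text. */
    static List<String> linesContaining(InputStream gzipped, String marker) throws IOException {
        List<String> matches = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(gzipped), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(marker)) {
                    matches.add(line);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: S3 log prefix for the cluster, args[1]: local target directory
        int exit = new ProcessBuilder(downloadCommand(args[0], args[1]))
                .inheritIO().start().waitFor();
        if (exit != 0) {
            throw new IllegalStateException("aws exited with status " + exit);
        }

        // Collect the downloaded history files before reading them
        List<Path> files;
        try (Stream<Path> walk = Files.walk(Paths.get(args[1]))) {
            files = walk.filter(p -> p.toString().endsWith(".jhist.gz"))
                        .collect(Collectors.toList());
        }

        // Assumption: the job-finished event line holds the final counter totals
        for (Path file : files) {
            try (InputStream in = Files.newInputStream(file)) {
                linesContaining(in, "JOB_FINISHED").forEach(System.out::println);
            }
        }
    }
}
```

The printed lines are still raw JSON; feeding them to a JSON library of your choice is the remaining processing step.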

You could also modify your Hadoop jobs to write the counters to a file upon completion, in whatever format you like.

Something like:

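    // Rest of job setup
    job.waitForCompletion(true);

    // Classes used: java.io.PrintWriter, java.net.URI,
    // org.apache.hadoop.fs.{FileSystem, FSDataOutputStream, Path},
    // org.apache.hadoop.mapreduce.{Counter, CounterGroup, Counters}
    FileSystem fs = FileSystem.get(URI.create(outputPath), job.getConfiguration());
    FSDataOutputStream fsDataOutputStream = fs.create(new Path(outputPath + "/counters_output.csv"));
    PrintWriter writer = new PrintWriter(fsDataOutputStream);

    // Write one "name,value" row per counter, across all counter groups
    Counters counters = job.getCounters();
    for (CounterGroup counterGroup : counters) {
        for (Counter counter : counterGroup) {
            // println ends each row with a newline; write() would run all rows together
            writer.println(counter.getName() + "," + counter.getValue());
        }
    }

    // Closing the writer flushes it and closes the underlying stream
    writer.close();
    fsDataOutputStream.close();
    fs.close();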
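Reading that file back later could look something like the sketch below, which assumes one `name,value` pair per line in `counters_output.csv`. The class name `CounterCsv` and its `parse` method are hypothetical, and the code uses only the JDK, so the file would first need to be copied somewhere it can be read as plain text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;

class CounterCsv {

    /** Parses "name,value" lines into a map that preserves file order. */
    static Map<String, Long> parse(Reader source) throws IOException {
        Map<String, Long> counters = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            // Split on the last comma so counter names containing commas survive
            int comma = line.lastIndexOf(',');
            if (comma < 0) {
                continue; // skip blank or malformed lines
            }
            String name = line.substring(0, comma);
            long value = Long.parseLong(line.substring(comma + 1).trim());
            counters.put(name, value);
        }
        return counters;
    }
}
```

Because the file records only the counter name, two counters with the same name in different groups would overwrite each other in this map; writing the group name as an extra column is one way around that.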