I'm running a jobflow on Elastic MapReduce that terminates after completing all steps.
How can I access the custom counters of each mapper or reducer after the cluster is killed? (Maybe somewhere on S3 with the logs, if at all.)
How can I access them programmatically (say from Python boto, a Java client, or by SSH-ing to the machine) while the cluster is still running?
1) The counters will be in the job history logs, which EMR copies to the S3 log location configured for the jobflow.
They will be in JSON format, so you may need to do some processing.
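For example, here is a rough sketch using the AWS SDK for Java that pulls one history file from S3 and prints the lines mentioning counters. The bucket name and key below are placeholders, and the exact layout under your log URI depends on the jobflow id and Hadoop version, so adjust them for your cluster:

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import com.amazonaws.services.s3.model.S3Object;
    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class FetchJobHistoryCounters {
        public static void main(String[] args) throws Exception {
            // Placeholders: point these at your own log bucket and the
            // job history file written for your jobflow.
            String bucket = "my-emr-log-bucket";
            String key = "logs/j-XXXXXXXXXXXX/jobs/job_XXXXXXXXXXXX_0001.jhist";

            AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
            S3Object object = s3.getObject(bucket, key);

            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(object.getObjectContent()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // The history file is line-oriented JSON; counter totals
                    // appear in the task- and job-finished events.
                    if (line.contains("counters")) {
                        System.out.println(line);
                    }
                }
            }
        }
    }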
2) I would use the aws or s3cmd CLI tools to grab and process them. You could also modify your Hadoop jobs to write the counters to a file upon completion, in whatever format you would like.
Something like:
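(A rough sketch of a driver that dumps its counters after waitForCompletion; the class name, job setup, and output file name are placeholders.)

    import java.io.PrintWriter;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Counter;
    import org.apache.hadoop.mapreduce.CounterGroup;
    import org.apache.hadoop.mapreduce.Counters;
    import org.apache.hadoop.mapreduce.Job;

    public class MyJobDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "my-job");
            // ... set mapper, reducer, input and output paths as usual ...

            boolean ok = job.waitForCompletion(true);

            // Once the job has finished, iterate over every counter group
            // and write the counters out in whatever format you like.
            Counters counters = job.getCounters();
            try (PrintWriter out = new PrintWriter("counters.txt", "UTF-8")) {
                for (CounterGroup group : counters) {
                    for (Counter counter : group) {
                        out.println(group.getName() + "\t"
                                + counter.getName() + "\t"
                                + counter.getValue());
                    }
                }
            }

            System.exit(ok ? 0 : 1);
        }
    }

Since the jobflow terminates, it probably makes sense to write that file somewhere that outlives the cluster, for example an S3 path via the Hadoop FileSystem API, rather than the local disk of the master node.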