I just woke up to a 16-hour EMR MapReduce job that failed because a 'few' mappers timed out.
Is there a way to rerun only those failed mappers (yes it makes sense in my specific use case)? How?
This is probably too late to help in real time, but in general: no.
Sometimes it is possible, though. If you take the trouble to find out exactly which splits the failed mappers were processing (from the mapper logs), and if this was a map-only job, then you could write a custom job that processes only those splits. This is very hard in general, especially since splits typically do not line up one-to-one with input files.
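As a rough sketch of the first step (finding which splits the failed mappers had), something like the following could pull the split descriptions out of the task logs. It assumes each mapper log has a line of the form `Processing split: <path>:<start>+<length>`, which is how Hadoop's map task usually logs a file split; check your own logs, since the exact format depends on the Hadoop version and the input format. The function name `parse_failed_splits` and the `Split` type are made up for illustration.

```python
import re
from typing import Iterable, List, NamedTuple


class Split(NamedTuple):
    """One input split: a byte range inside one input file."""
    path: str
    start: int
    length: int


# Assumed log format: "... Processing split: <path>:<start>+<length>"
# The greedy \S+ lets the path itself contain colons (e.g. "s3://...").
SPLIT_LINE = re.compile(
    r"Processing split: (?P<path>\S+):(?P<start>\d+)\+(?P<length>\d+)\s*$"
)


def parse_failed_splits(log_lines: Iterable[str]) -> List[Split]:
    """Return the distinct splits mentioned in the given mapper log lines.

    Feed this only the logs of the failed task attempts. Retried attempts
    log the same split again, so duplicates are dropped; first-seen order
    is kept.
    """
    seen = set()
    splits = []
    for line in log_lines:
        match = SPLIT_LINE.search(line)
        if match is None:
            continue  # not a split line, skip it
        split = Split(
            match.group("path"),
            int(match.group("start")),
            int(match.group("length")),
        )
        if split not in seen:
            seen.add(split)
            splits.append(split)
    return splits
```

You would run this over the syslog files of the failed attempts only. The resulting paths and byte ranges are what the custom job would have to read, which is the hard part: the stock input formats do not take an arbitrary list of byte ranges, so that still needs a custom input format or a pre-processing step.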