I know the Hadoop/Spark frameworks will detect failed or slow machines and execute the same tasks on a different machine. How (on what basis) does the framework identify slow-running machines? Is there some kind of stat the framework uses to decide?

Can someone shed some light here?
The MapReduce model breaks a job into tasks and runs the tasks in parallel, so the overall job execution time is smaller than it would be if the tasks ran sequentially. Because a single straggling task can delay the whole job, the framework monitors task progress and can speculatively launch a duplicate attempt of a task that is progressing noticeably slower than its siblings; whichever attempt finishes first is kept and the other is killed.
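Speculative execution is on by default and can be toggled for map and reduce tasks independently. A minimal sketch, assuming stock Hadoop 2.x property names (defaults may differ by distribution):

```xml
<!-- mapred-site.xml: toggle speculative execution            -->
<!-- (sketch; both properties default to true in Hadoop 2.x) -->
<property>
  <name>mapreduce.map.speculative</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.reduce.speculative</name>
  <value>true</value>
</property>
```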
Two MRAppMaster properties control how slow tasks are identified and handled:

yarn.app.mapreduce.am.job.task.estimator.class

- When the MapReduce framework launches a new job, the implementation configured here is used to estimate each task's completion time at runtime. The estimated completion time for a task should be less than a minute; a task that runs beyond its estimate can be marked as a slow-running task.

yarn.app.mapreduce.am.job.speculator.class

- This property specifies the class that implements the speculative execution policy, i.e. it decides when, and for which slow tasks, duplicate attempts are launched.
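For reference, a sketch of how these two properties might be set in mapred-site.xml. The class names shown are, to the best of my knowledge, the defaults shipped with stock Hadoop 2.x, so you would only set them explicitly to plug in different behaviour:

```xml
<!-- mapred-site.xml: pluggable estimator and speculator          -->
<!-- (sketch; values shown are believed to be the stock defaults) -->
<property>
  <name>yarn.app.mapreduce.am.job.task.estimator.class</name>
  <value>org.apache.hadoop.mapreduce.v2.app.speculate.LegacyTaskRuntimeEstimator</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.job.speculator.class</name>
  <value>org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator</value>
</property>
```

If I recall correctly, Hadoop also ships an ExponentiallySmoothedTaskRuntimeEstimator in the same package, which estimates runtime from a smoothed progress rate rather than the simple linear extrapolation the legacy estimator uses.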