Configure Hadoop to tolerate server failures

I am trying to configure a 50-node Hadoop 2.6.0 cluster for fault tolerance. Specifically, I'd like to be able to suddenly stop 5 servers and still have my job complete. So far, stopping even 1 server causes my job to fail with a "too many map failures" error.

We host HDFS on the same cluster with replication factor = 2.
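To put the replication factor in context, here is a rough back-of-the-envelope sketch of how likely it is that stopping 5 of 50 nodes makes some HDFS block completely unavailable. It assumes (an assumption, not how HDFS actually places blocks) that each block's replicas land on distinct nodes chosen uniformly at random and independently of other blocks; HDFS's default placement policy is rack-aware, so real figures will differ. The function names and the 1,000-block count are hypothetical, chosen only for illustration.

```python
from math import comb


def p_block_lost(nodes: int, failed: int, replication: int) -> float:
    """Probability that every replica of one block sits on a stopped node,
    assuming replicas go to distinct nodes chosen uniformly at random."""
    return comb(failed, replication) / comb(nodes, replication)


def p_any_block_lost(nodes: int, failed: int, replication: int, blocks: int) -> float:
    """Probability that at least one of `blocks` blocks loses all replicas,
    assuming block placements are independent of one another."""
    return 1 - (1 - p_block_lost(nodes, failed, replication)) ** blocks


if __name__ == "__main__":
    # 50 nodes, 5 stopped, a hypothetical input of 1,000 blocks.
    for r in (2, 3):
        print(f"replication={r}: per block {p_block_lost(50, 5, r):.5f}, "
              f"any of 1000 blocks {p_any_block_lost(50, 5, r, 1000):.4f}")
```

Under these assumptions, with replication 2 the per-block probability is C(5,2)/C(50,2) = 10/1225, so across a job reading many blocks it becomes very likely that at least one block has no live replica, regardless of how task retries are configured.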

Can someone provide guidance on how to do this?

Having looked at similar posts, I should add that I am not looking to have my job complete on a subset of the data.