I understand that in MRv2 all datanodes report to multiple namenodes about their blocks, along with heartbeats. Where exactly do the datanodes report so that this information is saved across all namenodes? If one of the namenodes goes down, will the cluster lose some block information?
What do namespace and block pool mean in MapReduce 2.0 YARN?
473 views, asked by hadooper
There are 2 answers
You are asking about the Federation and High Availability concepts in HDFS. Please look at Chapter 3, HDFS Concepts, in "Hadoop: The Definitive Guide". In short: when more namenodes are added (the reason is scaling), each of them is responsible for one namespace, that is, one portion of the filesystem tree. A block pool holds all the blocks that belong to that particular namespace. Namespaces are independent of each other, so losing one namenode does not affect the namespaces managed by the others. The concept is similar to an XML namespace.
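To make the namespace and block pool relationship concrete, here is a minimal toy model in Python. It is only a sketch and not Hadoop code: the class names, the mount points `/user` and `/logs`, and the pool IDs `BP-1` and `BP-2` are all made up for illustration. It assumes each namenode owns one namespace with one block pool, and that a datanode stores blocks for every pool and reports each pool only to the namenode that owns it.

```python
from collections import defaultdict


class NameNode:
    """Toy namenode: owns one namespace and exactly one block pool."""

    def __init__(self, namespace, block_pool_id):
        self.namespace = namespace
        self.block_pool_id = block_pool_id
        self.block_locations = defaultdict(set)  # block id -> datanode names
        self.alive = True

    def receive_block_report(self, datanode_name, block_ids):
        for block_id in block_ids:
            self.block_locations[block_id].add(datanode_name)


class DataNode:
    """Toy datanode: stores blocks for every block pool, kept apart per pool."""

    def __init__(self, name):
        self.name = name
        self.blocks_by_pool = defaultdict(set)  # pool id -> block ids

    def store(self, block_pool_id, block_id):
        self.blocks_by_pool[block_pool_id].add(block_id)

    def send_block_reports(self, namenodes):
        # Each live namenode only hears about blocks from its own pool.
        for nn in namenodes:
            if nn.alive:
                pool_blocks = self.blocks_by_pool[nn.block_pool_id]
                nn.receive_block_report(self.name, pool_blocks)


# Hypothetical federated cluster: two namenodes, one shared datanode.
nn_user = NameNode("/user", "BP-1")
nn_logs = NameNode("/logs", "BP-2")
dn = DataNode("dn1")
dn.store("BP-1", "blk_1001")
dn.store("BP-2", "blk_2001")
dn.send_block_reports([nn_user, nn_logs])

# Simulate losing the /logs namenode: /user keeps working, and the
# BP-2 blocks still sit on the datanode's disk.
nn_logs.alive = False
dn.send_block_reports([nn_user, nn_logs])
```

In this model, taking `nn_logs` down leaves `nn_user` with a complete view of its own pool, while the blocks of `BP-2` stay on the datanode but are unreachable until a namenode for that namespace is available again.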
In the Hadoop 2.x high-availability (HA) implementation, there is a pair of namenodes in an active-standby configuration.
If the active namenode fails, the standby takes over its duties.
The active and standby namenodes share an edit log, so that when the standby takes over, it reads up to the end of the shared edit log to synchronize its state with the active namenode.
Datanodes must also send block reports to both namenodes, so that both have an up-to-date block mapping.
So in case of failure, the standby already knows the block mapping and the latest edits, and can therefore take over very quickly.
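The failover sequence described in this answer can be sketched as a small Python simulation. This is an illustrative model only, not how HDFS is implemented: `HaNameNode`, `catch_up`, and the list used as a shared edit log are hypothetical names, and the sketch assumes the standby replays outstanding edits only at failover time.

```python
class HaNameNode:
    """Toy namenode for an active-standby pair."""

    def __init__(self, name):
        self.name = name
        self.namespace = {}   # path -> list of block ids
        self.block_map = {}   # block id -> set of datanode names
        self.applied = 0      # number of shared edits already replayed

    def receive_block_report(self, datanode, block_ids):
        for block_id in block_ids:
            self.block_map.setdefault(block_id, set()).add(datanode)

    def catch_up(self, shared_edits):
        # Replay every edit this namenode has not applied yet.
        for path, block_ids in shared_edits[self.applied:]:
            self.namespace[path] = block_ids
        self.applied = len(shared_edits)


shared_edits = []  # stands in for the shared edit log
active = HaNameNode("nn1")
standby = HaNameNode("nn2")

# The active namenode records a new file in the shared edit log.
shared_edits.append(("/data/a.txt", ["blk_1", "blk_2"]))
active.catch_up(shared_edits)

# The datanode sends its block report to BOTH namenodes.
for nn in (active, standby):
    nn.receive_block_report("dn1", ["blk_1", "blk_2"])

# Failover: the standby reads to the end of the shared log and takes over.
standby.catch_up(shared_edits)
new_active = standby
```

After the failover step, `new_active` holds both the namespace (replayed from the shared edits) and the block locations (received directly from the datanode), which is why no block information is lost when the old active namenode goes away.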