I'm replacing multiple machines in my Hadoop CDH 5.7 cluster. I started by adding a few new machines and decommissioning the same number of existing DataNodes.
I noticed that blocks are marked as under-replicated when decommissioning a node.
Does it mean I'm at risk when decommissioning multiple nodes? Can I decommission all nodes in parallel? Is there a better way of replacing all machines?
Thanks!
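It's expected that when a node is down (or being removed), the blocks it held are reported as under-replicated. HDFS fixes this automatically: the NameNode re-replicates those blocks onto the remaining nodes, and once you add a new node you can rebalance to even out disk usage.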
What's actually happening?
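Let's say the replication factor on your cluster is 3. When a node is decommissioned, the replicas stored on it no longer count toward that target, so each affected block effectively has 2 replicas and is flagged as under-replicated. The NameNode then copies each such block to another DataNode, restoring the replication to the default; a graceful decommission finishes only after those copies are made. Rebalancing after you add a new node spreads the data evenly, but it is not what restores the missing replicas.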
Am I at risk?
Not if you are doing it one by one. That is: replace a node, wait for the under-replicated block count to drop back to zero, and rebalance the cluster. Repeat. (I think this is the only way!)
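The "wait before touching the next node" step can be scripted. Here is a minimal sketch, assuming the `hdfs` client is on your PATH and that `hdfs dfsadmin -report` prints a line like `Under replicated blocks: N` (it does on the Hadoop 2.x versions I've seen, but check yours). The function names are my own, not part of any Hadoop API, and starting the decommission itself (exclude file plus `hdfs dfsadmin -refreshNodes`, or Cloudera Manager) is left to you.

```python
import re
import subprocess
import time


def under_replicated_count(report_text):
    """Extract the under-replicated block count from `hdfs dfsadmin -report` text.

    Assumption: the report contains a line such as "Under replicated blocks: 12".
    """
    match = re.search(r"Under[ -]replicated blocks:\s*(\d+)", report_text)
    if match is None:
        raise ValueError("no under-replicated block count found in report")
    return int(match.group(1))


def wait_until_fully_replicated(get_report, poll_seconds=60, sleep=time.sleep):
    """Poll until the cluster reports zero under-replicated blocks."""
    while under_replicated_count(get_report()) > 0:
        sleep(poll_seconds)


def hdfs_report():
    """Run the real CLI; needs an HDFS client configured for the cluster."""
    result = subprocess.run(
        ["hdfs", "dfsadmin", "-report"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

After you have marked one node for decommissioning, call `wait_until_fully_replicated(hdfs_report)` and only move on to the next node once it returns. It is also worth confirming in the NameNode UI that the node's state has actually changed to decommissioned.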
If you just remove multiple nodes at once, there is a good chance of losing data, as you may lose all replicas of some blocks (those whose replicas all resided on the removed nodes).
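To get a feel for that risk, here is a toy simulation. It assumes replicas are placed on uniformly random distinct nodes, which is a simplification (real HDFS placement is rack-aware), and the node names and block count are made up for illustration.

```python
import random


def place_blocks(num_blocks, nodes, replication=3, seed=0):
    """Assign each block to `replication` distinct nodes chosen at random.

    Assumption: uniform random placement, not the real rack-aware policy.
    """
    rng = random.Random(seed)
    return [frozenset(rng.sample(nodes, replication)) for _ in range(num_blocks)]


def blocks_lost(placement, removed):
    """Count blocks whose every replica lived on one of the removed nodes."""
    removed = set(removed)
    return sum(1 for replicas in placement if replicas <= removed)


nodes = [f"dn{i}" for i in range(1, 11)]  # hypothetical 10-node cluster
placement = place_blocks(1000, nodes)

# With replication 3, pulling one or two nodes can never take out every replica.
print(blocks_lost(placement, ["dn1"]))
print(blocks_lost(placement, ["dn1", "dn2"]))
# Pulling three at once can, for any block that happened to live on exactly those three.
print(blocks_lost(placement, ["dn1", "dn2", "dn3"]))
```

The point is only that the number of nodes you can abruptly lose without any risk is one less than the replication factor, and even then only if re-replication finishes before the next node goes away.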
Don't decommission multiple nodes at once!