hadoop - Multiple datanode configuration in Pseudo-distributed mode

I am a newbie in Hadoop. I have set up Hadoop in pseudo-distributed mode on a single machine. My hdfs-site.xml configuration is the default:

<configuration>
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/yarn_data/hdfs/namenode</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/yarn_data/hdfs/datanode</value>
</property>
</configuration>

After running:

hdfs namenode -format
start-all.sh
jps

I have one namenode and one datanode.
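
For reference, jps at this point shows something like the following (the process IDs are just illustrative; SecondaryNameNode, ResourceManager, and NodeManager appear because start-all.sh also starts YARN):

4211 NameNode
4356 DataNode
4512 SecondaryNameNode
4703 ResourceManager
4821 NodeManager
5110 Jps
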
I want to have multiple datanodes on this machine, so I tried to configure it following this advice: stackoverflow. My config:

<configuration>
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/yarn_data/hdfs/namenode</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/yarn_data/hdfs/datanode-1</value>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:9870</value>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:9090</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/yarn_data/hdfs/datanode-2</value>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:9871</value>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:9091</value>
</property>
</configuration>

With this configuration I get zero datanodes. Any help would be greatly appreciated.

1 Answer

OneCricketeer (accepted answer)

The key part of that linked answer is that you have to maintain a different configuration for each datanode instance.

You cannot put multiple <name> and <value> pairs inside the same <property> element of the XML file.
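
For reference, well-formed Hadoop configuration XML gives every setting its own <property> block, for example (the port values here are just the usual datanode defaults, shown for illustration):

<property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:9866</value>
</property>
<property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:9864</value>
</property>

Even written that way, a single hdfs-site.xml still describes only one datanode, because repeating the same property name later in the file just overrides the earlier value.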

You are required to have two separate config files, one for each datanode.
However, I'm not completely sure it is possible to have two HADOOP_CONF_DIR variables for unique Hadoop processes. There might be a way to do hadoop --config /some/path datanode, but start-dfs.sh just hides that way of running a datanode from you.

That being said, assuming you have export HADOOP_CONF_DIR=/etc/hadoop and ls $HADOOP_CONF_DIR/hdfs-site.xml works, then you can try the following in its own terminal:

mkdir /etc/hadoop2
cp /etc/hadoop/* /etc/hadoop2/

# EDIT the new hdfs-site.xml file

hadoop --config /etc/hadoop2 datanode
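
As a sketch of that edit (the directory name and port numbers below are only examples; they just need to differ from whatever the first datanode is using, and the copied dfs.replication and dfs.namenode.name.dir properties can stay as they are), the datanode-related properties in /etc/hadoop2/hdfs-site.xml could look like:

<configuration>
<property>
    <!-- a separate storage directory for the second datanode -->
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/yarn_data/hdfs/datanode-2</value>
</property>
<property>
    <!-- data transfer port, must not clash with the first datanode -->
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:19866</value>
</property>
<property>
    <!-- web UI port for this datanode -->
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:19864</value>
</property>
<property>
    <!-- IPC port for this datanode -->
    <name>dfs.datanode.ipc.address</name>
    <value>0.0.0.0:19867</value>
</property>
</configuration>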

I would recommend just using two separate virtual machines, because that'll more closely match a real-world scenario.