I am new to Hadoop. I have set up Hadoop in pseudo-distributed mode on a single machine. My hdfs-site.xml uses the default configuration:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/yarn_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/yarn_data/hdfs/datanode</value>
</property>
</configuration>
After running:
hdfs namenode -format
start-all.sh
jps
I have one namenode and one datanode.
I want to run multiple datanodes on this machine, so I tried to configure them following this advice: stackoverflow. My config:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/yarn_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/yarn_data/hdfs/datanode-1</value>
<name>dfs.datanode.address</name>
<value>0.0.0.0:9870</value>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:9090</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/yarn_data/hdfs/datanode-2</value>
<name>dfs.datanode.address</name>
<value>0.0.0.0:9871</value>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:9091</value>
</property>
</configuration>
But now I get zero datanodes. Any help would be greatly appreciated.
The key part of that linked answer is that you have to maintain a different configuration for each datanode instance.
You cannot put two sets of <name> and <value> sections for the same property in one XML file. You are required to have two separate config files, one for each datanode.
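As a rough sketch of what "one config file per datanode" could look like, the script below writes a separate hdfs-site.xml into its own directory for each datanode. The helper name write_datanode_conf, the CONF_BASE location, the dfs.datanode.ipc.address entries, and all port numbers are my own assumptions for illustration and not taken from the linked answer; pick ports that nothing else on the machine uses (in Hadoop 3, for example, 9870 is the default NameNode web UI port).

```shell
set -eu

# Hypothetical helper: write a minimal hdfs-site.xml for ONE datanode.
# Args: conf dir, data dir, data-transfer port, HTTP port, IPC port.
write_datanode_conf() {
  conf_dir="$1"; data_dir="$2"; data_port="$3"; http_port="$4"; ipc_port="$5"
  mkdir -p "$conf_dir"
  # Each property gets its own <property> element with exactly one name/value pair.
  cat > "$conf_dir/hdfs-site.xml" <<EOF
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:$data_dir</value>
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:$data_port</value>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:$http_port</value>
  </property>
  <property>
    <name>dfs.datanode.ipc.address</name>
    <value>0.0.0.0:$ipc_port</value>
  </property>
</configuration>
EOF
}

# Assumed location for the per-datanode conf dirs (a temp dir unless CONF_BASE is set).
CONF_BASE="${CONF_BASE:-$(mktemp -d)}"
DATA_BASE=/usr/local/hadoop/yarn_data/hdfs

# Placeholder ports: every datanode needs its own, non-conflicting set.
write_datanode_conf "$CONF_BASE/datanode-1" "$DATA_BASE/datanode-1" 9871 9091 9081
write_datanode_conf "$CONF_BASE/datanode-2" "$DATA_BASE/datanode-2" 9872 9092 9082
echo "wrote configs under $CONF_BASE"
```

A real configuration directory also needs the other files (core-site.xml and so on), so in practice it is probably simpler to copy your working directory and edit only these properties.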
However, I'm not completely sure it is possible to have two HADOOP_CONF_DIR variables for separate Hadoop processes. There might be a way to run hadoop --config /some/path datanode, but start-dfs just hides that way of running a datanode from you.
That being said, assuming you have
export HADOOP_CONF_DIR=/etc/hadoop
and ls $HADOOP_CONF_DIR/hdfs-site.xml is working, then you can try starting each additional datanode with its own --config directory, in its own terminal.
I would recommend just using two separate virtual machines, because that will more closely match a real-world scenario.
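If you still want to try it on one machine, here is a minimal sketch of the separate-directory approach, assuming the hdfs launcher accepts --config as described in the HDFS commands guide. The DN2_CONF path is a made-up location, and the script only prepares the directory and prints the command rather than starting anything, so nothing here is confirmed to work on your particular setup.

```shell
set -eu

# Assumption: the existing, working configuration lives here (as in the answer).
SRC_CONF="${HADOOP_CONF_DIR:-/etc/hadoop}"
# Hypothetical location for the second datanode's configuration.
DN2_CONF="${DN2_CONF:-${TMPDIR:-/tmp}/hadoop-conf-datanode-2}"

mkdir -p "$DN2_CONF"
# Copy the working config, if present, so core-site.xml etc. carry over.
if [ -d "$SRC_CONF" ]; then
  cp -r "$SRC_CONF"/. "$DN2_CONF"/
fi

# Command for the second datanode; --config points it at its own directory.
DN2_CMD="hdfs --config $DN2_CONF datanode"

echo "Edit $DN2_CONF/hdfs-site.xml (data dir and ports), then run in a new terminal:"
echo "  $DN2_CMD"
```

The second hdfs-site.xml has to use a different dfs.datanode.data.dir and different datanode ports than the first one, otherwise the second process will most likely fail to start.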