I am trying to setup HDFS & Cloudera Manager via the Cloudera Manager API. However I am stuck at a specific point:
I setup all the HDFS roles, but the NameNode refuses to communicate with the data nodes. The relevant error from the DataNode log:
Initialization failed for Block pool BP-1653676587-172.168.215.10-1435054001015 (Datanode Uuid null) service to master.adastragrp.com/172.168.215.10:8022 Datanode denied communication with namenode because the host is not in the include-list: DatanodeRegistration(172.168.215.11, datanodeUuid=1a114e5d-2243-442f-8603-8905b988bea7, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=cluster4;nsid=103396489;c=0)
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:917)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:5085)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1140)
at
My DNS is configured via the hosts file, so I thought the following answer applies and tried the solution without success: https://stackoverflow.com/a/29598059/1319284
However, I have another small cluster with basically the same configuration as far as I can tell, which is working. DNS is configured through /etc/hosts as well, but here I set up the cluster via Cloudera Manager GUI instead of the API.
After that I finally found the configuration directory of the running NameNode process, and there I found a dfs_hosts_include file. Opening it reveals that only 127.0.0.1 is included. On the working cluster, all the nodes are included in that file. I find a similar weirdness in topology.map:
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<topology>
<node name="master.adastragrp.com" rack="/default"/>
<node name="127.0.0.1" rack="/default"/>
<node name="slave.adastragrp.com" rack="/default"/>
<node name="127.0.0.1" rack="/default"/>
</topology>
... That doesn't look right. Again, on the working cluster the IPs are as expected.
Not only do I not know what went wrong, I also do not know how to influence these files, as they are all auto-generated by Cloudera Manager. Has anyone seen this before and could provide guidance here?
I finally found where I had the problem. The problem was in /etc/cloudera-scm-agent/config.ini
I generated this file with a template, and ended up with
which the cloudera-cm-agent happily reported to the server. For more information, see the question Salt changing /etc/hosts, but still caching old one?