I have this very simple upload method to upload a file to a one-node HDP 2.5 cluster:
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(new URI("webhdfs://hdfshost:50070"), conf);
fs.copyFromLocalFile(false, true, new Path(localFilePath), new Path(hdfsPath));
Tracing what happens, the flow starts correctly:
- connect to hdfshost:50070,
- check if file already exists (no),
- connect to datanode.
That is where it fails: the datanode is found to be localhost:50075 instead of hdfshost:50075, resulting in a "java.net.ConnectException: Connection refused".
I have the following relevant settings on hdp:
- dfs.client.use.datanode.hostname => true
- dfs.datanode.http.address => 0.0.0.0:50075
- dfs.namenode.http-address => 0.0.0.0:50070
I could not find any reason why localhost is used instead of hdfshost (and there is no override in /etc/hosts, neither on the local machine nor on the cluster). Any help would be much appreciated.
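For reference, one way to see which datanode address the namenode hands back is to do the first step of the WebHDFS CREATE call by hand: the namenode answers the initial PUT with a 307 redirect whose Location header names the datanode. A rough sketch (the path /tmp/test.txt and user.name=hdfs below are just placeholders for my setup):

import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsRedirectCheck {
    public static void main(String[] args) throws Exception {
        // First step of a WebHDFS CREATE: the namenode should answer with a
        // 307 redirect pointing at the datanode that will receive the data.
        URL url = new URL("http://hdfshost:50070/webhdfs/v1/tmp/test.txt"
                + "?op=CREATE&overwrite=true&user.name=hdfs");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("PUT");
        conn.setInstanceFollowRedirects(false); // keep the 307 instead of following it
        System.out.println("Status:   " + conn.getResponseCode());
        System.out.println("Location: " + conn.getHeaderField("Location"));
        conn.disconnect();
    }
}

In my case the Location header is where localhost:50075 shows up instead of hdfshost:50075.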
You need to change the http-address configuration to your machine's actual IP address instead of 0.0.0.0. The wildcard 0.0.0.0 gets resolved to localhost, and that is what the client then ends up using because of
dfs.client.use.datanode.hostname => true
whereas your machine's real IP address resolves to its DNS name, so the hostname lookup works as intended. Since this fixed it I am posting it as an answer, though I don't know whether my reasoning for the solution is correct. If anybody knows the exact reason, please add it as a comment or edit this answer.
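For example, pointing the datanode HTTP address at the machine's real address instead of the wildcard (a sketch only: 192.168.1.10 stands in for the datanode host's actual address, and on HDP the change would normally be made through Ambari and followed by an HDFS restart):
- dfs.datanode.http.address => 192.168.1.10:50075
dfs.namenode.http-address can be changed the same way if you see the same problem there.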