Hi, I am using the pywebhdfs Python library. I am connecting to EMR and trying to create a file on HDFS. I am getting the exception below, which seems irrelevant to what I am doing, since I am not hitting any connection limit here. Is it due to how WebHDFS works?
from pywebhdfs.webhdfs import PyWebHdfsClient
hdfs = PyWebHdfsClient(host='myhost',port='50070', user_name='hadoop')
my_data = '01010101010101010101010101010101'
my_file = 'user/hadoop/data/myfile.txt'
hdfs.create_file(my_file, my_data)
This throws:
requests.exceptions.ConnectionError: HTTPConnectionPool(host='masterDNS', port=50070): Max retries exceeded with url: /webhdfs/v1/user/hadoop/data/myfile.txt?op=CREATE&user.name=hadoop (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 115] Operation now in progress',))
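As a sanity check before blaming the library, you can confirm the NameNode's WebHDFS port is reachable at all. This is a minimal sketch (the `port_open` helper is illustrative, not part of pywebhdfs), assuming the same host and port as in the snippet above:

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a plain TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. port_open('myhost', 50070) should be True if WebHDFS is reachable
```

If this returns False, the problem is network-level (security groups, wrong port, NameNode down) rather than anything in pywebhdfs.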
I had this issue as well. I found that, for some reason, the call to requests'
send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None)
was being passed a timeout of 0, which causes send to raise a
MaxRetryError
Bottom line: I found that if you just set timeout=1, it works fine:
hdfs = PyWebHdfsClient(host='yourhost', port='50070', user_name='hdfs', timeout=1)
Hope this works for you as well.
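For context, the URL in the traceback is exactly the WebHDFS REST endpoint the client builds for a CREATE request (WebHDFS then redirects this to a DataNode for the actual write, but the failure here happens on the initial NameNode request). A minimal sketch of that URL construction (the `webhdfs_url` function name is illustrative, not part of pywebhdfs):

```python
def webhdfs_url(host, port, path, op, user):
    """Build a WebHDFS v1 REST URL.

    The path is given without a leading slash, matching the
    pywebhdfs convention used in the question above.
    """
    return f"http://{host}:{port}/webhdfs/v1/{path}?op={op}&user.name={user}"

# Reproduces the URL from the traceback:
url = webhdfs_url('masterDNS', 50070, 'user/hadoop/data/myfile.txt',
                  'CREATE', 'hadoop')
```

Seeing that the URL itself is well-formed is another hint that the failure is in establishing the HTTP connection (the timeout handling above), not in how the request is built.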