Connection Issue While reading Hive table In HDInsight using Python

371 views Asked by At

All. I wanted to connect to Hive Database in HDInsight using Python I followed multiple blog and few Stackoverflow blogs also .but No luck . Below are my tries using pyhive and JayDeBeApi library .

Using JayDeBeApi

I have added hive-jdbc-1.2.1, httpclient-4.4, and httpcore-4.4.4 jars to current working directory and already installed thrift using pip install thrift . And code snipet is

import jaydebeapi

conn = jaydebeapi.connect("org.apache.hive.jdbc.HiveDriver",
       "jdbc:hive2://shaktiman.database.windows.net:443/;ssl=true;transportMode=http;httpPath=/hive2",
       ['admin', 'Abcdeertyoiu@1234'],
       "hive-jdbc-1.2.1.jar")

cursor = conn.cursor()
cursor.execute("select * from default.hivesampletable limit 50")
print(cursor.description)  # prints the result set's schema
results = cursor.fetchall()

But i am getting below error :

Traceback (most recent call last):
  File "ClassLoader.java", line 357, in java.lang.ClassLoader.loadClass
  File "Launcher.java", line 349, in sun.misc.Launcher$AppClassLoader.loadClass
  File "ClassLoader.java", line 424, in java.lang.ClassLoader.loadClass
  File "URLClassLoader.java", line 382, in java.net.URLClassLoader.findClass
java.lang.ClassNotFoundException: java.lang.ClassNotFoundException: org.apache.hive.service.cli.thrift.TCLIService$Iface

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "org.jpype.JPypeContext.java", line 330, in org.jpype.JPypeContext.callMethod
  File "Method.java", line 498, in java.lang.reflect.Method.invoke
  File "DelegatingMethodAccessorImpl.java", line 43, in sun.reflect.DelegatingMethodAccessorImpl.invoke
  File "NativeMethodAccessorImpl.java", line 62, in sun.reflect.NativeMethodAccessorImpl.invoke
  File "NativeMethodAccessorImpl.java", line -2, in sun.reflect.NativeMethodAccessorImpl.invoke0
  File "DriverManager.java", line 247, in java.sql.DriverManager.getConnection
  File "DriverManager.java", line 664, in java.sql.DriverManager.getConnection
  File "HiveDriver.java", line 105, in org.apache.hive.jdbc.HiveDriver.connect
Exception: Java Exception

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:/Learning Dir/PycharmProjects/Python/HdInsight-Hive/test.py", line 39, in <module>
    "hive-jdbc-1.2.1.jar")
  File "D:\Learning Dir\PycharmProjects\Python\venv\lib\site-packages\jaydebeapi\__init__.py", line 412, in connect
    jconn = _jdbc_connect(jclassname, url, driver_args, jars, libs)
  File "D:\Learning Dir\PycharmProjects\Python\venv\lib\site-packages\jaydebeapi\__init__.py", line 230, in _jdbc_connect_jpype
    return jpype.java.sql.DriverManager.getConnection(url, *dargs)
java.lang.NoClassDefFoundError: java.lang.NoClassDefFoundError: org/apache/hive/service/cli/thrift/TCLIService$Iface

Not sure ,what is the issue .

Using PyHive also i have tried, as below

from pyhive import hive
conn = hive.connect('hn0-shaktiman-po.ttl4q3khoz5uvb1d4jopix3kbg.cx.internal.cloudapp.net', port=10000,auth='NOSASL')
cursor = conn.cursor()
cursor.execute('SHOW DATABASES')
cursor.fetchall()

but still i am getting isuue :

"D:\Learning Dir\PycharmProjects\Python\venv\Scripts\python.exe" "D:/Learning Dir/PycharmProjects/Python/HdInsight-Hive/test2.py"
failed to resolve sockaddr for hn0-shaktiman-po.ttl4q3khoz5uvb1d4jopix3kbg.cx.internal.cloudapp.net:10000
Traceback (most recent call last):
  File "D:\Learning Dir\PycharmProjects\Python\venv\lib\site-packages\thrift\transport\TSocket.py", line 99, in open
    addrs = self._resolveAddr()
  File "D:\Learning Dir\PycharmProjects\Python\venv\lib\site-packages\thrift\transport\TSocket.py", line 42, in _resolveAddr
    socket.AI_PASSIVE | socket.AI_ADDRCONFIG)
  File "D:\Installation\Python\Python38-32\lib\socket.py", line 752, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11001] getaddrinfo failed
Traceback (most recent call last):
  File "D:\Learning Dir\PycharmProjects\Python\venv\lib\site-packages\thrift\transport\TSocket.py", line 99, in open
    addrs = self._resolveAddr()
  File "D:\Learning Dir\PycharmProjects\Python\venv\lib\site-packages\thrift\transport\TSocket.py", line 42, in _resolveAddr
    socket.AI_PASSIVE | socket.AI_ADDRCONFIG)
  File "D:\Installation\Python\Python38-32\lib\socket.py", line 752, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:/Learning Dir/PycharmProjects/Python/HdInsight-Hive/test2.py", line 2, in <module>
    conn = hive.connect('hn0-shaktiman-po.ttl4q3khoz5uvb1d4jopix3kbg.cx.internal.cloudapp.net', port=10000,auth='NOSASL')
  File "D:\Learning Dir\PycharmProjects\Python\venv\lib\site-packages\pyhive\hive.py", line 94, in connect
    return Connection(*args, **kwargs)
  File "D:\Learning Dir\PycharmProjects\Python\venv\lib\site-packages\pyhive\hive.py", line 192, in __init__
    self._transport.open()
  File "D:\Learning Dir\PycharmProjects\Python\venv\lib\site-packages\thrift\transport\TTransport.py", line 155, in open
    return self.__trans.open()
  File "D:\Learning Dir\PycharmProjects\Python\venv\lib\site-packages\thrift\transport\TSocket.py", line 103, in open
    raise TTransportException(type=TTransportException.NOT_OPEN, message=msg, inner=gai)
thrift.transport.TTransport.TTransportException: failed to resolve sockaddr for hn0-shaktiman-po.ttl4q3khoz5uvb1d4jopix3kbg.cx.internal.cloudapp.net:10000

Also, few blogs are recomending changing hiveserver2 transport mode to "binary" from "http" . Tried it. But this is also not helpful for me ...

I would really appreciate if anyone could suggest some workable code or solution . Thanks In advance.

1

There are 1 answers

0
Subash On

Seems to me config/network issue.

  1. You can validate the connection from the host(from where the application is being submitted) to the HDI cluster (you can ignore if it's being submitted from the head node in HDI).Try using ip address here-hn0-shaktiman-po.ttl4q3khoz5uvb1d4jopix3kbg.cx.internal.cloudapp.net. You can get the ip address by running curl ifconfig.me inside the HDI cluster.
  2. Also try checking the port is not being used anywhere using telnet. Trying using 10001
  3. Try changing the value hive.server2.transport.mode from http to binary in Ambari