PyHive unable to fetch logs from HiveServer2 when running in async mode


I am running into a strange issue with PyHive when running a Hive query in async mode. Internally, PyHive uses a Thrift client to execute the query and to fetch logs (along with the execution status). I am unable to fetch the logs of the Hive query (map/reduce tasks, etc.): cursor.fetch_logs() returns an empty data structure.

Here is the code snippet:

from pyhive import hive  # or import hive or import trino
from TCLIService.ttypes import TOperationState

def run():
    cursor = hive.connect(host="10.x.y.z", port='10003', username='xyz', password='xyz', auth='LDAP').cursor()
    cursor.execute("select count(*) from schema1.table1 where date = '2021-03-13' ", async_=True)
    status = cursor.poll(True).operationState
    print(status)
    while status in (TOperationState.INITIALIZED_STATE, TOperationState.RUNNING_STATE):
        logs = cursor.fetch_logs()
        for message in logs:
            print("running ")
            print(message)

        # If needed, an asynchronous query can be cancelled at any time with:
        # cursor.cancel()
        print("running ")
        status = cursor.poll().operationState

    # Print the result rows once the query has finished
    print(cursor.fetchall())

The cursor is able to get operationState correctly, but it is unable to fetch the logs. Is there anything on the HiveServer2 side that needs to be configured?

Thanks in advance

1 answer

satish (best answer)

Closing the loop here in case someone else has the same or a similar issue with Hive.

In my case the problem was the HiveServer2 configuration. HiveServer2 won't stream the logs if operation logging is not enabled. These are the properties I configured:

hive.server2.logging.operation.enabled - true

hive.server2.logging.operation.level - EXECUTION (basic logging; other values increase the logging level)

hive.async.log.enabled - false

hive.server2.logging.operation.log.location
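
For anyone who wants to put these settings into hive-site.xml, here is a minimal sketch that renders the properties listed above as <property> blocks. The helper name render_hive_site_properties and the OPERATION_LOG_SETTINGS dict are hypothetical, made up for illustration. The log location is left out on purpose, since no value is given for it above; set it to a directory that suits your cluster. HiveServer2 will most likely need a restart before the change takes effect.

```python
from xml.sax.saxutils import escape

# Property values taken from the list above. The log location is
# deliberately omitted because no value is given for it there.
OPERATION_LOG_SETTINGS = {
    "hive.server2.logging.operation.enabled": "true",
    "hive.server2.logging.operation.level": "EXECUTION",
    "hive.async.log.enabled": "false",
}


def render_hive_site_properties(settings):
    """Hypothetical helper: render settings as hive-site.xml <property> blocks."""
    blocks = []
    for name, value in settings.items():
        blocks.append(
            "<property>\n"
            f"  <name>{escape(name)}</name>\n"
            f"  <value>{escape(value)}</value>\n"
            "</property>"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # Paste the printed blocks inside the <configuration> element of hive-site.xml.
    print(render_hive_site_properties(OPERATION_LOG_SETTINGS))
```

Once the server-side settings are in place, the polling loop from the question should start receiving messages from cursor.fetch_logs() without any client-side changes.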