Query Hadoop from Python

I'm hoping this can be solved. At the moment, the following works:

import pyodbc, sys, os
import pandas as pd

def get_data(SQL_statement):
    # Pass the HQL statement as a string, e.g. '''<QUERY>'''.
    # Connection settings: the DSN can be replaced with STG or DEV,
    # depending on where you want to connect. autocommit is set on the
    # connection itself.
    conn = pyodbc.connect("DSN=HDP_PROD", autocommit=True)
    cursor = conn.cursor()
    # V1.1: config settings that limit the Tez container size to prevent
    # out-of-memory errors; the query takes slightly longer to run.
    cursor.execute("set hive.tez.container.size=8192")
    cursor.execute("set hive.auto.convert.join.noconditionaltask.size=6553")
    #cursor.execute("set hive.auto.convert.join=false")
    # Create a DataFrame from the SQL/HQL statement and return it.
    df = pd.read_sql(SQL_statement, conn)
    return df

HIVE = get_data('''SELECT *
                FROM sp_commercial.INTERACTIONS_LAST6M''')

If I add a WHERE condition to the SELECT statement above, the function errors out.

So I am wondering how I can query Hue/Hadoop from Python with a WHERE condition.
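
For reference, this is roughly the kind of thing I am trying to get working: a minimal sketch that binds the WHERE value through a `?` placeholder instead of pasting it into the query string. The helper name `run_query` is hypothetical, and the sketch assumes the Hive ODBC driver behind the DSN accepts `?` parameters, which I have not confirmed. It only relies on the generic DB-API cursor interface.

```python
def run_query(conn, sql, params=()):
    """Run a query with '?' placeholders on a DB-API connection.

    Hypothetical helper: returns (column_names, rows) so the result can be
    turned into a DataFrame afterwards if needed.
    """
    cursor = conn.cursor()
    try:
        if params:
            # The driver binds the values; they are not spliced into the SQL text.
            cursor.execute(sql, tuple(params))
        else:
            cursor.execute(sql)
        # description holds one entry per result column; item 0 is the name.
        columns = [col[0] for col in cursor.description]
        rows = [tuple(row) for row in cursor.fetchall()]
    finally:
        cursor.close()
    return columns, rows


# Intended usage against the DSN above (the column name `channel` is made up):
# conn = pyodbc.connect("DSN=HDP_PROD", autocommit=True)
# columns, rows = run_query(
#     conn,
#     "SELECT * FROM sp_commercial.INTERACTIONS_LAST6M WHERE channel = ?",
#     ["web"],
# )
# df = pd.DataFrame.from_records(rows, columns=columns)
```

The point of the placeholder is that quoting of the filter value is left to the driver, so quotes inside the Python string cannot clash with quotes in the HQL.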

0 Answers