I'm hoping someone can help with this. At the moment, the following works:
import pyodbc
import pandas as pd

def get_data(SQL_statement):  # pass the HQL statement as a '''<QUERY>''' string
    # Connection settings: the DSN can be replaced with STG or DEV, depending on where you want to connect.
    conn = pyodbc.connect("DSN=HDP_PROD", autocommit=True)
    cursor = conn.cursor()
    # V1.1: config settings to limit the Tez container size and prevent out-of-memory errors; the query takes slightly longer to run.
    cursor.execute("set hive.tez.container.size=8192")
    cursor.execute("set hive.auto.convert.join.noconditionaltask.size=6553")
    #cursor.execute("set hive.auto.convert.join=false")
    # Creates the df from the SQL/HQL statement (read_sql executes the query itself)
    df = pd.read_sql(SQL_statement, conn)
    # Returns the df to the caller
    return df

HIVE = get_data('''SELECT *
FROM sp_commercial.INTERACTIONS_LAST6M''')
If I add a WHERE condition to the SELECT statement above, the function errors out.
How can I query Hue/Hadoop (Hive) from Python with a WHERE condition?
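
For reference, this is roughly the shape of filtered query I am trying to get working. It is only a minimal sketch, not a confirmed fix. The helper names `build_filtered_query` and `get_data_where`, the column `channel`, and the value `EMAIL` are all hypothetical. It also assumes the Hive ODBC driver behind the DSN accepts `?` parameter markers (pyodbc's parameter style); if it does not, the value would have to be written into the SQL string as a single-quoted literal instead.

```python
import re

# Hypothetical helpers for illustration; not part of pyodbc or pandas.
# Allows "name" or "schema.name" made of letters, digits and underscores.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def build_filtered_query(table, column, value):
    """Return (sql, params) for a single-equality WHERE filter.

    Table and column names cannot be bound as parameters, so they are
    checked against a strict pattern. The value is kept out of the SQL
    text and returned separately for a "?" placeholder.
    """
    for name in (table, column):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    sql = f"SELECT * FROM {table} WHERE {column} = ?"
    return sql, [value]


def get_data_where(conn, table, column, value):
    """Run the filtered query on an open connection and return a DataFrame."""
    import pandas as pd  # imported here so the query builder works without pandas

    sql, params = build_filtered_query(table, column, value)
    return pd.read_sql(sql, conn, params=params)


# Intended usage (assumes the same DSN as above; column and value are made up):
# conn = pyodbc.connect("DSN=HDP_PROD", autocommit=True)
# df = get_data_where(conn, "sp_commercial.INTERACTIONS_LAST6M", "channel", "EMAIL")
```

Keeping the value out of the SQL text avoids quoting problems inside the triple-quoted string, which is one possible (unconfirmed) source of the error.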