I am using Pyspark to connect to HIVE and fetch some data. The issue is that it returns all rows with the values that are column names. It is returning correct column names. Only the Row values are incorrect.
Here is my Code
hive_jar_path="C:Users/shakir/Downloads/ClouderaHiveJDBC-2.6.11.1014/ClouderaHiveJDBC-2.6.11.1014/ClouderaHiveJDBC42-2.6.11.1014/HiveJDBC42.jar"
print(hive_jar_path)
print("")
import os
os.environ["HADOOP_HOME"]="c:/users/shakir/downloads/spark/spark/spark"
import os
os.environ["SPARK_HOME"]="c:/users/shakir/downloads/spark/spark/spark"
import findspark
findspark.init()
from pyspark import SparkContext, SparkConf, SQLContext
from pyspark.sql import SparkSession
import uuid
spark = SparkSession \
.builder \
.appName("Python Spark SQL Hive integration example") \
.config("spark.sql.warehouse.dir", "hdfs://...../user/hive/warehouse/..../....")
spark.config("spark.driver.extraClassPath", hive_jar_path)
spark.config("spark.sql.hive.llap", "true")
spark.config("spark.sql.warehouse.dir", "hdfs://...../user/hive/warehouse/..../....")
spark=spark.enableHiveSupport().getOrCreate()
import databricks.koalas as ks
print("Reading Data from Hive . . .")
options={
"fetchsize":1000,
"inferSchema": True,
"fileFormat":"orc",
"inputFormat":"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat",
"outputFormat":"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat",
"driver":"org.apache.hive.jdbc.HiveDriver",
}
df = ks.read_sql("SELECT * FROM PERSONS LIMIT 3", connection_string,options=options)
print("Done")
print(df)
Output of the code:
+------+-----+---------+
| Name | Age | Address |
+------+-----+---------+
| Name | Age | Address |
+------+-----+---------+
| Name | Age | Address |
+------+-----+---------+
| Name | Age | Address |
+------+-----+---------+
my situation was not the same as yours, as I was connecting to
Spark
thrujdbc
however I was getting the same results as you did: column names returned as literals.I fixed it by overriding the
JdbcDialect
. This answer has a nice explanation of why and some useful links as well 1, 2.Just pay attention to the quoting characters, medium has this weird typography for quoting characters.
Hope it helps after this long. Leaving it here for when I forget about it and google it again :p