In Databricks, I need to use a company package to interact with data. Internally, the package loads data like this:
self.dataframe = (
    spark.read.format(self.format)
    .options(**self.options.to_dict())
    .load(self.data_path)
)
Therefore, I am looking for a way to read a Hive table using this format(...).load(...) syntax.
I tried the following calls:
spark.read.format("hive").load("hive_metastore.default.my_table")
spark.read.format("hive").load("default.my_table")
spark.read.format("hive").load("my_table")
spark.read.format("hive").load("/user/hive/warehouse/my_table")
but every attempt returns: AnalysisException: Hive data source can only be used with tables, you can not read files of Hive data source directly.
So format("hive") seems to be accepted (although I cannot find any reference to it online), but the right argument to pass to load() remains a mystery to me.
(Note that I am aware of the spark.read.table call, which works fine, by the way.)
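
For comparison, here is a minimal sketch of the call that does work for me. The helper name read_hive_table is only for illustration and is not part of the company package; spark is assumed to be the SparkSession that the Databricks notebook provides, and my_table is the same placeholder table name as above.

```python
def read_hive_table(spark, table_name):
    """Read a metastore table by name instead of by path.

    `spark` is assumed to be an existing SparkSession (Databricks
    notebooks provide one). `table_name` may be qualified, for
    example "default.my_table".
    """
    # DataFrameReader.table resolves the name through the metastore,
    # which is what format("hive").load(...) refuses to do.
    return spark.read.table(table_name)
```

Calling read_hive_table(spark, "default.my_table") returns a DataFrame, but it bypasses the format(...).load(...) path that the package uses, which is exactly what I cannot change.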