Apache Hive Metastore - information_schema

479 views Asked by At

I'm pretty new to Hive. I'm wondering whether it is possible to have an information_schema database with the Hive metastore data? For instance, databases that I create the Hive metastore and are returned when executing "show database", can these also be added to the information_schema.schemata table? I've found that the Hive metastore information is kept inside the Hive database, can I change this to be in information_schema?

Some posts, like the following, allude that an information_schema is possible, however, I cannot find any good resources on how to achieve this.

https://issues.apache.org/jira/browse/HIVE-16941

*** Update *** We still have not yet decided on which version or vendor to go with. We are thinking of using the open source version, however, if a vendor offers this (and open source not), we will consider a vendor as this is an important issue for us.

1

There are 1 answers

0
Ayush Srivastava On

I've developed this information_schema script for hive_metastore. Feel free to use-

cls = []
spark.sql("Drop view if exists allColumns")

for db in spark.sql("show databases").collect():
    for table in spark.catalog.listTables(f"{db.databaseName}"):
        try:
            for column in spark.catalog.listColumns(table.name, table.database):
                cls.append([table.database,table.name, column.name, column.dataType])
                print(table," table loaded successfully")
        except:
            print("Error Loading The Table ",table," in Information_Schema")
            
spark.createDataFrame(cls, schema = ['databaseName','tableName','columnName', 
'columnDataType']).createOrReplaceTempView("allColumns")

spark.sql("""create or replace table default.information_schema as select * from allColumns""")