I'm currently working with Apache Spark and the Hive Metastore. I created a table in the Hive Metastore as user A. However, when I run a DROP query from Spark, the session appears to be running as user B, and surprisingly I'm able to drop the table even though I haven't granted any privileges to user B. This looks like a security concern. How can I configure the Hive Metastore or Spark so that the owner privileges recorded in the Hive Metastore are respected for access control?
The relevant rows from the Hive Metastore backing database are shown below.

Table metadata (TBLS):
TBL_ID|CREATE_TIME|DB_ID|LAST_ACCESS_TIME|OWNER|OWNER_TYPE|RETENTION|SD_ID|TBL_NAME|TBL_TYPE |VIEW_EXPANDED_TEXT|VIEW_ORIGINAL_TEXT|IS_REWRITE_ENABLED|
------+-----------+-----+----------------+-----+----------+---------+-----+--------+-------------+------------------+------------------+------------------+
2| 1701160550| 1| 0|A |USER | 0| 2|orders |MANAGED_TABLE| | |false |
Table privileges (TBL_PRIVS):
TBL_GRANT_ID|CREATE_TIME|GRANT_OPTION|GRANTOR|GRANTOR_TYPE|PRINCIPAL_NAME|PRINCIPAL_TYPE|TBL_PRIV|TBL_ID|
------------+-----------+------------+-------+------------+--------------+--------------+--------+------+
5| 1701160550| 1|A |USER |A |USER |select | 2|
6| 1701160550| 1|A |USER |A |USER |insert | 2|
7| 1701160550| 1|A |USER |A |USER |update | 2|
8| 1701160550| 1|A |USER |A |USER |delete | 2|
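These rows show that the orders table is owned by user A and that only A holds any table-level privileges. For what it's worth, the ownership can also be confirmed from a Spark session, roughly like this (a minimal sketch; the exact output layout of DESCRIBE EXTENDED varies by Spark version):

from pyspark.sql import SparkSession

# Minimal session pointed at the same metastore as the job below (sketch only)
spark = (
    SparkSession.builder
    .appName("check_table_owner")
    .config("hive.metastore.uris", "thrift://localhost:9083")
    .enableHiveSupport()
    .getOrCreate()
)

# DESCRIBE EXTENDED prints the detailed table information, including an Owner row,
# which should list user A for default.orders
spark.sql("DESCRIBE EXTENDED default.orders").show(truncate=False)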
Spark code:
from pyspark.sql import SparkSession
from pyspark.conf import SparkConf
# Create a SparkSession
spark_conf = SparkConf().setAppName("spark_hms")
spark_conf.set("hive.metastore.uris", "thrift://localhost:9083")
spark_conf.set("spark.hadoop.fs.s3a.path.style.access", "true")
spark_conf.set("spark.sql.catalogImplementation","hive")
spark_conf.set("spark.hadoop.fs.s3a.access.key", ""),
spark_conf.set("spark.hadoop.fs.s3a.secret.key", ""),
spark_conf.set("spark.hadoop.fs.s3a.endpoint", ""),
spark_conf.set("spark.hadoop.fs.s3a.region", "us-east-1"),
spark_conf.set("hive.security.authenticator.manager", "org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator")
spark_conf.set("hive.security.authorization.enabled", "true")
spark_conf.set("hive.security.authorization.manager", "org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory")
spark_conf.set("spark.jars", ""),
spark = SparkSession.builder.config(conf=spark_conf).enableHiveSupport().getOrCreate()
print(spark.sparkContext.getConf().getAll())
spark.sql("DROP table default.orders_trino_R")
hive-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hive.metastore.server.min-threads</name>
    <value>10</value>
  </property>
  <property>
    <name>hive.metastore.server.max-threads</name>
    <value>100</value>
  </property>
  <property>
    <name>hive.metastore.client.socket.timeout</name>
    <value>1800</value>
  </property>
  <property>
    <name>hive.metastore.pool.maximumPoolSize</name>
    <value>50</value>
  </property>
  <property>
    <name>metastore.thrift.uris</name>
    <value>thrift://localhost:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>
  <property>
    <name>metastore.task.threads.always</name>
    <value>org.apache.hadoop.hive.metastore.events.EventCleanerTask,org.apache.hadoop.hive.metastore.MaterializationsCacheCleanerTask</value>
  </property>
  <property>
    <name>metastore.expression.proxy</name>
    <value>org.apache.hadoop.hive.metastore.DefaultPartitionExpressionProxy</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.postgresql.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:postgresql://db:5432/mydatabase</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>admin</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>admin</value>
  </property>
  <property>
    <name>hive.vectorized.execution.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>s3a://datalake//</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>s3a://datalake/</value>
  </property>
  <property>
    <name>fs.s3a.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>pQ339uhR6wrublDY1Dqn</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>HrqHchDVytpcVdxNMS7KnQ5haJF4X0NsJIBk138o</value>
  </property>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>min:443</value>
  </property>
  <property>
    <name>fs.s3a.path.style.access</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.support.concurrency</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.txn.timeout</name>
    <value>300</value>
  </property>
  <property>
    <name>iceberg.experimental.extended-statistics.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.security.authorization.enabled</name>
    <value>true</value>
  </property>
</configuration>
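From my reading of the Hive docs, privilege checks that are independent of the client (Spark in this case) would have to run inside the metastore itself, via a pre-event listener. I assume that would be configured roughly like the following in the metastore's hive-site.xml; this is only a sketch of the direction I mean, not something that is part of my current setup, and I haven't verified that it makes the metastore reject the DROP from user B:

<!-- Sketch only: metastore-side authorization hooks (class names taken from the Hive docs) -->
<property>
  <name>hive.metastore.pre.event.listeners</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
</property>
<property>
  <name>hive.security.metastore.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
</property>
<property>
  <!-- Grant-based legacy provider; whether this is the right manager here is part of my question -->
  <name>hive.security.metastore.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.DefaultHiveMetastoreAuthorizationProvider</value>
</property>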
I'm interested in understanding how to configure Hive Metastore or Spark to respect owner ACL privileges in Spark SQL.