I have a very interesting, sticky problem with user accounts between Linux, Hive, and Spark...
We have a Spark application at work that must be runnable by multiple (Linux) user accounts. However, we need a shared Hive user that "owns" all of the tables; otherwise one user could create a table that no other user can overwrite, which breaks our code for everyone except the user who first ran the drop/create.
Now, for modifying things by hand, I can use command line parameters in Hive Beeline to set my "Hive User" to something other than my Linux user:
/usr/lib/hive/bin/beeline -u jdbc:hive2://<our hive server>:10000 -n <hiveuserid> -d org.apache.hive.jdbc.HiveDriver --hiveconf mapreduce.job.queuename=<queuename>
However, I know of no such command line parameter to set the Hive ID for a Spark job:
$SPARK_HOME/bin/spark-submit -? <hiveuserid>
Using sudo here isn't an option, because for security reasons our company gave us a Hive user that has no corresponding Linux user, so we really need a HIVE user parameter passed to our application.
It appears there should be something either in the spark-submit command (see https://spark.apache.org/docs/latest/configuration.html for the command-line arguments and parameters for spark-submit), or something from WITHIN my Spark Scala code, such as
import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.spark.sql._
import org.apache.spark.sql.hive.HiveContext
val sc = new SparkContext(. . .)
val hc = new HiveContext(sc)
hc.sql("set user as <hiveuserid>")
or maybe the Hive Context itself has some function to set the user?
hc.SetUser("<hiveuserid>")
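One other thing I noticed in the spark-submit help output is a --proxy-user flag. I have no idea whether our cluster has the Hadoop impersonation settings that it presumably needs, but here is a rough sketch of what I was imagining (the class name and jar are just placeholders for our application):
$SPARK_HOME/bin/spark-submit \
  --master yarn \
  --proxy-user <hiveuserid> \
  --conf spark.yarn.queue=<queuename> \
  --class com.example.OurSparkJob \
  our-spark-job.jar
Would that make the job's Hive/warehouse operations run as <hiveuserid>, or does it only affect the YARN side of things?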
Any ideas? We are unable to run this job as different Linux users until we can all use the same Hive user.
(P.S. Again, creating a new shared Linux user that matches a shared Hive user is not an option for us; it is against company security policy to have multiple people sharing a Linux userid, and we aren't allowed to share a password, so our Linux sudoer account is different from our shared Hive user account -- don't ask me why, it's an IT thing :-)
Have you considered setting up group permissions for the Hive data? For example, your warehouse directory could have the following permissions:
drwxrwxr-x - hive hadoop 0 2014-10-14 04:28 /user/hive/warehouse/test
Any user that is part of the hadoop group will have full read/write/execute permissions to that table.
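If that route works for your cluster, something along these lines would set it up (the warehouse path and the hadoop group are just the ones from the listing above, and this assumes you have an account that is allowed to change ownership under /user/hive/warehouse):
# give the hadoop group ownership of the table directory and rwx on it
hdfs dfs -chgrp -R hadoop /user/hive/warehouse/test
hdfs dfs -chmod -R 775 /user/hive/warehouse/test
Then each Linux account that needs to run the job only has to be added to the hadoop group; no shared Linux user or shared password is required.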