I am unable to execute the command below from PySpark on Windows:
schemaPeople = spark.createDataFrame(people)
I have set HADOOP_HOME to the winutils directory, and I have granted 777 permissions on C:/tmp/hive.
Still, I am getting the error below:
Py4JJavaError: An error occurred while calling o23.applySchemaToPythonRDD.
: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:189)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263)
at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39)
at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38)
at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46)
I went through a lot of similar questions before posting this. I would appreciate any help here.
I got this error a lot when trying to set up Spark on Windows using the winutils file. I had to set up Spark differently to get around it.
I ended up downloading the Hadoop binary for my version of Spark and going from there. I documented the whole process in a walkthrough, if you are interested: Spark on Windows
The gist is that the official Hadoop release from Apache does not include Windows binaries, and compiling from source can be tedious, so some really helpful people have made compiled distributions available. If you want to use Spark 2.0.2, download the binaries from Steve Loughran's GitHub; for 2.1.0, you can download them from here. From there you should be able to set it up as expected.
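Once the binaries are unpacked, a quick sanity check before starting Spark can save some head-scratching. The following is only a rough sketch, not part of the walkthrough: check_hadoop_home and configure_hadoop_home are hypothetical helper names, and it assumes the download unpacks to a folder with winutils.exe under bin.

```python
import os
from pathlib import Path


def check_hadoop_home(hadoop_home):
    """Return a list of problems with the Hadoop folder (empty if none found)."""
    problems = []
    root = Path(hadoop_home)
    if not root.is_dir():
        problems.append(f"{root} is not a directory")
        return problems
    # Assumption: the compiled distribution keeps winutils.exe under bin.
    winutils = root / "bin" / "winutils.exe"
    if not winutils.is_file():
        problems.append(f"{winutils} is missing")
    return problems


def configure_hadoop_home(hadoop_home, env=None):
    """Set HADOOP_HOME and put its bin folder first on PATH in the given environment."""
    env = os.environ if env is None else env
    problems = check_hadoop_home(hadoop_home)
    if problems:
        raise RuntimeError("; ".join(problems))
    bin_dir = str(Path(hadoop_home) / "bin")
    env["HADOOP_HOME"] = str(hadoop_home)
    entries = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if bin_dir not in entries:
        env["PATH"] = os.pathsep.join([bin_dir] + entries)
    return env
```

You would call something like configure_hadoop_home(r"C:\hadoop") (the path is just an example) before creating the SparkSession, so that the JVM started by PySpark sees the variables. This only checks the folder layout; it does not fix the permissions on C:/tmp/hive.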