Unable to call sparkRSQL.init function


I am new to Spark and was trying to run the example mentioned on the SparkR page. With some effort, I was able to install SparkR on my machine and run the basic word-count example. However, when I try to run:

library(SparkR)                    # works fine - loads the package
sc <- sparkR.init()                # works fine
sqlContext <- sparkRSQL.init(sc)   # fails

it says there is no package called ‘sparkRSQL’. As per the documentation, sparkRSQL.init is a function in the SparkR package. Please let me know if I am missing anything here.

Thanks in advance.

1 Answer

Answered by Radhwane Chebaane

I faced this same problem when trying to test SparkR; there is a lack of documentation on this part. The problem is that sparkRSQL and sparkRHive are not included in the master branch, so you have to install the SparkR package from the sparkr-sql branch using this command:

library(devtools)
install_github("amplab-extras/SparkR-pkg", ref="sparkr-sql", subdir="pkg")

There is a hint on the AMPLab website:

DataFrame was introduced in Spark 1.3; the 1.3-compatible SparkR version can be found in the sparkr-sql branch of the GitHub repo, which includes a preliminary R API to work with DataFrames. To link SparkR against older versions of Spark, use the archives on this page or the master branch.