I am currently using RStudio Server hosted outside a Databricks cluster and have followed the steps to configure Databricks Connect. The connection test was successful, but when I initialize a Spark session using the code below, it throws an error.

library(SparkR)
sparkR.session()

I have also tried the fully qualified call below, but it throws the same error:

SparkR::sparkR.session()

WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Error in if (len > 0) { : argument is of length zero

I want to access SQL tables on Databricks, or run a SQL query from R like the one below:

diamonds <- sql("select * from default.diamonds")

for which an initialized Spark session is required. Let me know if there is any other alternative that can be applied.
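For reference, the full flow I am trying to get working is sketched below (assuming Databricks Connect is configured on the machine and that the table `default.diamonds` exists in the workspace):

```r
# Intended flow (sketch): load SparkR and start a session via
# Databricks Connect, then query a table in the default database.
library(SparkR)

# This is the call that currently fails with
# "Error in if (len > 0) { : argument is of length zero"
sparkR.session()

# Once a session exists, sql() should return a SparkDataFrame
diamonds <- sql("select * from default.diamonds")

# Inspect the first rows of the result
head(diamonds)
```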

There is 1 answer.

raizsh:

I had a similar problem and made the following changes to my code.

# Load SparkR from the local Spark installation
library(SparkR, lib.loc = "/usr/local/spark/R/lib")

# Executor configuration passed to the cluster
sparkEnvir <- list(spark.num.executors = '5', spark.executor.cores = '5')

# Initialize the Spark context
# (note: sparkR.init() is deprecated since Spark 2.0 in favour of sparkR.session())
sc <- sparkR.init(sparkHome = "/usr/local/spark",
                  sparkEnvir = sparkEnvir)

# Initialize the SQL context on top of the Spark context
sqlContext <- sparkRSQL.init(sc)
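With the contexts above initialized, the query from the question can then be run along these lines (a sketch, assuming `default.diamonds` exists; with the older `sparkRSQL.init()` API, `sql()` takes the SQL context as its first argument):

```r
# Run the question's query against the SQL context (sketch;
# assumes the table default.diamonds exists in the metastore)
diamonds <- sql(sqlContext, "select * from default.diamonds")

# Inspect the result as a SparkDataFrame
head(diamonds)
printSchema(diamonds)
```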