SparkR 1.4.0: how to include jars


I'm trying to hook SparkR 1.4.0 up to Elasticsearch using the elasticsearch-hadoop-2.1.0.rc1.jar file (found here). It requires a bit of hacking together, calling the internal SparkR:::callJMethod function. I need to get a jobj R object for a couple of Java classes. For some of the classes, this works:

SparkR:::callJStatic('java.lang.Class', 
                     'forName', 
                     'org.apache.hadoop.io.NullWritable')

But for others, it does not:

SparkR:::callJStatic('java.lang.Class', 
                     'forName', 
                     'org.elasticsearch.hadoop.mr.LinkedMapWritable')

Yielding the error:

java.lang.ClassNotFoundException: org.elasticsearch.hadoop.mr.EsInputFormat

It seems like Java isn't finding the org.elasticsearch.* classes, even though I've tried including them both with the command-line --jars argument and with the sparkR.init(sparkJars = ...) function.
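
For reference, this is roughly how I was passing the jar in both cases (the jar path below is just a placeholder for wherever the file actually lives):

# 1. On the command line when launching the SparkR shell:
#    ./bin/sparkR --jars /path/to/elasticsearch-hadoop-2.1.0.rc1.jar

# 2. From R, via the sparkJars argument to sparkR.init():
library(SparkR)
sc <- sparkR.init(master = "local[2]",
                  sparkJars = "/path/to/elasticsearch-hadoop-2.1.0.rc1.jar")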

Any help would be greatly appreciated. Also, if this is a question that more appropriately belongs on the actual SparkR issue tracker, could someone please point me to it? I looked and was not able to find it. Also, if someone knows an alternative way to hook SparkR up to Elasticsearch, I'd be happy to hear that as well.

Thanks! Ben

1 Answer

Alex Ioannides:

Here's how I've achieved it:

# environments, packages, etc ----
Sys.setenv(SPARK_HOME = "/applications/spark-1.4.1")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))

library(SparkR)

# connecting Elasticsearch to Spark via ES-Hadoop-2.1 ----
spark_context <- sparkR.init(master = "local[2]",
                             sparkPackages = "org.elasticsearch:elasticsearch-spark_2.10:2.1.0")
spark_sql_context <- sparkRSQL.init(spark_context)
spark_es <- read.df(spark_sql_context, path = "index/type", source = "org.elasticsearch.spark.sql")
printSchema(spark_es)

(Spark 1.4.1, Elasticsearch 1.5.1, ES-Hadoop 2.1 on OS X Yosemite)

The key idea is to link against the ES-Hadoop Spark package rather than the raw jar file, and then read from Elasticsearch directly through the Spark SQL data source via a SQL context.
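
Once the DataFrame has been created you can query it through the normal SparkR SQL API. A small follow-on sketch (the temp table name and query are arbitrary examples, not part of the setup above):

# register the Elasticsearch-backed DataFrame as a temporary table
registerTempTable(spark_es, "es_docs")

# run an ordinary Spark SQL query against it and pull a few rows back into R
sample_rows <- sql(spark_sql_context, "SELECT * FROM es_docs LIMIT 10")
head(sample_rows)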