com.cloudant.spark data source not found in DSX Notebook


I'm trying to follow https://developer.ibm.com/clouddataservices/docs/ibm-data-science-experience/docs/load-and-filter-cloudant-data-with-spark/ to load Cloudant data with Spark. I have a Scala 2.11 notebook with Spark 2.1 (the same happens with Spark 2.0) containing the following code:

// @hidden_cell
var credentials = scala.collection.mutable.HashMap[String, String](
  "username" -> "<redacted>",
  "password" -> """<redacted>""",
  "host" -> "<redacted>",
  "port" -> "443",
  "url" -> "<redacted>"
)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val cloudantdata = sqlContext.read.format("com.cloudant.spark").
  option("cloudant.host", credentials("host")).
  option("cloudant.username", credentials("username")).
  option("cloudant.password", credentials("password")).
  load("crimes")

Executing that cell only ends in:

Name: java.lang.ClassNotFoundException
Message: Failed to find data source: com.cloudant.spark. Please find packages at http://spark.apache.org/third-party-projects.html
StackTrace:
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:569)
  at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
  at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:325)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
  ... 42 elided
Caused by: java.lang.ClassNotFoundException: com.cloudant.spark.DefaultSource
  at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
  at java.lang.ClassLoader.loadClassHelper(ClassLoader.java:844)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:823)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:803)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25$$anonfun$apply$13.apply(DataSource.scala:554)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25$$anonfun$apply$13.apply(DataSource.scala:554)
  at scala.util.Try$.apply(Try.scala:192)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25.apply(DataSource.scala:554)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25.apply(DataSource.scala:554)
  at scala.util.Try.orElse(Try.scala:84)
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:554)

How can I get past this error and connect to my Cloudant database?


1 Answer

Answered by charles gomes (accepted answer):

There must have been some issue that caused the Cloudant driver, which is normally present by default in a DSX Notebook, to go missing. Switch to a Python 2 with Spark 2.1 kernel and run this one-time installation (per Spark service) of the Cloudant connector so that it becomes available to all of your Spark 2.0+ kernels:

# install or upgrade PixieDust in the notebook environment
!pip install --upgrade pixiedust

import pixiedust

# one-time install of the spark-cloudant connector into the Spark service
pixiedust.installPackage("cloudant-labs:spark-cloudant:2.0.0-s_2.11")
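If you want to confirm the connector was registered, PixieDust can list the packages it has installed (a quick optional check, assuming your PixieDust version provides `printAllPackages`):

import pixiedust

# print every package PixieDust has installed into this Spark service;
# cloudant-labs:spark-cloudant should appear in the output
pixiedust.printAllPackages()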

Restart the kernel once.

Then switch back to your Scala kernel and run your Cloudant connection code, as in the sketch below.
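For reference, a minimal sketch of that connection code using the Spark 2 `SparkSession` entry point (the `spark` variable is pre-defined in DSX Spark 2.x notebooks; the host, credentials, and the `crimes` database name are placeholders taken from the question):

// after the connector is installed and the kernel restarted,
// the com.cloudant.spark data source should resolve
val cloudantdata = spark.read.format("com.cloudant.spark").
  option("cloudant.host", "<host>").
  option("cloudant.username", "<username>").
  option("cloudant.password", "<password>").
  load("crimes")

// quick sanity check on the loaded DataFrame
cloudantdata.printSchema()
println(cloudantdata.count())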

Thanks, Charles.