com.cloudant.spark data source not found in DSX Notebook


I'm trying to follow https://developer.ibm.com/clouddataservices/docs/ibm-data-science-experience/docs/load-and-filter-cloudant-data-with-spark/ to load Cloudant data with Spark. I have a Scala 2.11 notebook with Spark 2.1 (the same thing happens with Spark 2.0) containing the following code:

// @hidden_cell
var credentials = scala.collection.mutable.HashMap[String, String](
  "username"->"<redacted>",
  "password"->"""<redacted>""",
  "host"->"<redacted>",
  "port"->"443",
  "url"->"<redacted>"
)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val cloudantdata = sqlContext.read.format("com.cloudant.spark")
  .option("cloudant.host", credentials("host"))
  .option("cloudant.username", credentials("username"))
  .option("cloudant.password", credentials("password"))
  .load("crimes")

Executing that cell fails with:

Name: java.lang.ClassNotFoundException
Message: Failed to find data source: com.cloudant.spark. Please find packages at http://spark.apache.org/third-party-projects.html
StackTrace:
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:569)
  at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
  at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:325)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
  ... 42 elided
Caused by: java.lang.ClassNotFoundException: com.cloudant.spark.DefaultSource
  at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
  at java.lang.ClassLoader.loadClassHelper(ClassLoader.java:844)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:823)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:803)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25$$anonfun$apply$13.apply(DataSource.scala:554)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25$$anonfun$apply$13.apply(DataSource.scala:554)
  at scala.util.Try$.apply(Try.scala:192)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25.apply(DataSource.scala:554)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25.apply(DataSource.scala:554)
  at scala.util.Try.orElse(Try.scala:84)
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:554)

How can I get past this error and connect to my Cloudant database?


1 Answer

charles gomes (Best Answer)

The ClassNotFoundException for com.cloudant.spark.DefaultSource means the spark-cloudant connector is not on your notebook's classpath. Something must have caused the Cloudant driver, which is normally present by default in DSX notebooks, to go missing. Switch to a Python 2 with Spark 2.1 kernel and run this one-time installation (once per Spark service) of the Cloudant connector so that it becomes available to all of your Spark 2.0+ kernels:

!pip install --upgrade pixiedust

import pixiedust

pixiedust.installPackage("cloudant-labs:spark-cloudant:2.0.0-s_2.11")

Restart the kernel once.

Then switch back to your Scala kernel and run your Cloudant connection code.
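For reference, here is a minimal sketch of what that connection code can look like once the connector is installed, assuming the Spark 2.x notebook's built-in spark SparkSession and the credentials map and crimes database from the question (in Spark 2.x the SparkSession makes creating a SQLContext by hand unnecessary):

// Minimal sketch: read the Cloudant database through the spark-cloudant
// connector installed above. Assumes the notebook's built-in `spark`
// SparkSession and the `credentials` map from the question's hidden cell.
val cloudantdata = spark.read.format("com.cloudant.spark")
  .option("cloudant.host", credentials("host"))
  .option("cloudant.username", credentials("username"))
  .option("cloudant.password", credentials("password"))
  .load("crimes")

// Quick sanity checks that the data source resolved and data loads.
cloudantdata.printSchema()
println(cloudantdata.count())

If the installPackage step succeeded, format("com.cloudant.spark") should now resolve instead of throwing the ClassNotFoundException above.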

Thanks, Charles.