I'm trying to follow https://developer.ibm.com/clouddataservices/docs/ibm-data-science-experience/docs/load-and-filter-cloudant-data-with-spark/ to load Cloudant data with Spark. My notebook uses Scala 2.11 with Spark 2.1 (the same thing happens with Spark 2.0) and contains the following code:
// @hidden_cell
var credentials = scala.collection.mutable.HashMap[String, String](
  "username" -> "<redacted>",
  "password" -> """<redacted>""",
  "host" -> "<redacted>",
  "port" -> "443",
  "url" -> "<redacted>"
)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val cloudantdata = sqlContext.read.format("com.cloudant.spark").
  option("cloudant.host", credentials("host")).
  option("cloudant.username", credentials("username")).
  option("cloudant.password", credentials("password")).
  load("crimes")
Executing that cell fails with:
Name: java.lang.ClassNotFoundException
Message: Failed to find data source: com.cloudant.spark. Please find packages at http://spark.apache.org/third-party-projects.html
StackTrace:
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:569)
  at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
  at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:325)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
  ... 42 elided
Caused by: java.lang.ClassNotFoundException: com.cloudant.spark.DefaultSource
  at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
  at java.lang.ClassLoader.loadClassHelper(ClassLoader.java:844)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:823)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:803)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25$$anonfun$apply$13.apply(DataSource.scala:554)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25$$anonfun$apply$13.apply(DataSource.scala:554)
  at scala.util.Try$.apply(Try.scala:192)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25.apply(DataSource.scala:554)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25.apply(DataSource.scala:554)
  at scala.util.Try.orElse(Try.scala:84)
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:554)
How can I get past this error and connect to my Cloudant database?
Something must have caused the Cloudant connector to go missing; it is normally present by default in DSX notebooks. Switch to the Python 2 with Spark 2.1 kernel and run this one-time installation of the Cloudant connector (needed once per Spark service). After that, the connector is available to all of your Spark 2.0+ kernels:
import pixiedust
Restart the kernel once.
Then switch back to your Scala kernel and run your Cloudant connection code again.
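Before rerunning the connection code, you can check whether the connector is visible to the Scala kernel by loading its class the way the stack trace shows Spark doing it: first `com.cloudant.spark`, then `com.cloudant.spark.DefaultSource` (note the `Try.orElse` frame). The following is a minimal sketch. `DataSourceCheck` and `dataSourceAvailable` are hypothetical names, not part of Spark or the connector. It only approximates Spark's lookup, which also checks registered short names and uses its own class loader.

```scala
import scala.util.Try

// Hypothetical helper: not part of Spark or the Cloudant connector.
object DataSourceCheck {
  // Approximates the fallback visible in the stack trace: try the provider
  // name as a class, then "<provider>.DefaultSource".
  def dataSourceAvailable(provider: String): Boolean = {
    // Prefer the thread's context class loader; fall back to this object's loader.
    val loader = Option(Thread.currentThread.getContextClassLoader).getOrElse(getClass.getClassLoader)
    // initialize = false: only check that the class can be found, without running static initializers.
    def load(name: String): Try[Class[_]] = Try(Class.forName(name, false, loader))
    load(provider).orElse(load(s"$provider.DefaultSource")).isSuccess
  }

  def main(args: Array[String]): Unit = {
    val provider = "com.cloudant.spark"
    if (dataSourceAvailable(provider))
      println(s"$provider found: the Cloudant connector is on the classpath")
    else
      println(s"$provider not found: install the connector and restart the kernel")
  }
}
```

In a notebook cell you would define the object and then evaluate `DataSourceCheck.dataSourceAvailable("com.cloudant.spark")`. A result of `false` suggests the connector is still missing, and `load("crimes")` will probably fail with the same `ClassNotFoundException`. The result may depend on how the kernel sets up its class loaders.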
Thanks, Charles.