I have a project code in Python Notebook and it ran all good when Spark was hosted in Bluemix.
We are running the following code to connect to Netezza (on premises) which worked fine in Bluemix.
VT = sqlContext.read.format('jdbc').options(url='jdbc:netezza://169.54.xxx.x:xxxx/BACC_PRD_ISCNZ_GAPNZ',user='XXXXXX', password='XXXXXXX', dbtable='GRACE.CDVT_LIVE_SPARK', driver='org.netezza.Driver').load()'
However, after migration to DatascienceExperience, we are getting the following error. I have established the secure gateway and its all working fine, but this code is not running. I think the issue is with the Netezza driver. If it is the case, is there a way we can explicitly import the class/driver so the above code can be executed. Please help how we can address the issue.
Error Message:
/usr/local/src/spark20master/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
61 def deco(*a, **kw):
62 try:
---> 63 return f(*a, **kw)
64 except py4j.protocol.Py4JJavaError as e:
65 s = e.java_exception.toString()
/usr/local/src/spark20master/spark/python/lib/py4j-0.10.3-src.zip /py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
317 raise Py4JJavaError(
318 "An error occurred while calling {0}{1} {2}.\n".
--> 319 format(target_id, ".", name), value)
320 else:
321 raise Py4JError(
Py4JJavaError: An error occurred while calling o212.load.
: java.lang.ClassNotFoundException: org.netezza.driver
at java.net.URLClassLoader.findClass(URLClassLoader.java:607)
at java.lang.ClassLoader.loadClassHelper(ClassLoader.java:844)
at java.lang.ClassLoader.loadClass(ClassLoader.java:823)
at java.lang.ClassLoader.loadClass(ClassLoader.java:803)
at org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(DriverRegistry.scala:38)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createC onnectionFactory$1.apply(JdbcUtils.scala:49)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createC onnectionFactory$1.apply(JdbcUtils.scala:49)
at scala.Option.foreach(Option.scala:257)
Notebooks in Bluemix and notebooks in DSX (Data Science Experience) currently use the same backend, so they have access to the same pre-installed drivers. Netezza isn't among them. As Chris Snow pointed out, users can install additional JARs and Python packages into their service instances.
You probably created a new service instance for DSX, and did not yet install the user JARs and packages that the old one had. It's a one-time setup, therefore easy to forget when you've been using the same instance for a while. Execute these commands in a Python notebook of the old instance on Bluemix to check for user-installed things:
Then install the missing things into your new instance on DSX.