Netezza Drivers not available in Spark (Python Notebook) in DataScienceExperience

Question

Asked by Sagar KSK on 20 January 2017 at 15:00

I have project code in a Python notebook that ran fine when Spark was hosted in Bluemix.

We run the following code to connect to an on-premises Netezza database; it worked fine in Bluemix.

VT = sqlContext.read.format('jdbc').options(url='jdbc:netezza://169.54.xxx.x:xxxx/BACC_PRD_ISCNZ_GAPNZ', user='XXXXXX', password='XXXXXXX', dbtable='GRACE.CDVT_LIVE_SPARK', driver='org.netezza.Driver').load()

However, after migrating to Data Science Experience, we get the following error. I have set up the Secure Gateway and it is working fine, but this code does not run. I think the issue is with the Netezza driver. If so, is there a way to explicitly import the class or driver so that the above code can be executed? Please advise how we can address the issue.

Error Message:


/usr/local/src/spark20master/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
61     def deco(*a, **kw):
62         try:
---> 63             return f(*a, **kw)
64         except py4j.protocol.Py4JJavaError as e:
65             s = e.java_exception.toString()

/usr/local/src/spark20master/spark/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
317                 raise Py4JJavaError(
318                     "An error occurred while calling {0}{1} {2}.\n".
--> 319                     format(target_id, ".", name), value)
320             else:
321                 raise Py4JError(

Py4JJavaError: An error occurred while calling o212.load.
: java.lang.ClassNotFoundException: org.netezza.driver
at java.net.URLClassLoader.findClass(URLClassLoader.java:607)
at java.lang.ClassLoader.loadClassHelper(ClassLoader.java:844)
at java.lang.ClassLoader.loadClass(ClassLoader.java:823)
at java.lang.ClassLoader.loadClass(ClassLoader.java:803)
at org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(DriverRegistry.scala:38)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:49)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:49)
at scala.Option.foreach(Option.scala:257)

Original Q&A

There are 3 answers

**Roland Weber** · Answer 1 · 2017-01-23T08:16:58+00:00

Notebooks in Bluemix and notebooks in DSX (Data Science Experience) currently use the same backend, so they have access to the same pre-installed drivers. Netezza isn't among them. As Chris Snow pointed out, users can install additional JARs and Python packages into their service instances.

You probably created a new service instance for DSX, and did not yet install the user JARs and packages that the old one had. It's a one-time setup, therefore easy to forget when you've been using the same instance for a while. Execute these commands in a Python notebook of the old instance on Bluemix to check for user-installed things:

!ls -lF ~/data/libs
!pip freeze

Then install the missing things into your new instance on DSX.
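The same check can be done from Python, which makes it easier to compare the two instances. This is a minimal sketch, not part of the original answer: the helper names `list_user_jars` and `missing_jars` are hypothetical, and the only thing taken from above is the `~/data/libs` directory.

```python
import glob
import os


def list_user_jars(libs_dir=None):
    """Return the sorted file names of the .jar files found in libs_dir.

    The default, ~/data/libs, is the directory listed by the command above.
    """
    if libs_dir is None:
        libs_dir = os.path.join(os.path.expanduser("~"), "data", "libs")
    pattern = os.path.join(libs_dir, "*.jar")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


def missing_jars(old_jars, new_jars):
    """Return the JAR names present in the old instance but not in the new one."""
    return sorted(set(old_jars) - set(new_jars))


if __name__ == "__main__":
    # Run this in a notebook of each instance and compare the two listings.
    print(list_user_jars())
```

Copy the list printed in the old instance into the new one and pass both to `missing_jars` to see which JARs still need to be installed.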

**Chris Snow** · Answer 2 · 2017-01-21T00:05:32+00:00

You can install a JAR file by adding a notebook cell that starts with an exclamation mark and runs a Unix tool to download the file, in this example wget:

!wget https://some.public.host/yourfile.jar -P ${HOME}/data/libs

After downloading the file you will need to restart your kernel.

Note this approach assumes your jar file is publicly available on the Internet.
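Once the JAR is in place and the kernel has been restarted, the read from the question can be retried. The following is a minimal sketch under stated assumptions: the helper names `netezza_jdbc_options` and `load_netezza_table` are hypothetical, while the URL layout and the driver class are taken from the question. The trace in the question reports `org.netezza.driver` in lower case although the posted code uses `org.netezza.Driver`; Java class names are case-sensitive, so the sketch uses the capitalized form.

```python
def netezza_jdbc_options(host, port, database, user, password, table):
    """Build the option dict for Spark's JDBC reader, mirroring the question."""
    return {
        "url": "jdbc:netezza://{0}:{1}/{2}".format(host, port, database),
        "user": user,
        "password": password,
        "dbtable": table,
        # Java class names are case-sensitive: note the capital D.
        "driver": "org.netezza.Driver",
    }


def load_netezza_table(sql_context, options):
    """Read the table into a DataFrame.

    This still raises ClassNotFoundException if the driver JAR is not
    visible to Spark after the kernel restart.
    """
    return sql_context.read.format("jdbc").options(**options).load()


# In the notebook, with placeholder values:
# opts = netezza_jdbc_options("hostorip", "port", "databasename",
#                             "xxxxx", "xxxx", "SCHEMA.TABLE")
# VT = load_netezza_table(sqlContext, opts)
```

Keeping the options in one dict makes it easy to print and double-check the driver name and URL before calling `load()`.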

**charles gomes** · Answer 3 · 2017-04-18T21:00:25+00:00

Another way to connect to Netezza is the ingest connector, which is enabled by default in DSX.

http://datascience.ibm.com/docs/content/analyze-data/python_load.html

from ingest import Connectors
from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)

NetezzaloadOptions = {
    Connectors.Netezza.HOST: 'hostorip',
    Connectors.Netezza.PORT: 'port',
    Connectors.Netezza.DATABASE: 'databasename',
    Connectors.Netezza.USERNAME: 'xxxxx',
    Connectors.Netezza.PASSWORD: 'xxxx',
    Connectors.Netezza.SOURCE_TABLE_NAME: 'tablename'}

NetezzaDF = sqlContext.read.format("com.ibm.spark.discover").options(**NetezzaloadOptions).load()
NetezzaDF.printSchema()
NetezzaDF.show()

Thanks,

Charles.
