Spark Java code that works fine in the Eclipse IDE throws ClassNotFoundException when running the jar generated by Maven


I'm using the Java Spark code below to connect to NATS.

SparkSession spark = SparkSession.builder()
        .appName("spark-with-nats")
        .master("local")
        .config("spark.jars",
                "libs/nats-spark-connector-balanced_2.12-1.1.4.jar," + "libs/jnats-2.17.1.jar")
        .config("spark.sql.streaming.checkpointLocation", "tmp/checkpoint")
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .getOrCreate();

Dataset<Row> df = spark.readStream()
        .format("nats")
        .option("nats.host", "localhost")
        .option("nats.port", 4222)
        .option("nats.stream.name", "newstream")
        .option("nats.stream.subjects", "newsub")
        .option("nats.durable.name", "cons1")
        .option("nats.msg.ack.wait.secs", 120)
        .load();
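For a quick end-to-end sanity check, a minimal sink can be attached to the stream. This is not part of the original code, just a sketch that prints incoming NATS messages to the console to confirm the source delivers rows:

// Sketch only: console sink for testing, not for production output.
// StreamingQuery is org.apache.spark.sql.streaming.StreamingQuery;
// start() and awaitTermination() throw checked exceptions (TimeoutException,
// StreamingQueryException), so declare or catch them.
StreamingQuery query = df.writeStream()
        .format("console")
        .outputMode("append")
        .start();
query.awaitTermination();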

The two external jars used when creating the SparkSession are present under the "libs" folder and have been added to the classpath:

.config("spark.jars","libs/nats-spark-connector-balanced_2.12-1.1.4.jar,"+"libs/jnats-2.17.1.jar")

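One thing to keep in mind: spark.jars takes plain file paths, and relative paths resolve against whatever the working directory happens to be when the JVM starts. A small defensive sketch (an assumption, not part of the original code: the libs folder must be reachable from the launch directory) is to resolve them up front:

// Resolve the connector jars to absolute paths so the configuration does not
// depend on the working directory the application is launched from.
String connectorJar = new File("libs/nats-spark-connector-balanced_2.12-1.1.4.jar").getAbsolutePath();
String jnatsJar = new File("libs/jnats-2.17.1.jar").getAbsolutePath();

SparkSession spark = SparkSession.builder()
        .appName("spark-with-nats")
        .master("local")
        .config("spark.jars", connectorJar + "," + jnatsJar)
        .getOrCreate();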

This code works fine when I run it from the Eclipse IDE. Now I'm building a jar out of it with Maven, using this pom.xml:

<dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-aws</artifactId>
            <version>3.3.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.3.2</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/io.delta/delta-core -->
        <dependency>
            <groupId>io.delta</groupId>
            <artifactId>delta-core_2.12</artifactId>
            <version>2.3.0</version>
        </dependency>
        
        <!-- *** COMMENT START **********

    <dependency>
        <groupId>external.group</groupId>
        <artifactId>nats-spark-connector-balanced_2.12</artifactId>
        <version>1.1.4</version>
        <scope>system</scope>
        <systemPath>${project.basedir}/libs/nats-spark-connector-balanced_2.12-1.1.4.jar</systemPath>
    </dependency>
    <dependency>
        <groupId>external.group</groupId>
        <artifactId>jnats</artifactId>
        <version>2.17.1</version>
        <scope>system</scope>
        <systemPath>${project.basedir}/libs/jnats-2.17.1.jar</systemPath>
    </dependency> 

    *** COMMENT END ********** -->
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <version>2.5.7</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>repackage</goal>
                        </goals>
                        <configuration>
                            <mainClass>com.optiva.MinIOTester</mainClass>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

When I run the generated jar, providing the libs folder (with the two external jars) on the classpath:

java -cp "../libs/*.jar" -jar spark-learning-0.0.1-SNAPSHOT.jar

I get the following error:

Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:49)
        at org.springframework.boot.loader.Launcher.launch(Launcher.java:108)
        at org.springframework.boot.loader.Launcher.launch(Launcher.java:58)
        at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:88)
Caused by: java.lang.ClassNotFoundException:
Failed to find data source: nats. Please find packages at
https://spark.apache.org/third-party-projects.html

        at org.apache.spark.sql.errors.QueryExecutionErrors$.failedToFindDataSourceError(QueryExecutionErrors.scala:587)
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:675)
        at org.apache.spark.sql.streaming.DataStreamReader.loadInternal(DataStreamReader.scala:157)
        at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:144)
        at com.test.MinIOTester.sparkNatsTesterNewOnLocal(MinIOTester.java:387)
        at com.test.MinIOTester.main(MinIOTester.java:31)
        ... 8 more
Caused by: java.lang.ClassNotFoundException: nats.DefaultSource
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at org.springframework.boot.loader.LaunchedURLClassLoader.loadClass(LaunchedURLClassLoader.java:151)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:661)
        at scala.util.Try$.apply(Try.scala:213)
        at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:661)
        at scala.util.Failure.orElse(Try.scala:224)
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:661)

Looking at Caused by: java.lang.ClassNotFoundException: nats.DefaultSource, it seems the two external jars are not being picked up from the classpath passed via java -cp "../libs/*.jar". I tried giving the absolute path to the external jar folder, and even the individual jar names, but I still get the same error. What am I missing?

1 Answer

Answered by VGH

It ran successfully when I used the spark-submit command, passing the external dependencies on the classpath:

spark-submit --jars libs/nats-spark-connector-balanced_2.12-1.1.4.jar,libs/jnats-2.17.1.jar spark-learning-0.0.1-SNAPSHOT.jar 

Thanks @JoachimSauer for the hint that "-cp is ignored when using -jar, as only the classpath specified in the jar file will be used". Since my earlier command used java -jar, the -cp option was being silently ignored.
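A second detail about the original java command: even without -jar, the pattern ../libs/*.jar would not work, because the JVM only expands a classpath wildcard when the entry ends in a bare *, e.g. ../libs/*. So a plain-java equivalent would look roughly like the line below. This is a sketch only: it assumes a jar whose classes sit at the root, whereas a Spring Boot repackaged jar nests them under BOOT-INF, which is one more reason spark-submit is the simpler route here.

java -cp "spark-learning-0.0.1-SNAPSHOT.jar:../libs/*" com.test.MinIOTester

(Use ; instead of : as the path separator on Windows.)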