I'm trying to submit a Java Spark program that uses the com.crealytics spark-excel package.
The Java file looks like this:
package org.example;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import com.crealytics.spark.excel.*;

public class Main {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("Spark_Excel")
                .master("local")
                .getOrCreate();

        String filePath = "/home/arman/Desktop/resources/example_XLSX_5000.xlsx";

        Dataset<Row> dataset = spark.read().format("com.crealytics.spark.excel")
                .option("header", "true")
                .load(filePath);
        dataset.show();

        dataset.write()
                .format("excel") // V2 implementation; V1 is .format("com.crealytics.spark.excel")
                .option("header", "true")
                .partitionBy("Country")
                .save("/home/arman/Desktop/output/partitioned");

        spark.stop();
    }
}
The pom.xml looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.example</groupId>
    <artifactId>spark-excel</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
    <name>spark-excel</name>
    <url>http://maven.apache.org</url>
    <dependencies>
        <!-- Apache Spark Core -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>3.4.1</version>
        </dependency>
        <!-- Apache Spark SQL -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.12</artifactId>
            <version>3.4.1</version>
        </dependency>
        <!-- Spark Excel Library -->
        <dependency>
            <groupId>com.crealytics</groupId>
            <artifactId>spark-excel_2.12</artifactId>
            <version>3.4.1_0.19.0</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <!-- Build an executable JAR -->
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <version>3.1.0</version>
                <configuration>
                    <archive>
                        <manifest>
                            <addClasspath>true</addClasspath>
                            <classpathPrefix>lib/</classpathPrefix>
                            <mainClass>org.example.Main</mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
It works correctly when I run the Java class from IntelliJ IDEA, but when I use
spark-submit --class org.example.Main spark-excel-1.0-SNAPSHOT.jar
I get the following error:
Exception in thread "main" org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: com.crealytics.spark.excel. Please find packages at `https://spark.apache.org/third-party-projects.html`.
I have tried both formats, "com.crealytics.spark.excel" and "excel", and different versions of the dependencies.
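spark-excel_2.12 is not included in the driver classpath. The maven-jar-plugin configuration in your pom.xml builds a plain JAR that does not bundle its dependencies, whereas IntelliJ IDEA puts all Maven dependencies on the classpath for you. You can either pass
--packages com.crealytics:spark-excel_2.12:3.4.1_0.19.0
in your spark-submit command, or set the configuration spark.jars.packages to the Maven coordinates of the dependency.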
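For illustration, here is a minimal sketch of both options as shell functions. The class name, JAR name and Maven coordinates are taken from the question; the function names and the SPARK_SUBMIT override variable are hypothetical helpers introduced only for this example, and the sketch assumes spark-submit is on your PATH and the machine can reach a Maven repository.

```shell
# Maven coordinates of the Excel data source, copied from the pom.xml in the question.
EXCEL_PKG="com.crealytics:spark-excel_2.12:3.4.1_0.19.0"
APP_JAR="spark-excel-1.0-SNAPSHOT.jar"

# Hypothetical override: set SPARK_SUBMIT to a full path if spark-submit is not on PATH.
SPARK_SUBMIT="${SPARK_SUBMIT:-spark-submit}"

# Option 1: let spark-submit resolve the dependency via --packages.
submit_with_packages() {
  "$SPARK_SUBMIT" --class org.example.Main --packages "$EXCEL_PKG" "$APP_JAR"
}

# Option 2: pass the same coordinates through the spark.jars.packages setting.
submit_with_conf() {
  "$SPARK_SUBMIT" --class org.example.Main --conf "spark.jars.packages=$EXCEL_PKG" "$APP_JAR"
}
```

Calling submit_with_packages (or submit_with_conf) from the directory that contains the JAR should let Spark download spark-excel and its transitive dependencies before your Main class starts. Both options are meant to be equivalent, so pick whichever fits your deployment scripts better.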