I was able to read Excel file data using the Crealytics library spark-excel_2.12-3.4.1_0.19.0, but the same code fails with the latest version, spark-excel_2.12-3.5.0_0.20.1.
I tried both of the reads below, and neither works with the latest library, spark-excel_2.12-3.5.0_0.20.1.
df = (
    spark.read.format("com.crealytics.spark.excel")
    .option("inferschema", True)
    .option("header", True)
    .option("ignoreLeadingWhiteSpace", "true")
    .option("ignoreTrailingWhiteSpace", "true")
    .option("keepUndefinedRows", True)
    .option("dataAddress", "'SHEET_NAME'!")
    .load(path)
)
df = (
    spark.read.format("excel")
    .option("inferschema", True)
    .option("header", True)
    .option("ignoreLeadingWhiteSpace", "true")
    .option("ignoreTrailingWhiteSpace", "true")
    .option("keepUndefinedRows", True)
    .option("dataAddress", "'SHEET_NAME'!")
    .load(path)
)
With the latest Crealytics version, it throws the following error:
Py4JJavaError: An error occurred while calling o412.load.
: java.lang.ClassNotFoundException:
Failed to find data source: excel. Please find packages at
https://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.errors.QueryExecutionErrors$.failedToFindDataSourceError(QueryExecutionErrors.scala:837)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:744)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:794)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:328)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:237)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassNotFoundException: excel.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:730)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:730)
at scala.util.Failure.orElse(Try.scala:224)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:730)
... 15 more
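For anyone puzzled by the `excel.DefaultSource` in the trace: as far as I understand it, Spark first looks for a data source registered under the short name passed to `format(...)`, and only if none is registered does it try to load that string as a class name, and then the same string with `.DefaultSource` appended. Below is a simplified Python model of that lookup order, not Spark's actual code. The function name `resolve_data_source` and its `registered` and `loadable` arguments are made up for illustration.

```python
def resolve_data_source(provider, registered, loadable):
    """Simplified model of the order in which Spark resolves a format name.

    registered: dict mapping short names (e.g. "excel") to class names,
                standing in for the data sources registered on the classpath.
    loadable:   set of class names the class loader can find.
    Both arguments are illustrative stand-ins, not Spark APIs.
    """
    # 1. Short names registered by attached libraries (compared case-insensitively).
    for short_name, class_name in registered.items():
        if short_name.lower() == provider.lower():
            return class_name

    # 2. Otherwise treat the string as a class name, then try "<name>.DefaultSource".
    for candidate in (provider, provider + ".DefaultSource"):
        if candidate in loadable:
            return candidate

    # Nothing matched: this mirrors the shape of the error in the trace above.
    raise LookupError(
        f"Failed to find data source: {provider} "
        f"(last class tried: {provider}.DefaultSource)"
    )
```

Read this way, the trace suggests that nothing visible to the class loader had registered the short name `excel`, so Spark fell through to looking for a class literally named `excel.DefaultSource`.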
Note that I am executing the above code in a Databricks Python cell with Scala version 2.x.
My question is: why does this stop working after upgrading to the latest version of Crealytics spark-excel?
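Since the artifact name encodes the Scala, Spark, and library versions, one quick sanity check is to compare it against the cluster. The sketch below assumes the name follows the pattern spark-excel_<scala>-<spark>_<library>, as both names in this post do, and assumes that comparing only the major.minor Spark version is a reasonable rule of thumb. The helper names `parse_artifact` and `mismatches` are hypothetical. This only rules out a version mismatch; it does not diagnose any other cause.

```python
import re

# Matches names such as "spark-excel_2.12-3.5.0_0.20.1":
# spark-excel_<scala version>-<spark version>_<library version>
_ARTIFACT = re.compile(
    r"^spark-excel_(?P<scala>\d+\.\d+)-(?P<spark>\d+\.\d+\.\d+)_(?P<lib>\d+\.\d+\.\d+)$"
)


def parse_artifact(name):
    """Split an artifact name into its scala, spark, and lib version strings."""
    match = _ARTIFACT.match(name)
    if match is None:
        raise ValueError(f"Unrecognised artifact name: {name}")
    return match.groupdict()


def mismatches(name, cluster_spark, cluster_scala):
    """Return human-readable differences between the jar and the cluster."""
    parts = parse_artifact(name)
    problems = []
    # Assumption: only the major.minor part of the Spark version must agree.
    if parts["spark"].split(".")[:2] != cluster_spark.split(".")[:2]:
        problems.append(
            f"jar built for Spark {parts['spark']}, cluster runs {cluster_spark}"
        )
    # The Scala binary version is major.minor, e.g. "2.12".
    if parts["scala"] != ".".join(cluster_scala.split(".")[:2]):
        problems.append(
            f"jar built for Scala {parts['scala']}, cluster runs {cluster_scala}"
        )
    return problems
```

In a notebook this could be called as `mismatches("spark-excel_2.12-3.5.0_0.20.1", spark.version, "2.12")`, with the Scala version typed in from the cluster's runtime description. An empty list means the versions line up.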
Sorry for wasting your time here. I found that this is an issue with the library itself, and it has already been reported. Here is the link:
https://github.com/crealytics/spark-excel/issues/789