I have a set of Excel-format files which need to be read from Spark (2.0.0) as and when an Excel file is loaded into a local directory. The Scala version used here is 2.11.8.
I've tried using the readStream method of SparkSession, but I'm not able to read in a streaming way. I'm able to read Excel files statically as:
val df = spark.read
  .format("com.crealytics.spark.excel")
  .option("sheetName", "Data")
  .option("useHeader", "true")
  .load("Sample.xlsx")
Is there any other way of reading Excel files in a streaming way from a local directory?
Any answers would be helpful.
Thanks
Changes done:
val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.sql.warehouse.dir", "file:///D:/pooja")
  .appName("Spark SQL Example")
  .getOrCreate()
spark.conf.set("spark.sql.streaming.schemaInference", true)
import spark.implicits._
val dataFrame = spark.readStream
  .format("csv")
  .option("inferSchema", true)
  .option("header", true)
  .load("file:///D:/pooja/sample.csv")
val query = dataFrame.writeStream.format("console").start()
query.awaitTermination() // keeps the application alive; show() cannot be called on a streaming DataFrame
Updated code:
val spark = SparkSession.builder().master("local[*]").appName("Spark SQL Example").getOrCreate()
spark.conf.set("spark.sql.streaming.schemaInference", true)
import spark.implicits._
val df = spark.readStream
  .format("com.crealytics.spark.excel")
  .option("header", true)
  .load("file:///filepath/*.xlsx")
df.writeStream.format("memory").queryName("tab").start().awaitTermination()
val res = spark.sql("select * from tab")
res.show()
Error:
Exception in thread "main" java.lang.UnsupportedOperationException: Data source com.crealytics.spark.excel does not support streamed reading
Can anyone help me resolve this issue?
For a streaming DataFrame you have to provide the schema; currently, DataStreamReader does not support option("inferSchema", true|false). You can instead set the SQLConf setting spark.sql.streaming.schemaInference, which needs to be set at the session level. You can refer here.