I have a DataFrame declared as var cache: DataFrame = _. On the initial run I set cache = existingDF, where existingDF is read from an Excel file using crealytics.spark.excel.
On each subsequent run, existingDF is read from an updated Excel file, and I want cache = cache.union(existingDF).
But I seem to get only the contents of existingDF inside cache. In short, whenever I use cache, it seems to re-read the Excel file. How do I avoid this? This issue does not occur when reading the data as CSV. (It did occur when I used .persist on the CSV read, but it went away when I removed .persist.)
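More simply:
var a: DataFrame = _
while (true) {
  // b is read again from the (possibly updated) Excel file on every iteration
  val b = spark.read.format("com.crealytics.spark.excel")...
  if (Option(a).isEmpty) {
    // first run: nothing accumulated yet
    a = b
  } else if (a != b) {
    // later runs: append the new read to what was accumulated so far
    a = b.union(a)
  }
}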
The variable a always seems to get updated along with b, so it never differs from b. How do I avoid this?