How to pass a dataframe read from excel to another variable in spark-scala?

I have a DataFrame declared as var cache: DataFrame = _. On the initial run I set cache = existingDF, where existingDF is read from an Excel file using crealytics.spark.excel. On each subsequent run, existingDF is read from an updated Excel file, and I expect cache = cache.union(existingDF) to accumulate the old and new data. Instead, cache seems to contain only the contents of existingDF. In short, whenever I use cache, it seems to read the Excel file again. How do I avoid this? The issue does not occur when I read the data as CSV. (It did occur when I used .persist on the CSV read, but it went away when I removed .persist.) More simply:

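import org.apache.spark.sql.DataFrame

var a: DataFrame = null  // nothing accumulated yet
while (true) {
  // each iteration re-reads the (possibly updated) Excel file
  val b = spark.read.format("com.crealytics.spark.excel")...
  if (Option(a).isEmpty) {
    a = b
  } else if (a != b) {  // note: != on DataFrames compares object references, not contents
    a = b.union(a)
  }
}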

The variable a always ends up with the same contents as b, so it never differs from b. How do I avoid this?
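
A likely cause is that a Spark DataFrame is a lazily evaluated query plan rather than a stored copy of the data, so every action on a can go back to the Excel file as it exists at that moment. One possible workaround, sketched here under assumptions, is to materialize each batch and cut its lineage with Dataset.localCheckpoint before keeping it. The helper names snapshot and accumulate are hypothetical, and the two in-memory DataFrames are stand-ins for two successive Excel reads; this is not a confirmed fix for the spark-excel reader.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SnapshotSketch {
  // Hypothetical helper: compute df now and truncate its lineage, so later
  // actions on the result do not re-run the original read.
  def snapshot(df: DataFrame): DataFrame = df.localCheckpoint(eager = true)

  // Hypothetical helper: fold a fresh batch into the data accumulated so far
  // and snapshot the result.
  def accumulate(acc: Option[DataFrame], batch: DataFrame): DataFrame =
    snapshot(acc.fold(batch)(_.union(batch)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("snapshot-sketch")
      .getOrCreate()
    import spark.implicits._

    // Stand-ins (assumption) for two successive reads of the Excel file.
    val firstRead = Seq(1, 2).toDF("id")
    val secondRead = Seq(3).toDF("id")

    // Option replaces the null check used in the question.
    var acc: Option[DataFrame] = None
    acc = Some(accumulate(acc, firstRead))
    acc = Some(accumulate(acc, secondRead))

    println(acc.get.count()) // 3 rows: two from the first read, one from the second
    spark.stop()
  }
}
```

Locally checkpointed data lives on the executors and is lost if an executor fails, so writing each batch to a durable format such as Parquet and reading it back may be the safer choice for a long-running loop.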
