Can SparkSession.catalog.clearCache() delete data from HDFS?

I have been experiencing a data deletion issue since we migrated from CDH to HDP (Spark 2.2 to 2.3). The tables are read from an HDFS location. After the Spark job that reads and processes those tables has been running for a while, it throws a table not found exception, and when we check that location, all the records have vanished. In my Spark (Java) code, clearCache() is called before the table is read. Can it delete those files? If so, how do I fix it?
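
One way to test the suspicion directly is to list the table directory immediately before and after the suspect call and diff the two listings. The sketch below is a hypothetical, stdlib-only illustration of that pattern on a local directory (Java 11+). The class name `FileSnapshotCheck` and the part-file names are made up for the example. Against HDFS you would take the listings with Hadoop's `FileSystem` API (for example `listStatus` or `listFiles`) instead of `java.nio.file`, and the suspect step would be the real `spark.catalog().clearCache()` call.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical diagnostic: which files under a directory disappear during one step?
public class FileSnapshotCheck {

    // Returns the regular files under root, as paths relative to root, in sorted order.
    public static Set<String> snapshot(Path root) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            return files.filter(Files::isRegularFile)
                    .map(p -> root.relativize(p).toString())
                    .collect(Collectors.toCollection(TreeSet::new));
        }
    }

    // Runs the suspect step and returns files that existed before it but not after it.
    public static Set<String> filesRemovedBy(Path root, Runnable suspectStep) throws IOException {
        Set<String> before = snapshot(root);
        suspectStep.run();
        Set<String> after = snapshot(root);
        Set<String> removed = new TreeSet<>(before);
        removed.removeAll(after);
        return removed;
    }

    public static void main(String[] args) throws IOException {
        // Local stand-in for the table's HDFS directory (assumption for the demo).
        Path tableDir = Files.createTempDirectory("table");
        Files.writeString(tableDir.resolve("part-00000"), "rows");
        Files.writeString(tableDir.resolve("part-00001"), "rows");

        // Placeholder for the real suspect call; this no-op touches nothing on disk.
        Set<String> removed = filesRemovedBy(tableDir, () -> { });
        System.out.println("Removed by step: " + removed);
    }
}
```

If the "removed" set is empty around the `clearCache()` call but the files are gone later, the deletion is happening somewhere else in the job or outside it.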

Answer by Som

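I think you should look at the source code. Spark has its own implementation for caching user data, managed through CacheManager, and it never deletes the underlying source data while managing that cache. Catalog.clearCache() is documented as removing all cached tables from the in-memory cache: it drops Spark's cached copies, not the files the tables were read from. Have a look.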
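
To make that point concrete: a cache holds a second copy of data that was read from storage, so clearing it can only discard that copy. The toy model below is a hypothetical, stdlib-only Java illustration of that separation. `ToyCacheModel` and its methods are invented for the example and do not mirror Spark's actual CacheManager internals; the two maps merely stand in for HDFS and for Spark's cached data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: storage (think HDFS) vs. a cache of copies made from it.
public class ToyCacheModel {

    // Authoritative rows for each table (stand-in for files on HDFS).
    private final Map<String, List<String>> storage = new HashMap<>();
    // In-memory copies created when a table is cached.
    private final Map<String, List<String>> cache = new HashMap<>();

    public void writeTable(String table, List<String> rows) {
        storage.put(table, new ArrayList<>(rows));
    }

    // Copies the table's rows into the cache; storage is only read, never modified.
    public void cacheTable(String table) {
        cache.put(table, new ArrayList<>(readFromStorage(table)));
    }

    // Analogue of clearCache(): drops every cached copy and nothing else.
    public void clearCache() {
        cache.clear();
    }

    // Serves from the cache when possible, otherwise falls back to storage.
    public List<String> read(String table) {
        List<String> cached = cache.get(table);
        return cached != null ? cached : readFromStorage(table);
    }

    public boolean isCached(String table) {
        return cache.containsKey(table);
    }

    public boolean existsInStorage(String table) {
        return storage.containsKey(table);
    }

    private List<String> readFromStorage(String table) {
        List<String> rows = storage.get(table);
        if (rows == null) {
            throw new IllegalStateException("table not found: " + table);
        }
        return rows;
    }

    public static void main(String[] args) {
        ToyCacheModel m = new ToyCacheModel();
        m.writeTable("events", List.of("r1", "r2"));
        m.cacheTable("events");
        m.clearCache();
        // Cache is empty, storage is intact, and the read falls back to storage.
        System.out.println(m.isCached("events") + " " + m.existsInStorage("events") + " " + m.read("events"));
    }
}
```

In this model, a "table not found" failure after `clearCache()` can only occur if the table was already missing from storage, which fits the view that the files are being removed by something other than the cache call.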