from pyspark.sql import SparkSession
from pyspark import SparkContext, SparkConf
from pyspark.storagelevel import StorageLevel
spark = SparkSession.builder.appName('TEST').config('spark.ui.port','4098').enableHiveSupport().getOrCreate()
df4 = spark.sql('select * from hive_schema.table_name limit 1')
print("query completed")
df4.unpersist()
df4.count()
df4.show()
I have executed the above code to clear the DataFrame and release its memory. However, df4.show() still works and displays the data.
Could you please help me with the right method to free the memory occupied by a Spark DataFrame?
The function unpersist() only marks the DataFrame as no longer cached and lets Spark remove the data lazily when it needs the space, rather than forcing an immediate cleanup. Passing True (the blocking parameter) makes the call block until Spark has actually deleted all cached blocks before proceeding.

Note also that unpersist() only frees cached data; the DataFrame itself stays valid, so df4.show() still works because Spark simply recomputes the result from the source table via the DataFrame's lineage.

Reference: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.unpersist.html
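For completeness, here is a minimal sketch of the full cache/uncache cycle (assuming the Hive table hive_schema.table_name from your snippet exists):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('TEST').enableHiveSupport().getOrCreate()

df4 = spark.sql('select * from hive_schema.table_name limit 1')

# Caching is lazy: persist() only marks the DataFrame, so run an
# action such as count() to actually materialize the cache.
df4.persist()
df4.count()

# blocking=True makes unpersist wait until every cached block is deleted.
df4.unpersist(blocking=True)

# The DataFrame is still usable: show() recomputes from the source table.
df4.show()

Also note that in your original snippet df4 was never persisted or cached, so unpersist() had nothing to remove in the first place.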