Python: Clear pyspark dataframe

from pyspark.sql import SparkSession 
from pyspark import SparkContext, SparkConf 
from pyspark.storagelevel import StorageLevel 
spark = SparkSession.builder.appName('TEST').config('spark.ui.port','4098').enableHiveSupport().getOrCreate()

df4 = spark.sql('select * from hive_schema.table_name limit 1')
print("query completed")
 
df4.unpersist() 

df4.count()

df4.show()

I executed the above code to clear the DataFrame and release its memory. However, df4.show() still works and displays the data.

Could you please point me to the right method to free the memory occupied by a Spark DataFrame?


1 Answer

Anand Vidvat

Calling unpersist() just lets Spark know that it may drop the cached data when it gets around to it, rather than forcing an immediate cleanup. Passing True (the blocking parameter) tells Spark that it must remove the data from the cache before proceeding:

df4.unpersist(True) 
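As a side note, unpersist() only releases memory that was actually claimed by cache() or persist(); the DataFrame itself stays valid and is simply recomputed from the source on the next action, which is why df4.show() still returns data. A minimal self-contained sketch of this behavior (using a local SparkSession and spark.range as a stand-in for the Hive query):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('unpersist-demo').getOrCreate()

df = spark.range(1000)  # hypothetical stand-in for the Hive query
df.cache()              # mark the DataFrame for caching
df.count()              # action that actually materializes the cache

print(df.is_cached)     # True  -- blocks are held in memory
df.unpersist(True)      # blocking=True: wait until the blocks are really removed
print(df.is_cached)     # False -- the cached memory is released ...
df.show(5)              # ... but the DataFrame still works: Spark recomputes it lazily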

Reference: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.unpersist.html