I used ./bin/spark-shell to run some experiments and noticed the following. While running jobs (a transformation plus an action), I watched memory usage in top. For example, on a 5G text file I did a simple filter() and count(). After the job finished, about 7g was still shown in the RES column of top. The machine has 100g of memory, and I set the executor memory to 50g.
Does anyone know what that 7g is?
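For reference, the job was roughly of this shape (the file path and filter predicate here are just placeholders, not the exact ones I used):

```scala
// Run inside spark-shell, where `sc` is the provided SparkContext
val lines = sc.textFile("/path/to/5g_file.txt")    // ~5G text file
val matched = lines.filter(_.contains("ERROR"))    // simple filter transformation
println(matched.count())                           // action that triggers the job
```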
Since Spark runs on the JVM, the fact that Spark no longer holds references to some memory doesn't mean that memory has been freed. Even if a garbage collection has been triggered, the JVM may not return the memory to the OS (this is controlled by -XX:MaxHeapFreeRatio, among other things).
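You can see this effect directly from the shell. Here is a minimal sketch (runnable in spark-shell or any Scala REPL) that prints the heap the JVM has committed from the OS versus what is actually in use; after a GC the "used" figure may drop while the "committed" figure, which is roughly what top reports as RES, usually stays high:

```scala
// Observe committed vs. used heap; committed memory is what the OS sees as resident.
val mb = 1024L * 1024L
val rt = Runtime.getRuntime

def report(label: String): Unit = {
  val committed = rt.totalMemory() / mb                      // heap claimed from the OS
  val used      = (rt.totalMemory() - rt.freeMemory()) / mb  // heap actually in use
  println(f"$label%-10s committed=${committed}MB used=${used}MB")
}

report("before")
// ... run your filter()/count() job here ...
report("after")
System.gc()        // request (not force) a collection
report("after GC") // `used` may shrink; `committed` often does not
```

If you want to pass heap-free-ratio flags to Spark's JVMs, they can be supplied as extra JVM options, for example (flag values here are illustrative, not a recommendation):

```scala
// e.g. on the command line:
//   ./bin/spark-shell --conf "spark.executor.extraJavaOptions=-XX:MaxHeapFreeRatio=50"
```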