spark - How is it even possible to get an OOM?


If Spark is designed to "spill to disk" when there is not enough memory, then I am wondering: how is it even possible to get an OOM in Spark?

There is a spark.local.dir which is used as a scratchpad for all operations that do not fit in memory.

If that is the case, then how and why is it even possible to get an OOM in Spark? Everything should spill to disk and be fine, right?

Example Scenarios:

  1. A Spark executor needs to transfer data to another executor, i.e. a shuffle, which might cause an OOM.

Naive Solution - As the recipient executor receives the data, it starts writing to disk once the data no longer fits in RAM.

  2. A coalesce(1) is triggered which triggers a shuffle and the data does not fit in memory.

Naive Solution - The executor gathers all the data and, as soon as it sees the data becoming too large for its own RAM, starts writing to disk.

  3. A driver node does a collect() on a very large dataset and cannot fit the data into memory (a minimal repro sketch follows this list).

Naive Solution - The driver keeps receiving data and keeps it in RAM for as long as it can; after that, yeah, you got it, spill to disk.
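For concreteness, here is a minimal sketch of scenario 3 (the app name and sizes are just illustrative) that should reproduce the problem when run with a small driver heap, e.g. spark-submit --driver-memory 1g:

    import org.apache.spark.sql.SparkSession

    object DriverOomRepro {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("driver-oom-repro").getOrCreate()
        // ~200M boxed Longs is several GB on the driver heap, far more than 1 GB.
        // Note: spark.driver.maxResultSize (default 1g) may abort the job first;
        // set it to 0 to disable that guard and observe the raw OutOfMemoryError.
        val all = spark.range(0, 200000000L).collect()
        println(all.length)
      }
    }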

I could not think of a single instance where you can end up with an OOM, provided Spark can spill to disk.

But obviously, unfortunately, OOMs do happen in real life, so what am I missing here?

1 Answer

Lingesh.K

Very interesting question indeed. Let me break down your observations for the example scenarios you have posted:

There is a spark.local.dir which is used as a scratchpad for all operations that do not fit in memory... Everything should spill to disk and be fine, right?

True, but spilling does not happen in real time at the speed data is being read and written in memory. Spark can still be in the middle of spilling to disk while even more data is piling up in memory, and in that window an OOM error is entirely possible.

Spilling is also not a one-way process: once the data is spilled, Spark may need to read it back into memory if further processing is required, and at that point the data can again be too large to fit.

Finally, even if the disk is very large, spark.local.dir cannot be a one-stop solution: if the scratch disk fills up, Spark cannot spill any additional data, which can also lead to an OutOfMemoryError.
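For reference, the knobs involved here look roughly like this; a sketch with illustrative values, not tuning advice:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("spill-config-sketch")
      // Scratch directory for shuffle and spill files. The cluster manager
      // can override this (e.g. SPARK_LOCAL_DIRS or YARN local dirs).
      .config("spark.local.dir", "/mnt/scratch/spark")
      // Fraction of (heap - 300 MB) given to unified execution + storage
      // memory; the remainder is user memory, which Spark never spills.
      .config("spark.memory.fraction", "0.6")
      .getOrCreate()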

  1. A Spark executor needs to transfer data to another executor, i.e. a shuffle, which might cause an OOM. Naive Solution - As the recipient executor receives the data, it starts writing to disk once the data no longer fits in RAM.

You are right that when the shuffle operation overloads memory, Spark will spill to disk to try to complete the job. But Spark does not immediately write to disk the moment memory cannot handle the data: it first fills up memory, and some serialization takes place before the data is spilled from RAM to disk. If the spilled data is needed again, Spark has to read it back into memory, which can again throw an OOM error.
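A classic sketch of this on the RDD API (sizes illustrative): groupByKey must materialize every value of a key as one in-memory buffer in the receiving task, while reduceByKey combines map-side first and shuffles far less.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("shuffle-sketch").getOrCreate()
    // Only 10 distinct keys: a heavily skewed shuffle.
    val pairs = spark.sparkContext.range(1L, 100000000L).map(i => (i % 10, i))

    // Risky: all values for a key must ultimately sit together in one
    // task's memory before your function ever sees them.
    val grouped = pairs.groupByKey().mapValues(_.sum)

    // Safer: partial sums are computed before the shuffle, so far less
    // data crosses the wire and occupies execution memory.
    val reduced = pairs.reduceByKey(_ + _)
    println(reduced.collect().toList)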

  2. A coalesce(1) is triggered which triggers a shuffle and the data does not fit in memory. Naive Solution - The executor gathers all the data and, as soon as it sees the data becoming too large for its own RAM, starts writing to disk.
  3. A driver node does a collect() on a very large dataset and cannot fit the data into memory. Naive Solution - The driver keeps receiving data and keeps it in RAM for as long as it can; after that, yeah, you got it, spill to disk.

The same argument as earlier stands. coalesce(1) is memory-intensive: it collapses all partitions of the RDD/DataFrame into one, so a single task ends up processing everything (strictly speaking, coalesce(1) avoids a shuffle; it is repartition(1) that triggers a full one), and the data needs to fit in memory before it can be spilled to disk. The same holds for expensive operations on large data where collect() or take(m) is invoked.
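For instance, when the real goal of coalesce(1) is a single output file, a common sketch (path illustrative) is:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("single-file-sketch").getOrCreate()
    val df = spark.range(0, 500000000L).toDF("id")

    // coalesce(1) collapses the whole plan into one task with no shuffle,
    // so a single executor processes everything. repartition(1) inserts a
    // full shuffle first: upstream stages keep their parallelism and the
    // shuffle machinery can spill, though the final write is still one task.
    df.repartition(1).write.parquet("/tmp/single-file-out")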

Once the spill starts, the data is serialized before being written out. If memory is already full, that serialization itself needs working space and cannot be carried out effectively, hence an OOM error occurs.

This holds for executor memory and driver memory alike. Spark is architected so that processing happens first in memory, since accessing data there is much faster than on its counterpart (disk). Though data storage can be controlled, we cannot dictate that Spark perform an operation only on disk or only in memory; the actual execution of operations is governed by a complex interplay of different factors.

Hence, if the memory configuration is not set the right way, OOM errors are bound to occur.
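For completeness, these limits are typically fixed at submit time, since the driver's heap in particular cannot be resized after its JVM has started (values purely illustrative):

    spark-submit \
      --driver-memory 4g \
      --executor-memory 8g \
      --conf spark.memory.fraction=0.6 \
      --conf spark.local.dir=/mnt/scratch/spark \
      my-app.jar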

Synopsis on Spark Memory Management

This is based on Spark's Unified Memory Manager (UMM), in use since Spark 1.6.

1. Executor Memory

What matters here is the JVM heap: Spark reserves a fixed overhead of 300 MB, and the remainder is split roughly 60/40 between (execution + storage) memory and user memory respectively.
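As a worked sketch, for a 4 GB executor heap with the default fractions (spark.memory.fraction = 0.6, spark.memory.storageFraction = 0.5):

    // Back-of-the-envelope executor heap layout, all values in MB.
    val heap     = 4096                  // executor JVM -Xmx
    val reserved = 300                   // fixed reserved memory
    val usable   = heap - reserved       // 3796 MB
    val unified  = (usable * 0.6).toInt  // ~2277 MB for execution + storage
    val user     = usable - unified      // ~1519 MB for user data structures
    val storage  = (unified * 0.5).toInt // ~1138 MB of storage protected from eviction
    println(s"unified=$unified MB, user=$user MB, protected storage=$storage MB")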

Here, an OOM occurs when the Spark executor cannot handle the following scenarios:

  • Large RDDs/DataFrames/Datasets persisted in storage memory exceed what is available (before Spark attempts to spill or evict), throwing an OOM error.
  • High task concurrency in execution memory demands more than the available execution memory (before Spark attempts to spill), throwing an OOM error.
  • Dynamic memory borrowing - Spark does its best to manage large memory operations by letting execution and storage borrow from each other within the 60% of (JVM heap - 300 MB) allocated to unified memory. But even then, if one region takes too much memory (before Spark attempts to spill), an OOM error is thrown.

Other scenarios where an OOM could potentially happen on the executor side are: garbage collection overhead, misconfigured storage settings, and complex in-memory shuffle actions (joins, sorts, and aggregations).
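For the first bullet above, one mitigation sketch is to cache with a disk-tolerant storage level instead of the RDD default of MEMORY_ONLY:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    val spark = SparkSession.builder().appName("persist-sketch").getOrCreate()
    val rdd = spark.sparkContext.range(0L, 100000000L)

    // rdd.cache() means MEMORY_ONLY: partitions that don't fit are dropped
    // and recomputed, and materializing a huge partition can still blow the
    // heap. MEMORY_AND_DISK lets blocks that don't fit fall back to disk.
    rdd.persist(StorageLevel.MEMORY_AND_DISK)
    println(rdd.count())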

2. Driver Memory

Not only can executor memory throw an OOM error; driver memory can throw this error too, in the following scenarios:

  • Large memory used by broadcast variables, accumulator variables, or serializing/deserializing large data in driver memory.
  • Collecting large data using collect(), take(m), or show(false); Spark fails with an error along the lines of ..data too large to fit into driver's memory.. (safer alternatives are sketched after this list).
  • Misconfigured values for the storage settings of the driver.
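A sketch of those safer alternatives (paths illustrative): pull only a bounded sample to the driver, stream partitions one at a time, or keep large results in storage.

    import org.apache.spark.sql.SparkSession
    import scala.jdk.CollectionConverters._ // Scala 2.13; use scala.collection.JavaConverters on 2.12

    val spark = SparkSession.builder().appName("driver-safe-sketch").getOrCreate()
    val df = spark.range(0, 1000000000L).toDF("id")

    df.show(20)                           // bounded sample instead of collect()
    df.toLocalIterator().asScala          // holds roughly one partition on the driver at a time
      .take(1000)
      .foreach(println)
    df.write.parquet("/tmp/large-result") // leave the bulk of the result in storage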

References
  1. https://db-blog.web.cern.ch/blog/luca-canali/2020-08-spark3-memory-monitoring