I'm working with a Dockerized PySpark cluster that uses YARN. To improve the efficiency of the data processing pipelines, I want to increase the amount of memory allocated to the PySpark executors and the driver.
I do this by adding the following two key/value pairs to the body of the REST POST request that is sent to Livy:
"driverMemory": "20g" "executorMemory": "56g"
Doing this results in the following error, which I've found in Livy's logs:

java.lang.IllegalArgumentException: Required executor memory (57344), overhead (5734 MB), and PySpark memory (0 MB) is above the max threshold (8192 MB) of this cluster! Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'.
Of course I've edited yarn-site.xml accordingly and set both of the mentioned values to 64 GB, but it doesn't seem to make a difference.
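The relevant part of the file looks roughly like this (a reconstructed sketch of the snippet described above, with 64 GB written as 65536 MB):

```xml
<!-- sketch: both YARN memory limits raised to 64 GB (65536 MB) -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>65536</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>65536</value>
</property>
```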
A similar problem occurs with other driverMemory and executorMemory values whenever executorMemory plus the ~10% overhead exceeds 8192 MB.
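The arithmetic behind the error is Spark's default executor memory overhead of max(384 MB, 10% of executorMemory), which is added on top of the requested executor memory before it is checked against YARN's maximum allocation:

```
overhead  = max(384 MB, 0.10 * 57344 MB) = 5734 MB
requested = 57344 MB + 5734 MB = 63078 MB  >  8192 MB (yarn.scheduler.maximum-allocation-mb)
```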
How can I fix this and allocate more executor memory?
Make sure your yarn-site.xml looks exactly the same on your master and worker containers at the moment the service is started. It seems like you may have edited it only on the master, which is a likely source of this confusion. As a general rule of thumb, all of the config files (and many other things) must be identical on every machine in the cluster.
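A quick way to verify this is to compare the file across the containers before restarting YARN; the container names and config path below are placeholders for whatever your setup actually uses:

```sh
# hypothetical container names and config path; adjust to your setup
for c in yarn-master yarn-worker-1 yarn-worker-2; do
  docker exec "$c" md5sum /etc/hadoop/conf/yarn-site.xml
done
```

If the checksums differ, sync the file to every container and restart the ResourceManager and NodeManagers, since yarn-site.xml is only read when those daemons start.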