I am using dask-yarn in local deploy mode on a MapR cluster. I have unpacked the virtual environment into a folder shared between the nodes.
Sometimes the workers (containers) start properly in the cluster, but sometimes the containers fail with the following error message in YARN:
/usr/bin/env: 'python3.6': No such file or directory
In the meantime, I see a lot of containers with status FAILED (more than 1000). I initially request about 5 workers, but I have to wait 10 minutes or more until they are all provisioned.
This is my /etc/dask/yarn.yaml configuration:
yarn:
  specification: null
  name: dask
  queue: default
  deploy-mode: local
  environment: "venv://<shared_location>"
  tags: []
  user: ''
  host: "host_name"
  port: 8788
  dashboard-address: ":17439"
  scheduler:
    vcores: 1
    memory: 2GiB
  worker:
    vcores: 1
    memory: 2GiB
    restarts: -1
    env: {'SOME_VAR':'some_value'}
Reason for the problem: some of the nodes did not have the same Python version installed in the same location. Because I am using a virtual environment, it expects to find Python in the same location on all nodes. Containers scheduled on a node where python3.6 was missing failed with the error above.
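
A quick way to spot the affected nodes is to run a small check on each one. The following is only a minimal sketch: `check_interpreter` is a hypothetical helper, not part of dask-yarn, and it assumes the usual virtual environment layout where `bin/python` is a symlink to the base interpreter.

```python
import os
import shutil
import socket
import sys


def check_interpreter(venv_path, name="python3.6", search_path=None):
    """Report whether this node can provide the Python the venv expects.

    Hypothetical helper for illustration; not part of dask-yarn.
    """
    # Scripts with a '#!/usr/bin/env python3.6' shebang need `name` on PATH.
    on_path = shutil.which(name, path=search_path)
    # bin/python in a venv is usually a symlink to the base interpreter.
    # os.path.exists follows symlinks, so a dangling link reports False.
    venv_python = os.path.join(venv_path, "bin", "python")
    return {
        "name_on_path": on_path,
        "venv_python": venv_python,
        "venv_python_resolves": os.path.exists(venv_python),
    }


if __name__ == "__main__":
    # Pass the unpacked environment's folder as the first argument.
    venv = sys.argv[1] if len(sys.argv) > 1 else "."
    print(socket.gethostname(), check_interpreter(venv))
```

Running this on every node with the shared folder as the argument and comparing the output should show which nodes lack python3.6 or have it in a different location. A node where `name_on_path` is `None` or `venv_python_resolves` is `False` is a likely source of the FAILED containers.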