Dask Gateway - Dask Workers Dying Due to PermissionError


I am trying to deploy Dask Gateway on Google Kubernetes Engine. The deployment itself works fine. However, I am experiencing issues when using a custom dask-gateway Dockerfile that inherits from the default Docker image on Docker Hub; the resulting image is then pushed to Google Container Registry (GCR). This results in the following PermissionError:

PermissionError: [Errno 13] Permission denied: '/home/dask/dask-worker-space'

(See screenshot below for full stacktrace)

The intriguing part is that the dask workers start up without any issue when they use the Docker image directly from Docker Hub instead of GCR. I need the custom Dockerfile to add a few more Python packages to the dask workers, but other than that, there are no configuration changes. It's as though pushing the image to GCR does something funky to the permissions.

Here is the full stacktrace of the error:

[screenshot of the full stack trace]

Here is the Dockerfile I am using for the dask workers:

FROM daskgateway/dask-gateway:0.9.0

RUN pip --no-cache-dir install --upgrade cloudpickle dask-ml scikit-learn \
nltk gensim spacy keras asyncio google-cloud-storage SQLAlchemy snowflake-sqlalchemy google-api-core gcsfs pyarrow mlflow \
tensorflow prefect hvac aiofile google-cloud-logging

Any help would be greatly appreciated because I have no idea how to debug this.
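
A minimal sketch for comparing the two images locally, assuming Docker is installed; the GCR image name below is just a placeholder for mine:

    # Pull both images: the stock one from Docker Hub and the custom one from GCR.
    docker pull daskgateway/dask-gateway:0.9.0
    docker pull gcr.io/my-project-id/dask-gateway-custom:0.9.0

    # Compare which user each image runs as (override the entrypoint to run a one-off command).
    docker run --rm --entrypoint id daskgateway/dask-gateway:0.9.0
    docker run --rm --entrypoint id gcr.io/my-project-id/dask-gateway-custom:0.9.0

    # Compare ownership and permissions of the worker home directory in each image.
    docker run --rm --entrypoint ls daskgateway/dask-gateway:0.9.0 -ld /home/dask
    docker run --rm --entrypoint ls gcr.io/my-project-id/dask-gateway-custom:0.9.0 -ld /home/dask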

1 Answer

Answered by Carlos S.:

As you are using a GKE cluster, make sure that the service account you set for the cluster has the correct permissions on Container Registry.
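
A quick way to check which service account and scopes the nodes currently use is sketched below (the cluster name, zone, and node pool name are placeholders):

    # Service account and scopes configured on the cluster's default node config.
    gcloud container clusters describe my-cluster \
      --zone us-central1-a \
      --format="yaml(nodeConfig.serviceAccount,nodeConfig.oauthScopes)"

    # The same information for a specific node pool.
    gcloud container node-pools describe default-pool \
      --cluster my-cluster \
      --zone us-central1-a \
      --format="yaml(config.serviceAccount,config.oauthScopes)"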

You are creating an image and pushing it to Container Registry, so you will need write permissions there. The process differs depending on whether you are using the default service account or a custom one.
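
For context, a rough sketch of that push flow (the project ID and image tag below are placeholders):

    # Allow the local Docker client to authenticate to gcr.io.
    gcloud auth configure-docker

    # Build the custom image and push it to Container Registry.
    docker build -t gcr.io/my-project-id/dask-gateway-custom:0.9.0 .
    docker push gcr.io/my-project-id/dask-gateway-custom:0.9.0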

  1. If you are using the default service account, you will need at least the storage read and write scopes for this action (GKE clusters are created with only the read scope by default).
  • If you have a running cluster, you will need to change the scopes on every node pool by creating a replacement pool with the new scopes:

      gcloud container node-pools create [new pool name] \
      --cluster [cluster name] \
      --machine-type [your desired machine type] \
      --num-nodes [the same amount of nodes you have] \
      --scopes [your new set of scopes]
    

    (All the possible options can be found by running gcloud container node-pools create --help.)

    After you have done that, drain the old nodes with kubectl drain [node] and then delete the old node pool:

      gcloud container node-pools delete [POOL_NAME] \
      --cluster [CLUSTER_NAME]
    
  • If you don't have a cluster yet, you can set the scopes in the console while creating it, or, if you create it with gcloud, pass the scopes you want (see the full list of available scopes in the gcloud documentation).

  2. If you are using a custom service account, make sure it has the role "roles/storage.admin" granted. (source)
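
    A minimal sketch of granting that role with gcloud (the project ID and service account email below are placeholders):

      gcloud projects add-iam-policy-binding my-project-id \
      --member="serviceAccount:dask-gateway-sa@my-project-id.iam.gserviceaccount.com" \
      --role="roles/storage.admin"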