I am trying to use cudf on Databricks.
I started by following https://medium.com/rapids-ai/rapids-can-now-be-accessed-on-databricks-unified-analytics-platform-666e42284bd1, but the init script link there is broken.
Then I followed this guide (https://github.com/rapidsai/spark-examples/blob/master/getting-started-guides/csp/databricks/databricks.md#start-a-databricks-cluster), which installs the cudf jars on the cluster. Even so, I still could not import cudf.
I also tried:
%sh conda install -c rapidsai -c nvidia -c numba -c conda-forge cudf=0.13 python=3.7 cudatoolkit=10.1
which also failed with a long error ending with:
    active environment : /databricks/python
   active env location : /databricks/python
           shell level : 2
      user config file : /root/.condarc
populated config files : /databricks/conda/.condarc
         conda version : 4.8.2
   conda-build version : not installed
        python version : 3.7.6.final.0
      virtual packages : __cuda=10.2
                         __glibc=2.27
      base environment : /databricks/conda (writable)
          channel URLs : https://conda.anaconda.org/nvidia/linux-64
                         https://conda.anaconda.org/nvidia/noarch
                         https://conda.anaconda.org/rapidsai/linux-64
                         https://conda.anaconda.org/rapidsai/noarch
                         https://conda.anaconda.org/numba/linux-64
                         https://conda.anaconda.org/numba/noarch
                         https://conda.anaconda.org/conda-forge/linux-64
                         https://conda.anaconda.org/conda-forge/noarch
                         https://conda.anaconda.org/pytorch/linux-64
                         https://conda.anaconda.org/pytorch/noarch
                         https://repo.anaconda.com/pkgs/main/linux-64
                         https://repo.anaconda.com/pkgs/main/noarch
                         https://repo.anaconda.com/pkgs/r/linux-64
                         https://repo.anaconda.com/pkgs/r/noarch
         package cache : /databricks/python/pkgs
                         /local_disk0/conda/pkgs
      envs directories : /databricks/conda/envs
                         /root/.conda/envs
              platform : linux-64
            user-agent : conda/4.8.2 requests/2.22.0 CPython/3.7.6 Linux/4.4.0-1114-aws ubuntu/18.04.5 glibc/2.27
               UID:GID : 0:0
            netrc file : None
          offline mode : False
An unexpected error has occurred. Conda has prepared the above report.
Upload successful.
Any idea how to use cudf on a Databricks cluster?
I think the OP wants to use cudf from Python. If so, that is not covered in the documentation.
However, I added the following to generate-init-script.ipynb to make it work:
Note: change the cudf and cudatoolkit versions to match your environment.
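As a rough illustration of the idea (a sketch, not the exact cell from the answer), a notebook cell could generate a cluster init script that runs the conda install from the question. The function name build_init_script and the DBFS path are hypothetical, the versions are simply the ones from the question, and the sketch assumes conda is on the PATH when the init script runs. Whether the conda solve succeeds will depend on the Databricks runtime.

```python
def build_init_script(cudf_version="0.13",
                      cudatoolkit_version="10.1",
                      python_version="3.7"):
    """Return the text of an init script that installs cudf with conda.

    The defaults mirror the versions in the question; adjust them to
    match the CUDA and Python versions on your cluster.
    """
    return f"""#!/bin/bash
set -e
# Install cudf into the active conda environment.
conda install -y -c rapidsai -c nvidia -c numba -c conda-forge \\
    cudf={cudf_version} python={python_version} cudatoolkit={cudatoolkit_version}
"""


script = build_init_script()

# Hypothetical DBFS location for the init script; pick your own path.
INIT_SCRIPT_PATH = "dbfs:/databricks/init_scripts/install-cudf.sh"

# dbutils is only defined inside a Databricks notebook, so fall back to
# printing the script when this cell is run anywhere else.
if "dbutils" in globals():
    dbutils.fs.put(INIT_SCRIPT_PATH, script, True)  # True = overwrite
else:
    print(script)
```

After the script is written, it would still need to be registered as an init script in the cluster configuration and the cluster restarted before import cudf can be tried in a Python notebook.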