How to configure Databricks token inside Docker File


I have a Dockerfile in which I want to:

  1. Download the Databricks CLI
  2. Configure the CLI by adding a host and token
  3. Run a Python file that hits the Databricks API

I am able to install the CLI in the Docker image, and I have a working Python file that can submit the job to the Databricks API, but I'm unsure how to configure the CLI within Docker.

Here is what I have:

FROM python
LABEL maintainer="nope"

# Create the application source code directory
RUN mkdir -p /src

# Set the working directory for the container
WORKDIR /src

# Install Python dependencies
RUN pip install databricks-cli

# Not sure how to do this part???
# databricks configure --token prompts interactively for a host and token,
# so this step hangs the build
RUN databricks configure --token

# Copy the source code into the container
COPY . /src

# Print the CLI version at build time
RUN databricks --version

# Kick off the Python job (only the last CMD in a Dockerfile takes effect)
CMD ["python", "get_run.py"]

If I were to run databricks configure --token in the CLI, it would prompt for the configs like this:

databricks configure --token
Databricks Host (should begin with https://):
Token:

There are 5 answers

Alex Ott

It's better not to do it this way, for multiple reasons:

  1. It's insecure: configuring the Databricks CLI this way generates a file inside the container that can be read by anyone with access to the image
  2. Tokens have a time-to-live (90 days by default), so you would need to rebuild your containers regularly...

Instead, it's better to pass two environment variables to the container; they will be picked up by the databricks command automatically. These are DATABRICKS_HOST and DATABRICKS_TOKEN, as described in the documentation.
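For example, with this approach the image is built with no credentials baked in, and the host and token are injected only when the container starts (the image tag and the host/token values here are placeholders, not real ones):

```shell
# Build the image without any credentials
docker build -t databricks-job .

# Inject the credentials at run time; the databricks CLI and the
# databricks_cli Python package both read these variables automatically
docker run \
  -e DATABRICKS_HOST="https://adb-XXXX.azuredatabricks.net" \
  -e DATABRICKS_TOKEN="dapiXXXX" \
  databricks-job
```

This also means a leaked image contains no secrets, and rotating the token requires no rebuild.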

Pedram Ataee

It is not very secure to put your token in the Dockerfile. However, if you want to pursue this approach, you can use the code below.

RUN export DATABRICKS_HOST=XXXXX && \
    export DATABRICKS_API_TOKEN=XXXXX && \
    export DATABRICKS_ORG_ID=XXXXX && \
    export DATABRICKS_PORT=XXXXX && \
    export DATABRICKS_CLUSTER_ID=XXXXX && \
    echo "{\"host\": \"${DATABRICKS_HOST}\",\"token\": \"${DATABRICKS_API_TOKEN}\",\"cluster_id\":\"${DATABRICKS_CLUSTER_ID}\",\"org_id\": \"${DATABRICKS_ORG_ID}\", \"port\": \"${DATABRICKS_PORT}\" }" >> /root/.databricks-connect

Make sure to run all of these commands in a single RUN instruction. Each RUN starts a fresh shell, so variables such as DATABRICKS_HOST or DATABRICKS_API_TOKEN exported in one RUN do not propagate to later instructions.

If you want to connect to a Databricks cluster from within a Docker container, you need more configuration. You can find the required details in this article: How to Connect a Local or Remote Machine to a Databricks Cluster

chá de boldo

The number of personal access tokens per user is limited to 600. But via bash it is easy:

echo "y $(WORKSPACE-REGION-URL) $(CSE-DEVELOP-PAT) $(EXISTING-CLUSTER-ID) $(WORKSPACE-ORG-ID) 15001" | databricks-connect configure
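A sketch of the same piped setup with printf, which emits one answer per line; this assumes databricks-connect reads its prompts line by line and in this order (accept, URL, token, cluster id, org id, port), and the variables mirror the pipeline placeholders above:

```shell
# One answer per prompt, newline-separated; 15001 is the port placeholder
printf 'y\n%s\n%s\n%s\n%s\n15001\n' \
  "$WORKSPACE_REGION_URL" "$CSE_DEVELOP_PAT" \
  "$EXISTING_CLUSTER_ID" "$WORKSPACE_ORG_ID" |
  databricks-connect configure
```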

SachinG

If you want to access Databricks models/download_artifacts using a hostname and access token, the way you would with the Databricks CLI:

databricks configure --token --profile profile_name
Databricks Host (should begin with https://): your_hostname
Token: token

and you have created the profile and pushed models, and just want to access the models/artifacts in Docker using this profile, add the code below to the Dockerfile.

RUN pip install databricks_cli

ARG HOST_URL
ARG TOKEN

# Note: if the image's /bin/sh is bash, use printf or echo -e so the \n
# sequences are interpreted
RUN echo "[<profile name>]\nhost = ${HOST_URL}\ntoken = ${TOKEN}" >> ~/.databrickscfg

This creates your .databrickscfg file, with the host and token, at build time, the same way databricks configure would.

Add the build args HOST_URL and TOKEN in the docker build. E.g., if

your host name = https://adb-5443106279769864.19.azuredatabricks.net/

your access token = dapi********************53b1-2

then:

sudo docker build -t test_tag --build-arg HOST_URL=<your host name> --build-arg TOKEN=<your access token> .

And now you can access your experiments in your code using this profile name: databricks://profile_name.
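As a sanity check, the generated .databrickscfg is plain INI, so the profile written by the RUN echo line can be read back with Python's configparser. This is a minimal sketch, not part of the answer; the profile name and values are placeholders:

```python
import configparser

# Contents equivalent to what the RUN echo line writes (placeholders)
sample = """[profile_name]
host = https://adb-5443106279769864.19.azuredatabricks.net/
token = dapiXXXXXXXXXXXXXXXX
"""

config = configparser.ConfigParser()
config.read_string(sample)

# Read back the profile the same way client libraries resolve it
print(config["profile_name"]["host"])
print(config["profile_name"]["token"])
```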

jonchar

When databricks configure is run successfully, it writes the information to the file ~/.databrickscfg:

[DEFAULT]
host = https://your-databricks-host-url
token = your-api-token

One way you could set this in the container is with a startup command (syntax here for docker-compose.yml):

/bin/bash -ic "echo -e '[DEFAULT]\nhost = ${HOST_URL}\ntoken = ${TOKEN}' > ~/.databrickscfg"

(Note the -e flag: without it, bash's echo prints the \n sequences literally instead of as newlines.)
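Alternatively, a sketch using printf, which interprets \n on every POSIX shell and so avoids the echo -e portability issue; HOST_URL and TOKEN are placeholders, as above:

```shell
# Placeholder values; in practice these come from the environment
HOST_URL="https://your-databricks-host-url"
TOKEN="your-api-token"

# printf expands \n consistently across shells
printf '[DEFAULT]\nhost = %s\ntoken = %s\n' "$HOST_URL" "$TOKEN" > ~/.databrickscfg
cat ~/.databrickscfg
```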