How to supply user functions/modules in DSX


I have some helper utilities defined in a separate python script. I would like to make the script available to the DSX notebook, so I can call them in the cell, but I don't want to put the script into the cell directly.

What are some of the ways to achieve this?


There are 2 answers

Chris Snow (accepted answer)

If you are ok with making your code publicly available in a public git repository, you can turn your code into a Python package and push it to GitHub. See, for example, "A simple Hello World setuptools package and installing it with pip" for a minimal package layout.

You can then install it directly from GitHub using:

!pip install --user git+https://github.com/public_account/public_repo
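For reference, a minimal package only needs a setup.py next to your helper module. The sketch below is illustrative; the package name myhelpers and its metadata are placeholders rather than anything from the original answer.

# setup.py -- minimal setuptools packaging for a single helper module (sketch)
from setuptools import setup

setup(
    name="myhelpers",               # placeholder package name
    version="0.1.0",
    description="Helper utilities for DSX notebooks",
    py_modules=["myhelpers"],       # expects myhelpers.py alongside setup.py
)

With that in the repository root, the pip command above can resolve and install the package.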

Private GitHub repositories

A similar approach should also work with a private GitHub repository, with a few extra setup steps and a different URL format for pip. For example:

Generate an SSH key on DSX:

! ssh-keygen -b 2048 -t rsa -f ~/.ssh/id_rsa -q -N ""

Add the output of the following command to your GitHub account under Settings > SSH and GPG keys:

! cat ~/.ssh/id_rsa.pub

Next, add GitHub's host key to the known_hosts file on DSX:

! ssh-keyscan github.com >> ~/.ssh/known_hosts

IMPORTANT: you should manually verify that the imported GitHub host key is authentic. You can view the imported key with:

! cat ~/.ssh/known_hosts
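One way to do that (a suggestion, not part of the original answer) is to print the key's fingerprint and compare it against the SSH key fingerprints GitHub publishes in its documentation:

! ssh-keygen -lf ~/.ssh/known_hosts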

You can now install with pip:

! pip install --user git+ssh://[email protected]/private_account/private_repo

CAUTION: there are security considerations with the above approach, i.e. anyone with access to the Spark service where you ran these commands will also be able to access the private git repository.


NOTE:

Ideally, in the future I would like to see DSX provide support for editing all files in a project and committing the whole project to GitHub.

Chris Snow (second answer)

One option is to upload your package to your Spark service from a client machine using the following API call:

curl \
   -X PUT \
   -k \
   -u ${tenant_id}:${tenant_secret} \
   -H "X-Spark-service-instance-id: ${instance_id}" \
   --data-binary "@path_to_local_file" \
   ${cluster_master_url}/tenant/data/destination_file_name
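If you prefer to do the upload from Python rather than curl, a rough equivalent using the requests library would look like the sketch below; the credential values are placeholders to be filled in from your Service Credentials.

# Approximate Python equivalent of the curl upload above (sketch only)
import requests

tenant_id = "..."                   # from Service Credentials
tenant_secret = "..."               # from Service Credentials
instance_id = "..."                 # from Service Credentials
cluster_master_url = "https://..."  # from Service Credentials

with open("path_to_local_file", "rb") as f:
    response = requests.put(
        cluster_master_url + "/tenant/data/destination_file_name",
        data=f,
        auth=(tenant_id, tenant_secret),
        headers={"X-Spark-service-instance-id": instance_id},
        verify=False,  # mirrors curl's -k; remove if certificate verification works
    )
response.raise_for_status()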

The variables above can be obtained by logging in to the Bluemix console and navigating to the service's Service Credentials. Alternatively, you can use the cf command-line tools to retrieve this information; this Q/A provides some more information on the cf command-line approach.
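For example (a sketch; the instance and key names below are placeholders), the credentials can be listed from the command line with something like:

cf service-keys <spark-instance-name>
cf service-key <spark-instance-name> <key-name>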

After uploading your package to the Spark service, you can use:

! pip install --user ${HOME}/data/destination_file_name
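Once installed, the helpers can be imported in a notebook cell as usual. The names below are placeholders for whatever your package actually exposes:

import myhelpers            # placeholder: use your actual module name
myhelpers.some_helper()     # call one of your helper functions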

Credit to Roland Weber for this answer.