How to share information across notebooks in a DSX project


Is it possible to share information (such as credentials) across multiple notebooks in a DSX project, e.g. with environment variables?

For example a Cloud Foundry application in Bluemix has a control setting where environment variables can be defined, is there a similar concept for a DSX project (I couldn't see anything in the various project level settings).


There are 2 answers

Sumit Goyal

Separate notebooks run in separate runtimes in the background, so at the moment it is not possible to share credentials among notebooks by defining environment variables. However, there are helper methods for the most common credential requirements in a project, available through the "Insert to code" option.

For example, if you have an object store associated with your project:

  1. Select the "Data" tab in the top bar.
  2. Add a file to the object store by browsing or simple drag-and-drop.
  3. Insert the credentials of that object store container into your notebook by selecting the "Insert credentials" option, right beside your file in the right-hand side panel.
  4. You can then directly insert those credentials (from step 3) into any other notebook in that project.

Besides "Insert to code" there are other helper functions, such as "Insert SparkR dataframe" and "Insert pandas DataFrame", to speed up the analytics process for data scientists. Hope that was a bit helpful.
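For illustration, the cell that "Insert credentials" generates looks something like the sketch below. The variable name and the exact keys are assumptions here and vary by service and DSX version; the real values come from your object store instance.

```python
# Hypothetical example of the kind of cell "Insert credentials" generates.
# Keys and the variable name (credentials_1) are assumptions; the actual
# values are filled in by DSX from your object store instance.
credentials_1 = {
    'auth_url': 'https://identity.open.softlayer.com',
    'project_id': 'xxxx',
    'region': 'dallas',
    'user_id': 'xxxx',
    'username': 'xxxx',
    'password': 'xxxx',
    'container': 'MyProject',
    'filename': 'data.csv',
}
```

Because the generated cell is just plain Python, you can copy it into any other notebook in the same project.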

Chris Snow

FYI - I've added a feature request on UserVoice to allow Bluemix services to be bound to a project so that their credentials can be accessed in the same way a Bluemix application accesses credentials. Please vote if you think this would be useful.


Currently, one pattern I use quite a lot is to create a notebook in my project that is used to save credentials to a file on DSX:

! echo '{ "username": "xxxx", "password": "xxxx", ... }'  > cloudant_creds.json

That file is now available to all of the notebooks in the project. NOTE: the file is saved on the Spark service file system, so if you use the same Spark service in other DSX projects, they will also be able to access the file.

The credentials for Cloudant normally include other fields such as host; I haven't shown them here to keep the example simple, and have indicated the missing fields with the `...`. I normally copy this JSON from the Bluemix service credentials field.
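As an alternative sketch, the same file can be written from Python with the `json` module instead of `echo`, which avoids shell-quoting problems with special characters in passwords (the field values here are placeholders):

```python
import json

# Placeholder credentials; in practice, copy these from the Bluemix
# service credentials field (host and other fields omitted for brevity).
creds = {"username": "xxxx", "password": "xxxx"}

# Write the credentials file that other notebooks in the project can read.
with open('cloudant_creds.json', 'w') as f:
    json.dump(creds, f)
```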

In your other notebooks, you would read the credentials something like this:

import json

with open('cloudant_creds.json') as data_file:
    sourceDB = json.load(data_file)

You can then refer to the credentials like this:

    # json.load returns a dict, so use dict access rather than attributes
    dfReader = sqlContext.read.format("com.cloudant.spark")
    dfReader.option("cloudant.host", sourceDB['host'])

    if sourceDB.get('username'):
        dfReader.option("cloudant.username", sourceDB['username'])

    if sourceDB.get('password'):
        dfReader.option("cloudant.password", sourceDB['password'])

    df = dfReader.load(sourceDB['database']).cache()