How to use Stocator from an IBM Jupyter notebook running PySpark?


I want to use Stocator to access IBM Cloud Object Storage (COS) from a Jupyter notebook (on IBM Watson Studio) running PySpark. Can someone please tell me how to go about this?

I understand that Stocator is pre-installed, but do you have to set credentials or configuration from within the notebook first if there's a specific COS bucket I'm trying to access?

For example, I have a bucket named my-bucket.

How do I access it?

I know I can use ibm_boto3 to access COS directly, but this is for a Spark application, so I need to be able to do it through Stocator.


There are 2 answers

fwong_pong

Okay, so to get it to work in my case I had to add the access key as well. Also make sure the service name is your own: it can be whatever you choose, but it has to be the same in every place you reference it.

hconf = sc._jsc.hadoopConfiguration()
# "sname" is a service name you choose; it must match the one in the cos:// URI below
hconf.set("fs.cos.sname.iam.api.key", "API_KEY")      # IAM API key for your COS instance
hconf.set("fs.cos.sname.access.key", "ACCESS_KEY")    # HMAC access key id, if your credentials include one
hconf.set("fs.cos.sname.endpoint", "ENDPOINT")        # endpoint for the region your bucket lives in
# Read a local file and write it to the bucket "bname" through Stocator
rdd = sc.textFile('file.txt')
rdd.saveAsTextFile('cos://bname.sname/test.txt')
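
If the write succeeds, you can read the data back through the same cos:// URI as a quick check (a minimal sketch, assuming the same sname configuration and bucket as above):

check = sc.textFile('cos://bname.sname/test.txt')  # read back through Stocator using the same service name
print(check.take(5))                               # print the first few lines to verify the write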
charles gomes

All you need to do is set the Hadoop configuration parameters for Spark, and then you should be able to write the dataframe as CSV into your COS bucket. Make sure the credentials you use have Writer or higher IAM access to the COS bucket.

hconf = sc._jsc.hadoopConfiguration()
# "servicename" is an arbitrary name; it must match the one used in the cos:// URI below
hconf.set("fs.cos.servicename.iam.api.key", "**********")      # IAM API key for your COS instance
hconf.set("fs.cos.servicename.endpoint", "<BUCKET_ENDPOINT>")  # endpoint for your bucket's region
# Write the dataframe as CSV into the bucket through Stocator
df.write.format("csv").save("cos://<bucket>.servicename/filename.csv")

The above code was referenced from this Medium article: https://medium.com/@rachit1arora/efficient-way-to-connect-to-object-storage-in-ibm-watson-studio-spark-environments-d6c1199f9f97
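
For completeness, the file can be read back into a dataframe through the same URI scheme (a minimal sketch, assuming the servicename configuration above and the spark session already available in a Watson Studio notebook):

df2 = spark.read.format("csv").load("cos://<bucket>.servicename/filename.csv")  # read the CSV back through Stocator
df2.show(5)                                                                     # show the first rows to verify the write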