No FileSystem for scheme: cos


I'm trying to connect to IBM Cloud Object Storage from IBM Data Science Experience:

access_key = 'XXX'
secret_key = 'XXX'
bucket = 'mybucket'
host = 'lon.ibmselect.objstor.com' 
service = 'mycos'

sqlCxt = SQLContext(sc)
hconf = sc._jsc.hadoopConfiguration()
hconf.set('fs.cos.myCos.access.key', access_key)
hconf.set('fs.cos.myCos.endpoint', 'http://' + host)
hconf.set('fs.cos.myCos.secret.key', secret_key)
hconf.set('fs.cos.service.v2.signer.type', 'false')

obj = 'mydata.tsv.gz'

rdd = sc.textFile('cos://{0}.{1}/{2}'.format(bucket, service, obj))
print(rdd.count())

This returns:

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: java.io.IOException: No FileSystem for scheme: cos

I'm guessing I need to use the 'cos' scheme based on the Stocator docs. However, the error suggests Stocator either isn't available or is an old version.

Any ideas?


Update 1:

I have also tried the following:

sqlCxt = SQLContext(sc)
hconf = sc._jsc.hadoopConfiguration()
hconf.set('fs.cos.impl', 'com.ibm.stocator.fs.ObjectStoreFileSystem')
hconf.set('fs.stocator.scheme.list', 'cos')
hconf.set('fs.stocator.cos.impl', 'com.ibm.stocator.fs.cos.COSAPIClient')
hconf.set('fs.stocator.cos.scheme', 'cos')
hconf.set('fs.cos.mycos.access.key', access_key)
hconf.set('fs.cos.mycos.endpoint', 'http://' + host)
hconf.set('fs.cos.mycos.secret.key', secret_key)
hconf.set('fs.cos.service.v2.signer.type', 'false')

service = 'mycos'
obj = 'mydata.tsv.gz'          
rdd = sc.textFile('cos://{0}.{1}/{2}'.format(bucket, service, obj))
print(rdd.count())

However, this time the response was:

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: java.io.IOException: No object store for: cos
    at com.ibm.stocator.fs.ObjectStoreVisitor.getStoreClient(ObjectStoreVisitor.java:121)
    ...
Caused by: java.lang.ClassNotFoundException: com.ibm.stocator.fs.cos.COSAPIClient

There are 5 answers

Mariusz

It looks like the cos driver is not properly initialized. Try this configuration:

hconf.set('fs.cos.impl', 'com.ibm.stocator.fs.ObjectStoreFileSystem')

hconf.set('fs.stocator.scheme.list', 'cos')
hconf.set('fs.stocator.cos.impl', 'com.ibm.stocator.fs.cos.COSAPIClient')
hconf.set('fs.stocator.cos.scheme', 'cos')

hconf.set('fs.cos.mycos.access.key', access_key)
hconf.set('fs.cos.mycos.endpoint', 'http://' + host)
hconf.set('fs.cos.mycos.secret.key', secret_key)
hconf.set('fs.cos.service.v2.signer.type', 'false')

UPDATE 1:

You also need to ensure the Stocator classes are on the classpath. You can use the packages system by executing pyspark in the following way:

./bin/pyspark --packages com.ibm.stocator:stocator:1.0.24

This works with both the swift2d and cos schemes.
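If you are not sure whether the Stocator classes are actually visible to the driver, here is a rough check from the notebook (my own sketch using Py4J reflection, not part of the original answer; it assumes the sc SparkContext from the question):

# Sketch: ask the JVM to load the Stocator FileSystem class.
# A ClassNotFoundException means the jar is not on the driver classpath.
try:
    sc._jvm.java.lang.Class.forName('com.ibm.stocator.fs.ObjectStoreFileSystem')
    print('Stocator classes found on the driver classpath')
except Exception as e:
    print('Stocator classes not found:', e)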

UPDATE 2:

Just follow the Stocator documentation (https://github.com/CODAIT/stocator). It contains all the details on how to install it, which branch to use, etc.

Bassel Zeidan

The latest version of Stocator (v1.0.9), which supports the fs.cos scheme, is not yet deployed on Spark as a Service (it will be soon). Please use the Stocator s3d scheme ("fs.s3d") to connect to your COS.

Example:

endpoint = 'endpointXXX' 
access_key = 'XXX'
secret_key = 'XXX'

prefix = "fs.s3d.service"
hconf = sc._jsc.hadoopConfiguration()
hconf.set(prefix + ".endpoint", endpoint)
hconf.set(prefix + ".access.key", access_key)
hconf.set(prefix + ".secret.key", secret_key)

bucket = 'mybucket'
obj = 'mydata.tsv.gz'

rdd = sc.textFile('s3d://{0}.service/{1}'.format(bucket, obj))
rdd.count()

Alternatively, you can use ibmos2spark. The library is already installed on our service. Example:

import ibmos2spark

credentials = {
   'endpoint': 'endpointXXXX',
   'access_key': 'XXXX',
   'secret_key': 'XXXX'
}

configuration_name = 'os_configs' # any string you want
cos = ibmos2spark.CloudObjectStorage(sc, credentials, configuration_name)

bucket = 'mybucket'
obj = 'mydata.tsv.gz'
rdd = sc.textFile(cos.url(obj, bucket))
rdd.count()
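The same cos.url helper can also be used for DataFrame reads (a sketch of mine, not from the original answer; it assumes a Spark 2.x SQLContext such as the sqlCxt created in the question, and a tab-separated file):

# Sketch: read the object as a DataFrame through the ibmos2spark-generated URL
df = sqlCxt.read.csv(cos.url(obj, bucket), sep='\t')
df.show(5)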
Roland Weber

Stocator is on the classpath for Spark 2.0 and 2.1 kernels, but the cos scheme is not configured. You can access the config by executing the following in a Python notebook:

!cat $SPARK_CONF_DIR/core-site.xml

Look for the property fs.stocator.scheme.list. What I currently see is:

<property>
    <name>fs.stocator.scheme.list</name>
    <value>swift2d,swift,s3d</value>
</property>
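If you prefer to check from Python rather than reading the XML, the same property can be queried through the Hadoop configuration (a small sketch, assuming the sc SparkContext from the question):

# Sketch: print the schemes Stocator is currently registered for, e.g. 'swift2d,swift,s3d'
hconf = sc._jsc.hadoopConfiguration()
print(hconf.get('fs.stocator.scheme.list'))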

I recommend that you raise a feature request against DSX to support the cos scheme.

Bruno Ambrozio

You have to set .config("spark.hadoop.fs.stocator.scheme.list", "cos") along with some other fs.cos... configurations. Here's an end-to-end code example that works (tested with pyspark==2.3.2 and Python 3.7.3):

from pyspark.sql import SparkSession

stocator_jar = '/path/to/stocator-1.1.2-SNAPSHOT-IBM-SDK.jar'
cos_instance_name = '<myCosInstanceName>'
bucket_name = '<bucketName>'
s3_region = '<region>'
cos_iam_api_key = '*******'
iam_service_id = 'crn:v1:bluemix:public:iam-identity::<****************>'

spark_builder = (
    SparkSession
        .builder
        .appName('test_app'))

spark_builder.config('spark.driver.extraClassPath', stocator_jar)
spark_builder.config('spark.executor.extraClassPath', stocator_jar)
spark_builder.config(f"fs.cos.{cos_instance_name}.iam.api.key", cos_iam_api_key)
spark_builder.config(f"fs.cos.{cos_instance_name}.endpoint", f"s3.{s3_region}.cloud-object-storage.appdomain.cloud")
spark_builder.config(f"fs.cos.{cos_instance_name}.iam.service.id", iam_servicce_id)
spark_builder.config("spark.hadoop.fs.stocator.scheme.list", "cos")
spark_builder.config("spark.hadoop.fs.cos.impl", "com.ibm.stocator.fs.ObjectStoreFileSystem")
spark_builder.config("fs.stocator.cos.impl", "com.ibm.stocator.fs.cos.COSAPIClient")
spark_builder.config("fs.stocator.cos.scheme", "cos")

spark_sess = spark_builder.getOrCreate()

dataset = spark_sess.range(1, 10)
dataset = dataset.withColumnRenamed('id', 'user_idx')

dataset.repartition(1).write.csv(
    f'cos://{bucket_name}.{cos_instance_name}/test.csv',
    mode='overwrite',
    header=True)

spark_sess.stop()
print('done!')
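To verify the write, the same cos:// URI can be read back (a sketch of mine, not from the original answer; run it before spark_sess.stop() or in a new session built with the same configuration):

# Sketch: read the CSV back through the same cos:// URI
df = spark_sess.read.csv(
    f'cos://{bucket_name}.{cos_instance_name}/test.csv',
    header=True)
df.show()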
Vincenzo Lavorini

I found the same issue, and to solve it I just changed the environment:

Within IBM Watson Studio, if you start a Jupyter notebook in an environment without a pre-configured Spark cluster, then you get that error. Installing PySpark is not enough.

Instead, if you start a notebook in an environment with a Spark cluster available, it will work fine.