I am trying to configure Stocator on an Amazon EMR cluster to access data on Amazon S3. I have found resources indicating that this should be possible, but very little detail on how to actually make it work.
When I start my EMR cluster, I use the following configuration:
{
  "classification": "core-site",
  "properties": {
    "fs.stocator.scheme.list": "cos",
    "fs.cos.impl": "com.ibm.stocator.fs.ObjectStoreFileSystem",
    "fs.stocator.cos.impl": "com.ibm.stocator.fs.cos.COSAPIClient",
    "fs.stocator.cos.scheme": "cos"
  }
}
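For quicker experimentation, I believe the same properties can also be set per session from spark-shell instead of at cluster launch (a sketch, not something I've confirmed behaves identically):

// Mirror the core-site properties above in the live Hadoop configuration
sc.hadoopConfiguration.set("fs.stocator.scheme.list", "cos")
sc.hadoopConfiguration.set("fs.cos.impl", "com.ibm.stocator.fs.ObjectStoreFileSystem")
sc.hadoopConfiguration.set("fs.stocator.cos.impl", "com.ibm.stocator.fs.cos.COSAPIClient")
sc.hadoopConfiguration.set("fs.stocator.cos.scheme", "cos")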
I then try to access a file at cos://mybucket.service/myfile. This fails with an error about missing credentials.
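(Concretely, I'm reading it roughly like this in spark-shell; the bucket and file names are placeholders:)

val df = spark.read.text("cos://mybucket.service/myfile")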
In spark-shell, I then add the credentials to the Hadoop configuration:
// Pull credentials from the default AWS provider chain (env vars, instance profile, etc.)
val credentials = new com.amazonaws.auth.DefaultAWSCredentialsProviderChain().getCredentials
sc.hadoopConfiguration.set("fs.cos.service.access.key", credentials.getAWSAccessKeyId)
sc.hadoopConfiguration.set("fs.cos.service.secret.key", credentials.getAWSSecretKey)
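For what it's worth, I assume these two properties could instead be added to the core-site classification above, so the credentials are in place at cluster launch (untested; the key values are placeholders):

"fs.cos.service.access.key": "<ACCESS_KEY>",
"fs.cos.service.secret.key": "<SECRET_KEY>"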
Now when I try to access cos://mybucket.service/myfile, I get the error: org.apache.spark.sql.AnalysisException: Path does not exist.
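In case it helps with diagnosis, here is how I'd check whether the cos:// scheme even resolves to Stocator (a sketch, using the placeholder names from above):

import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}

val uri = new URI("cos://mybucket.service/myfile")
val fs = FileSystem.get(uri, sc.hadoopConfiguration)
println(fs.getClass.getName)      // should be com.ibm.stocator.fs.ObjectStoreFileSystem
println(fs.exists(new Path(uri))) // false would be consistent with the AnalysisException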
Accessing the file as s3://mybucket/myfile works, since that path doesn't go through Stocator, and accessing it via the AWS CLI also works.
Are there any online resources detailing how to get Stocator working on AWS?
Has anyone done this successfully, and if so, could you share your configuration?