I am trying to configure Stocator on an Amazon EMR cluster to access data on Amazon S3. I have found resources indicating that this should be possible, but very little detail on how to actually get it working.
When I start my EMR cluster I use the following config:
{
  "classification": "core-site",
  "properties": {
    "fs.stocator.scheme.list": "cos",
    "fs.cos.impl": "com.ibm.stocator.fs.ObjectStoreFileSystem",
    "fs.stocator.cos.impl": "com.ibm.stocator.fs.cos.COSAPIClient",
    "fs.stocator.cos.scheme": "cos"
  }
}
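If it matters, I assume the credentials could also be supplied directly in the same core-site classification at cluster creation, rather than at runtime. A sketch of what I have in mind is below; the fs.cos.service.endpoint property is a guess on my part for plain AWS S3, following what I understand to be Stocator's fs.cos.&lt;service&gt;.* naming convention, so please correct me if that key is wrong:

```json
{
  "classification": "core-site",
  "properties": {
    "fs.stocator.scheme.list": "cos",
    "fs.cos.impl": "com.ibm.stocator.fs.ObjectStoreFileSystem",
    "fs.stocator.cos.impl": "com.ibm.stocator.fs.cos.COSAPIClient",
    "fs.stocator.cos.scheme": "cos",
    "fs.cos.service.access.key": "<ACCESS_KEY>",
    "fs.cos.service.secret.key": "<SECRET_KEY>",
    "fs.cos.service.endpoint": "s3.amazonaws.com"
  }
}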
I then try to access a file using cos://mybucket.service/myfile, which fails with an error about missing credentials. So, in spark-shell, I add the credentials to the Hadoop configuration using:
// Pull credentials from the default AWS provider chain (env vars, instance profile, etc.)
val credentials = new com.amazonaws.auth.DefaultAWSCredentialsProviderChain().getCredentials
sc.hadoopConfiguration.set("fs.cos.service.access.key", credentials.getAWSAccessKeyId)
sc.hadoopConfiguration.set("fs.cos.service.secret.key", credentials.getAWSSecretKey)
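One property I have not set anywhere is an endpoint. If Stocator's COS client follows an fs.cos.&lt;service&gt;.endpoint convention, I imagine the spark-shell equivalent for plain AWS S3 would be something like the line below; both the property name and the value are assumptions on my part, not something I have confirmed:

```scala
// Assumed endpoint property for plain AWS S3 (unverified; the key name
// follows the fs.cos.<service>.* pattern used by the other properties)
sc.hadoopConfiguration.set("fs.cos.service.endpoint", "s3.amazonaws.com")
```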
Now when I try to access cos://mybucket.service/myfile I get the error: org.apache.spark.sql.AnalysisException: Path does not exist:
Accessing the same file as s3://mybucket/myfile works, since that path doesn't go through Stocator, and accessing the file via the AWS CLI works as well.
Are there any online resources detailing how to get Stocator working on AWS? Has anyone successfully done this themselves, and if so, can you share your configuration?
Failing that, I'd experiment with the Netflix one instead, as I'm confident it works well there.