Does s3-dist-cp on EMR use EMRFS consistent view metadata?


I'm using the EMRFS consistent view feature on EMR when running some of my Hive queries.

Now I need to access and copy objects directly from S3 using s3-dist-cp, bypassing Hive, which relies on the EMRFS consistent view metadata stored in DynamoDB.

I looked through the official docs for s3-dist-cp and other resources, but I haven't found a definitive answer.

According to an AWS forum thread from summer 2017, s3-dist-cp did not support the EMRFS consistent view feature:

  1. Currently, s3-dist-cp on EMR releases do not completely use EMRFS and have code that directly uses the aws-java-sdk. The reasoning for this is that this would offer performance improvements over directly using EMRFS in certain cases. We have made efforts to increase usage of EMRFS in s3-dist-cp, but it is still not there yet. So, at this moment, I would recommend trying out DistCp.

https://forums.aws.amazon.com/thread.jspa?messageID=787883
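
For reference, this is roughly what the DistCp fallback recommended in that reply would look like for me. It is only a minimal sketch: the bucket and paths are placeholders, the helper names are made up for illustration, and I'm assuming that `s3://` URIs on the cluster resolve through EMRFS (and therefore through consistent view), which is exactly the part I'd like confirmed.

```python
import shutil
import subprocess

# Placeholder locations, not real paths from my cluster.
SRC = "s3://my-bucket/warehouse/my_table/"
DEST = "hdfs:///tmp/my_table_copy/"


def build_distcp_command(src, dest):
    """Return the argv for a plain Hadoop DistCp copy from src to dest."""
    return ["hadoop", "distcp", src, dest]


def run_copy(src, dest):
    """Run DistCp if the hadoop CLI is on PATH; otherwise only print the command."""
    cmd = build_distcp_command(src, dest)
    if shutil.which("hadoop") is None:
        print("hadoop not found; would run:", " ".join(cmd))
        return None
    # check=True raises CalledProcessError if the copy job fails.
    return subprocess.run(cmd, check=True).returncode


if __name__ == "__main__":
    run_copy(SRC, DEST)
```

Off the cluster this just prints the command, so it can be checked before being run on the master node.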

Has anything changed as of 2020?

