How to write Apache Crunch output to an Amazon S3 bucket

Is there a way to write Apache Crunch output to an S3 bucket? The Crunch pipeline has a write method that takes a Target as a parameter. Is there a way to pass an S3 location as the Target to Crunch's write method?

Best answer, by sabbysabs

Couldn't you just call the write method on your PCollection and pass it a Target that points at your S3 location?

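import org.apache.crunch.PCollection;
import org.apache.crunch.io.To;
PCollection<String> items = ...;
// To.avroFile takes a Hadoop path string, so an S3 URI works as the output location
items.write(To.avroFile("s3://bucket/prefix"));
pipeline.done();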

This is essentially how we do it, although we run on EMR. To migrate data from our on-prem cluster, we use Hadoop's distcp command.
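The `s3://bucket/prefix` string in the answer is an ordinary Hadoop filesystem URI, so its scheme decides which `FileSystem` implementation performs the write. On Amazon EMR, `s3://` is served by EMRFS, which is why the snippet works there as written; on a plain Apache Hadoop cluster the usual choice is `s3a://`, provided by the `hadoop-aws` module. The sketch below is a hypothetical helper, `S3Paths.s3OutputPath`, that is not part of Crunch or Hadoop. It only assembles and sanity-checks the path string you would hand to `To.avroFile(...)`, and it assumes your job already knows whether it is running on EMR (the `onEmr` flag).

```java
import java.net.URI;

// Hypothetical helper, not a Crunch or Hadoop API: builds the S3 path string
// that would be passed to a path-based Crunch target such as To.avroFile(...).
final class S3Paths {
    private S3Paths() {}

    // onEmr == true  -> "s3://"  (EMRFS on Amazon EMR)
    // onEmr == false -> "s3a://" (Apache Hadoop S3A connector from hadoop-aws)
    public static String s3OutputPath(String bucket, String prefix, boolean onEmr) {
        if (bucket == null || bucket.isEmpty()) {
            throw new IllegalArgumentException("bucket must not be empty");
        }
        String scheme = onEmr ? "s3" : "s3a";
        // Strip leading and trailing slashes so the result never contains "//" after the bucket.
        String key = prefix == null ? "" : prefix.replaceAll("^/+|/+$", "");
        String path = scheme + "://" + bucket + (key.isEmpty() ? "" : "/" + key);
        // URI.create throws IllegalArgumentException on characters that are illegal in a URI.
        URI uri = URI.create(path);
        // Check that the bucket name was parsed as the URI host.
        if (!bucket.equals(uri.getHost())) {
            throw new IllegalArgumentException("bucket is not usable as a URI host: " + bucket);
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println(s3OutputPath("bucket", "prefix", true));    // s3://bucket/prefix
        System.out.println(s3OutputPath("bucket", "/prefix/", false)); // s3a://bucket/prefix
    }
}
```

With the helper, the write in the answer would read `items.write(To.avroFile(S3Paths.s3OutputPath("bucket", "prefix", onEmr)));`. It only builds a string; whether the write succeeds still depends on the cluster having the matching S3 connector and credentials configured.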