Using Apache Falcon to set up data replication across clusters

Question

Asked by Shay on 23 February 2016 at 20:42 · 176 views

We have been PoC-ing Falcon for our data ingestion workflow. We have a requirement to use Falcon to set up replication between two clusters (feed replication, not mirroring). The problem I have is that the user ID on cluster A is different from the user ID on cluster B. Has anyone used Falcon with this setup? I can't seem to find a way to get this to work.

1) I am setting up replication from Cluster A => Cluster B
2) I am defining the Falcon job on cluster A

At job setup time it looks like I can only define one user ID that owns the job. How do I set up a job where the ID on cluster A is different from the ID on cluster B? Any help would be awesome!!

There is 1 answer

**Sanjeev** · Answer 1 · 2016-05-20T14:12:36+00:00

Apache Falcon uses the feed's 'ACL owner', which should have write access on the target cluster where the data is to be copied.

The source cluster should have WebHDFS enabled, since that is how the data will be accessed.
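One way to sanity-check the WebHDFS part is to request the status of the feed directory over the WebHDFS REST API. The sketch below (not part of the original answer) only builds the URL for a `GETFILESTATUS` call. The host name, path, and user are hypothetical, port 50070 is assumed as the Hadoop 2.x NameNode HTTP default, and the `user.name` parameter is only meaningful on a cluster without Kerberos security.

```python
from urllib.parse import quote, urlencode


def webhdfs_status_url(namenode_host, path, user, port=50070):
    """Build a WebHDFS GETFILESTATUS URL for `path` on the given NameNode."""
    # WebHDFS paths live under the /webhdfs/v1 prefix on the NameNode HTTP port.
    query = urlencode({"op": "GETFILESTATUS", "user.name": user})
    return f"http://{namenode_host}:{port}/webhdfs/v1{quote(path)}?{query}"


if __name__ == "__main__":
    # Hypothetical host, path, and user; substitute real values.
    print(webhdfs_status_url("nn-a.example.com", "/data/feeds/in", "userA"))
```

Fetching that URL (for example with `curl -i`) from a host on the target side should return a JSON `FileStatus` document if WebHDFS is enabled and the user can read the path.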

So don't schedule the feed on the source cluster if the user does not have write access there, which retention requires.
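To make the ACL and retention points concrete, here is a rough sketch (not from the original answer) that generates a feed entity with a source and a target cluster and a single `ACL` element. The element names follow the Falcon feed schema as I understand it. The feed name, cluster names, user, group, dates, and retention limits are all made-up placeholders, and `build_feed` is a hypothetical helper, not a Falcon API.

```python
import xml.etree.ElementTree as ET

FEED_NS = "uri:falcon:feed:0.1"


def build_feed(name, source, target, owner, group, data_path):
    """Return a Falcon-style feed entity as an XML string (illustrative only)."""
    feed = ET.Element("feed", {"name": name, "xmlns": FEED_NS})
    ET.SubElement(feed, "frequency").text = "hours(1)"

    clusters = ET.SubElement(feed, "clusters")
    for cluster_name, role in ((source, "source"), (target, "target")):
        cluster = ET.SubElement(
            clusters, "cluster", {"name": cluster_name, "type": role}
        )
        # Placeholder validity window and retention policy for each cluster.
        ET.SubElement(
            cluster,
            "validity",
            {"start": "2016-01-01T00:00Z", "end": "2099-01-01T00:00Z"},
        )
        ET.SubElement(cluster, "retention", {"limit": "days(7)", "action": "delete"})

    locations = ET.SubElement(feed, "locations")
    ET.SubElement(locations, "location", {"type": "data", "path": data_path})

    # A feed carries one ACL element, so there is one owner for both clusters.
    ET.SubElement(feed, "ACL", {"owner": owner, "group": group, "permission": "0755"})
    ET.SubElement(feed, "schema", {"location": "/none", "provider": "none"})
    return ET.tostring(feed, encoding="unicode")


if __name__ == "__main__":
    print(build_feed("sample-feed", "clusterA", "clusterB",
                     "userB", "users", "/data/feeds/in"))
```

Under this reading, `owner` would be the account that can write on the target cluster (here the placeholder `userB`), and the feed would be scheduled only on the clusters where that account has the access retention needs.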

Hope this helps.
