Is there a tool in the Hadoop ecosystem that can detect when new data has been added to HDFS?
Specifically, I want to remotely execute a Sqoop import job from an external database (no merge, only a new table). Then, once this data is written to HDFS, a Spark script should be triggered to process the newly added data and do some work with it.
Is there any feature in Hadoop that does this kind of job ?
I could simply execute the Spark script after the Sqoop import job finishes, but I would like to know whether such a feature exists; I haven't found any yet.
Thanks in advance.
Yes, there is. There's a workflow scheduler called Oozie in the Hadoop ecosystem that handles exactly this kind of scenario.
An Oozie coordinator can trigger a workflow either on a fixed schedule or on data availability; your case is data availability. See the Oozie documentation for more details: Oozie doc for coordinator job
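A minimal coordinator sketch for the data-availability case might look like the following. All paths, names, and dates here are placeholders you'd replace with your own; it assumes the Sqoop import writes a `_SUCCESS` flag file (the standard Hadoop job completion marker) into a dated directory, and that the Spark processing is wrapped in a separate Oozie workflow at `app-path`:

```xml
<coordinator-app name="sqoop-to-spark" frequency="${coord:days(1)}"
                 start="2016-01-01T00:00Z" end="2017-01-01T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <!-- Dataset produced by the Sqoop import; URI template is an example -->
    <dataset name="imported-table" frequency="${coord:days(1)}"
             initial-instance="2016-01-01T00:00Z" timezone="UTC">
      <uri-template>hdfs://namenode/data/imports/${YEAR}${MONTH}${DAY}</uri-template>
      <!-- Oozie waits until this flag file appears before triggering -->
      <done-flag>_SUCCESS</done-flag>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="input" dataset="imported-table">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <!-- Workflow containing the Spark action; path is a placeholder -->
      <app-path>hdfs://namenode/apps/spark-process-wf</app-path>
      <configuration>
        <property>
          <name>inputDir</name>
          <!-- Resolved HDFS path of the newly available data -->
          <value>${coord:dataIn('input')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```

The coordinator materializes an action each day, but holds it until the dataset instance (including the done-flag) exists in HDFS; only then does it launch the workflow, passing the resolved input path to your Spark job.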