Hadoop - Execute a script when data arrives in HDFS

Is there a tool in the Hadoop ecosystem that can detect when new data has been added to HDFS?

Specifically, I want to run a Sqoop import job remotely from an external database (no merge, only a new table). Once that data has been written to HDFS, a Spark script should run automatically, process the newly added data and do some further work with it.

Is there a feature in Hadoop that does this kind of job?

I could simply run the Spark script after the Sqoop import job finishes, but I would like to know whether such a feature exists, and I haven't found one yet.

Thanks in advance.

Best answer, by Paul H.

Yes, there is. The Hadoop ecosystem includes a workflow scheduler called Oozie that handles exactly this kind of scenario.

Oozie coordinator jobs can trigger a workflow either on a fixed schedule or when its input data becomes available. Your case falls under data availability. See the Oozie documentation for coordinator jobs for more details.
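To make the "data availability" trigger concrete: in a coordinator, each dataset instance is a directory resolved from a URI template (for example one containing ${YEAR}/${MONTH}/${DAY}), and an instance counts as available once its done-flag file exists. When <done-flag> is omitted the default flag is _SUCCESS, and an empty <done-flag/> means the directory's existence alone is enough. The Python sketch below only mimics that check to illustrate the idea. It assumes a local directory standing in for HDFS, and resolve_instance, is_available and maybe_trigger are hypothetical helpers, not part of Oozie; in a real setup Oozie performs this check itself and the action would be your Spark job.

```python
import os
from datetime import date
from typing import Callable

# Assumption: a local directory stands in for HDFS. This is an illustration
# of the coordinator's readiness check, not a replacement for Oozie.
DEFAULT_DONE_FLAG = "_SUCCESS"  # Oozie's default when <done-flag> is omitted


def resolve_instance(uri_template: str, nominal: date) -> str:
    """Expand ${YEAR}/${MONTH}/${DAY} in a dataset URI template (hypothetical helper)."""
    return (uri_template
            .replace("${YEAR}", f"{nominal.year:04d}")
            .replace("${MONTH}", f"{nominal.month:02d}")
            .replace("${DAY}", f"{nominal.day:02d}"))


def is_available(instance_dir: str, done_flag: str = DEFAULT_DONE_FLAG) -> bool:
    """Return True when the dataset instance counts as available.

    A non-empty done_flag must exist as a file inside the directory;
    an empty done_flag ("") means the directory's existence is enough.
    """
    if done_flag == "":
        return os.path.isdir(instance_dir)
    return os.path.isfile(os.path.join(instance_dir, done_flag))


def maybe_trigger(uri_template: str,
                  nominal: date,
                  action: Callable[[str], None],
                  done_flag: str = DEFAULT_DONE_FLAG) -> bool:
    """Run action(instance_dir) only if the instance is available.

    The action is a hypothetical stand-in for the Spark job; returns True
    if it was triggered, False if the data is not ready yet.
    """
    instance_dir = resolve_instance(uri_template, nominal)
    if is_available(instance_dir, done_flag):
        action(instance_dir)
        return True
    return False
```

With the default flag, maybe_trigger calls the action only after the day's directory contains _SUCCESS, so the processing step waits for the import to finish instead of being chained to it by hand. This assumes the import job writes such a marker when it completes.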