API data to Hadoop via Flume


I have an API which returns data in XML format.

I would like to run this on a daily basis and store the returned data in Hadoop. I'm a bit lost after going through the Flume setup documentation. Does anyone have end-to-end steps for pulling data from a simple external API like the one above via Flume, and scheduling it with Oozie?

Currently, I have a Java program that pulls the data and writes it to a file named indeed_ddmmyyyyhhmmss.xml, and then to a similarly named tab-delimited .txt file for ease of use. I can run it daily with cron and create an external table in Hive that points to the file location. This doesn't look like an elegant solution to me.


There is 1 answer

Dmitry Zaytsev On

You could use Flume's embedded agent feature inside your Java program and send the events directly to a Flume instance.
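
As a rough sketch of what that could look like: the embedded agent is configured from a plain `Map<String, String>` of properties, and the program then hands it events. The class below only builds that map, so it runs without Flume on the classpath; the actual Flume calls are shown as comments and need the `flume-ng-embedded-agent` dependency. The class name `EmbeddedAgentConfig`, the agent name `apiAgent`, and the host and port are placeholders I made up, and this has not been tested against a live Flume instance.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: builds the property map for a Flume embedded agent.
class EmbeddedAgentConfig {

    // host/port are assumed to point at a Flume agent that exposes an Avro source.
    static Map<String, String> build(String host, int port) {
        Map<String, String> props = new HashMap<>();
        props.put("channel.type", "memory");   // in-process buffer for events
        props.put("channel.capacity", "200");
        props.put("sinks", "sink1");           // space-separated list of sink names
        props.put("sink1.type", "avro");       // forward events over Avro RPC
        props.put("sink1.hostname", host);
        props.put("sink1.port", Integer.toString(port));
        // The Flume Developer Guide example lists several sinks and also sets
        // "processor.type" (failover or load_balance) to choose between them.
        return props;
    }

    public static void main(String[] args) {
        // Placeholder endpoint, not a real host.
        Map<String, String> props = build("flume-host.example", 41414);
        System.out.println(props);

        // With flume-ng-embedded-agent on the classpath, usage would be roughly:
        //
        //   EmbeddedAgent agent = new EmbeddedAgent("apiAgent");
        //   agent.configure(props);
        //   agent.start();
        //   // xml is the String returned by the API call
        //   agent.put(EventBuilder.withBody(xml.getBytes(StandardCharsets.UTF_8)));
        //   agent.stop();
        //
        // EmbeddedAgent is in org.apache.flume.agent.embedded and EventBuilder
        // in org.apache.flume.event; put() can throw EventDeliveryException.
    }
}
```

The Flume instance on the receiving end would still need its own configuration, for example an Avro source listening on that port and an HDFS sink, so the events end up in Hadoop.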