I have a requirement where I have to use multiple files from the same directory, selected by a specific date, as input to a MapReduce job. I am not sure how I can do it.
hadoop jar EventLogsSW.jar EventSuspiciousWatch /user/hdfs/eventlog/*.snappy /user/hdfs/eventlog_output/op1
Example: from the eventlog directory I need only the present date's files for processing.

The eventlog directory gets log data from a Flume logger agent, so it has 1000s of new files coming in on a daily basis. Of those, I need only the present date's files for my process.
Thanks.
Regards, Mohan.
You can use the bash date command as $(date +%Y-%m-%d). For example, running the command as shown below will look for the /user/hdfs/eventlog/2017-01-04.snappy log file, and the output will be stored in the /user/hdfs/eventlog_output/2017-01-04 HDFS directory. To get a specific date format see this answer, or type man date to learn more about date.
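A sketch of that command, reusing the jar, class, and paths from the question and assuming the daily files are named after the current date (adjust the +%Y-%m-%d format to match your actual file names):

    hadoop jar EventLogsSW.jar EventSuspiciousWatch /user/hdfs/eventlog/$(date +%Y-%m-%d).snappy /user/hdfs/eventlog_output/$(date +%Y-%m-%d)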
Update after more details were provided:
1. Explanation: the individual commands are shown in the sketch just after this list.
2. Make a shell script to reuse these commands daily, in a more logical way.
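A sketch of the individual commands, assuming today's files start with the date in YYYY-MM-DD form (e.g. 2017-01-04-xxxx.snappy); the quoted glob is expanded by HDFS, not by the local shell:

    # print today's date in YYYY-MM-DD form, e.g. 2017-01-04
    date +%Y-%m-%d

    # list today's log files in the eventlog directory on HDFS
    hdfs dfs -ls "/user/hdfs/eventlog/$(date +%Y-%m-%d)*.snappy"

    # run the job on all of today's files at once, writing to a dated output directory
    hadoop jar EventLogsSW.jar EventSuspiciousWatch "/user/hdfs/eventlog/$(date +%Y-%m-%d)*.snappy" "/user/hdfs/eventlog_output/$(date +%Y-%m-%d)"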
This script can process more than one file, one file at a time, in HDFS for the present system date:
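A minimal sketch of such a script, assuming the jar/class names from the question and date-prefixed file names; each file gets its own output subdirectory so the runs do not collide:

    #!/bin/bash
    # Process each of today's .snappy files in HDFS, one file at a time.

    TODAY=$(date +%Y-%m-%d)
    INPUT_DIR=/user/hdfs/eventlog
    OUTPUT_BASE=/user/hdfs/eventlog_output/$TODAY

    # hdfs dfs -ls expands the glob on HDFS; awk keeps only the path column.
    for file in $(hdfs dfs -ls "$INPUT_DIR/${TODAY}*.snappy" 2>/dev/null | awk '/\.snappy$/ {print $NF}'); do
        name=$(basename "$file" .snappy)
        echo "Processing $file ..."
        hadoop jar EventLogsSW.jar EventSuspiciousWatch "$file" "$OUTPUT_BASE/$name"
    done

You could then schedule this script once a day (for example with cron) so it always picks up only the current date's files.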