Copy files from a UNC path to HDFS using a shell script


I have folders on a UNC path of the form "//aloha/log/folderlevel1/folderlevel2/".

Each of these level-2 folders will have files like "empllog.txt", "deptlog.txt", "adminlog.txt", and a few other files as well.

I want to copy the contents of these level-2 folders to the HDFS Cloudera cluster, but only if the folder was created in the last 24 hours and only if all three of those files are present. If any one of the files is missing, that folder should not be copied. I also need to preserve the folder structure.

i.e. in HDFS it should be "/user/test/todaydate/folderlevel1/folderlevel2".
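The per-folder logic I am after, sketched as a standalone script. The mount point /mnt/alohalog is an assumption (getting the UNC share visible there is exactly the part I don't know how to do), and the hadoop commands are left commented so the rest can be dry-run locally:

```shell
#!/bin/sh
# Sketch only: assumes //aloha/log were mounted at /mnt/alohalog (hypothetical).
day=$(date +%Y-%m-%d)
srcroot="/mnt/alohalog"
dstroot="/user/test/$day"

# True only if all three required logs exist in the folder "$1"
has_required_files() {
    for name in empllog.txt deptlog.txt adminlog.txt; do
        [ -f "$1/$name" ] || return 1
    done
}

# True only if folder "$1" was modified within the last 24 hours
# (-mmin -1440 = modified less than 1440 minutes ago)
is_recent() {
    [ -n "$(find "$1" -maxdepth 0 -mmin -1440)" ]
}

for l2 in "$srcroot"/*/*/; do
    [ -d "$l2" ] || continue
    has_required_files "$l2" || continue
    is_recent "$l2" || continue
    rel=${l2#"$srcroot"/}                # e.g. folderlevel1/folderlevel2/
    echo "would copy $l2 -> $dstroot/$rel"
    # hadoop dfs -mkdir -p "$dstroot/$rel"
    # hadoop dfs -put "$l2"* "$dstroot/$rel"
done
```
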

I have written the shell script below to copy files to HDFS into a date folder, but I am not sure how to proceed further with the UNC paths and the other criteria.

    day=$(date +%Y-%m-%d)

    srcdir="/home/test/sparkjops"
    stdir="/user/test/$day/"

    # -p creates parent directories as needed
    hadoop dfs -mkdir -p "/user/test/$day"

    for f in "$srcdir"/*
    do
        if [ "$f" = "$srcdir/empllog.txt" ]
        then
            hadoop dfs -put "$f" "$stdir"
        elif [ "$f" = "$srcdir/deptlog.txt" ]
        then
            hadoop dfs -put "$f" "$stdir"
        elif [ "$f" = "$srcdir/adminlog.txt" ]
        then
            hadoop dfs -put "$f" "$stdir"
        fi
    done

I have tried changing the UNC path as below. It did not do anything: no error, and no content was copied either.

srcdir="//aloha/log/*/*"

srcdir='//aloha/log/*/*'

srcdir="\\aloha\log\*\*"
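I also ran a quick local experiment to understand the wildcard behavior: a glob assigned to a variable is stored literally, and it only expands when the variable is used unquoted, and only if something on the local filesystem actually matches:

```shell
#!/bin/sh
# Local demo (no UNC involved): wildcard expansion happens at use time,
# not at assignment time, and a quoted "$pat" never expands at all.
demo=$(mktemp -d)
touch "$demo/a.txt" "$demo/b.txt"

pat="$demo/*"
echo "$pat"        # prints the literal pattern, '*' not expanded
set -- $pat        # unquoted use: expands to a.txt and b.txt
echo "$#"          # prints 2

rm -rf "$demo"
```
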

Appreciate all help. Thanks.

EDIT 1 :

I ran it in debug mode with sh -x (and also with bash -x, just to check), but it returned a "not found" error as below:

test@ubuntu:~/sparkjops$ sh -x ./hdfscopy.sh
+ date +%Y-%m-%d
+ day=2016-12-24
+ srcdir= //aloha/logs/folderlevel1/folderlevel2
+ stdir=/user/test/2016-12-24/
+ hadoop dfs -mkdir 2016-12-24 /user/test
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

mkdir: `2016-12-24': File exists
mkdir: `/user/test': File exists
+ //aloha/logs/folderlevel1/folderlevel2/* = //aloha/logs/folderlevel1/folderlevel2/empllog.txt.txt
./hdfscopy.sh: 12: ./hdfscopy.sh: //aloha/logs/folderlevel1/folderlevel2/*: not found
+ //aloha/logs/folderlevel1/folderlevel2/* = //aloha/logs/folderlevel1/folderlevel2/deptlog.txt.txt
./hdfscopy.sh: 12: ./hdfscopy.sh: //aloha/logs/folderlevel1/folderlevel2/*: not found
+ //aloha/logs/folderlevel1/folderlevel2/* = //aloha/logs/folderlevel1/folderlevel2/adminlog.txt.txt
./hdfscopy.sh: 12: ./hdfscopy.sh: //aloha/logs/folderlevel1/folderlevel2/*: not found
test@ubuntu:~/sparkjops$

But I am not able to understand why it is not reading from that path. I have tried different escape sequences as well (a double slash for each slash, forward slashes as in a Windows folder path), but none of them work; all throw the same error message. I am not sure how to read these files in the script. Any help would be appreciated.
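For what it's worth, my current suspicion is that sh cannot treat "//aloha/log/..." as a filesystem path at all, and that the share first has to be mounted locally (e.g. with cifs-utils) before it can be globbed. The mount point and credentials below are assumptions, not something I have verified:

```shell
# Assumed approach, not yet verified: mount the Windows share locally,
# then point srcdir at the mount point. Requires the cifs-utils package.
sudo mkdir -p /mnt/alohalog
sudo mount -t cifs //aloha/log /mnt/alohalog -o username=myuser,ro
# afterwards, use srcdir="/mnt/alohalog" in the script above
```
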


There are 0 answers