Recursively search a directory for each file in the directory on IBMi IFS

I'm trying to write two (edit: shell) scripts and am having some difficulty. I'll explain the purpose of each and then provide the scripts and their current output.

1: get a list of every file name in a directory recursively. Then search the contents of all files in that directory for each file name. Should return the path, filename, and line number of each occurrence of the particular file name.

2: get a list of every file name in a directory recursively. Then search the contents of all files in the directory for each file name. Should return the path and filename of each file which is NOT found in any of the files in the directories.

I ultimately want to use script 2 to find and delete (actually, move to another directory for archiving) unused files on a website. Then I would want to use script 1 to see each occurrence and filter through any duplicate filenames.

I know I can make script 2 move each file as it is running rather than as a second step, but I want to confirm the script functions correctly before I do any of that. I would modify it after I confirm it is functioning correctly.

I'm currently testing this on an IBM i system in strqsh.

My test folder structure is:

scriptTest
---subDir1
------file4.txt
------file5.txt
------file6.txt
---subDir2
------file1.txt
------file7.txt
------file8.txt
------file9.txt
---file1.txt
---file2.txt
---file3.txt

I have text in some of those files which contains existing file names.

This is my current script 1:

#!/bin/bash
files=`find /www/Test/htdocs/DLTest/scriptTest/ ! -type d -exec basename {} \;`
for i in $files
do
    grep -rin $i "/www/Test/htdocs/DLTest/scriptTest" >> testReport.txt;
done

Right now it functions correctly, with the exception of providing the path to the file that had a match. Doesn't grep return the file path by default?
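
For what it's worth, my understanding (untested on this system) is that grep only prefixes matches with a file name when it searches more than one file, and that GNU grep's -H flag forces the name to appear. Something like the following is what I had in mind, assuming qsh's grep accepts -H the way GNU grep does:

#!/bin/bash
# Hypothetical tweak to script 1 -- assumes grep here supports GNU grep's -H
# (--with-filename); if it doesn't, adding /dev/null as an extra search target
# is the classic trick to force the file name into the output.
for i in $(find /www/Test/htdocs/DLTest/scriptTest/ ! -type d -exec basename {} \;)
do
    grep -rinH "$i" "/www/Test/htdocs/DLTest/scriptTest" >> testReport.txt
done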

I'm a little further away with script 2:

#!/bin/bash
files=`find /www/Test/htdocs/DLTest/scriptTest/ ! -type d`
for i in $files
do
    #split $i on '/' and store into an array
    IFS='/' read -a array <<< "$i"

    #get last element of the array 
    echo "${array[-1]}"

    #perform a grep similar to script 2 and store it into a variable
    filename="grep -rin $i "/www/Test/htdocs/DLTest/scriptTest" >> testReport.txt;"

    #Check if the variable has anything in it
    if [ $filename = "" ]   
            #if not then output $i for the full path of the current needle.
        then echo $i;
    fi
done

I don't know how to split the string $i into an array. I keep getting an error on line 6:

001-0059 Syntax error on line 6: token redirection not expected.
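
For reference, this is the effect I'm after on line 6, rewritten with plain POSIX parameter expansion instead of the here-string; I'm assuming qsh handles ${var##*/} and command substitution, but I haven't verified the whole loop there yet:

#!/bin/bash
# Hypothetical rework of script 2 -- ${i##*/} strips everything up to and
# including the last '/' to get the bare file name, so no array or <<< is needed.
for i in $(find /www/Test/htdocs/DLTest/scriptTest/ ! -type d)
do
    name="${i##*/}"
    # capture the grep output rather than storing the command text in a string
    matches=$(grep -rin "$name" "/www/Test/htdocs/DLTest/scriptTest")
    if [ -z "$matches" ]; then
        echo "$i"    # no file's contents mention this file name
    fi
done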

I'm planning on trying this on an actual Linux distro to see if I get different results.

I appreciate any insight in advance.

1 Answer

Answered by Todd A. Jacobs

Introduction

This isn't really a full solution, as I'm not 100% sure I understand what you're trying to do. However, the following sections contain pieces of a solution that you may be able to stitch together to do what you want.

Create Test Harness

cd /tmp
mkdir -p scriptTest/subDir{1,2}
touch scriptTest/subDir1/file{4,5,6}.txt
touch scriptTest/subDir2/file{1,8}.txt
touch scriptTest/file{1,2,3}.txt

Finding and Deleting Duplicates

In the most general sense, you could use find's -exec flag or a Bash loop to run grep or another comparison on your files. However, if all you're trying to do is remove duplicates, then you might simply be better off using the fdupes or duff utilities to identify (and optionally remove) files with duplicate contents.

For example, given that all the .txt files in the test corpus are zero-length duplicates, consider the following duff and fdupes examples.

duff

Duff has more options, but won't delete files for you directly. You'll likely need to use a command like duff -e0 * | xargs -0 rm to delete duplicates. To find duplicates using the default comparisons:

$ duff -r scriptTest/
8 files in cluster 1 (0 bytes, digest da39a3ee5e6b4b0d3255bfef95601890afd80709)
scriptTest/file1.txt
scriptTest/file2.txt
scriptTest/file3.txt
scriptTest/subDir1/file4.txt
scriptTest/subDir1/file5.txt
scriptTest/subDir1/file6.txt
scriptTest/subDir2/file1.txt
scriptTest/subDir2/file8.txt
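
Putting the two duff invocations above together, a deletion step might look like the sketch below. The flags are taken verbatim from the command shown earlier, so verify them against your duff version (and dry-run with echo) before removing anything:

# Sketch only: recursive search plus the excess-file/NUL-separated pipeline
# described above. Replace 'rm' with 'echo rm' to preview what would be deleted.
duff -r -e0 scriptTest/ | xargs -0 rm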

fdupes

This utility offers the ability to delete duplicates directly in various ways. One such way is to invoke fdupes . --delete --noprompt once you're confident that you're ready to proceed. However, to find the list of duplicates:

$ fdupes -R scriptTest/
scriptTest/subDir1/file4.txt            
scriptTest/subDir1/file5.txt
scriptTest/subDir1/file6.txt
scriptTest/subDir2/file1.txt
scriptTest/subDir2/file8.txt
scriptTest/file1.txt
scriptTest/file2.txt
scriptTest/file3.txt

Get a List of All Files, Including Non-Duplicates

$ find scriptTest -name \*.txt
scriptTest/file1.txt
scriptTest/file2.txt
scriptTest/file3.txt
scriptTest/subDir1/file4.txt
scriptTest/subDir1/file5.txt
scriptTest/subDir1/file6.txt
scriptTest/subDir2/file1.txt
scriptTest/subDir2/file8.txt

You could then act on each file with find's -exec {} + feature, or simply use a grep that supports the --recursive and --files-with-matches flags to find files with matching content.
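
For instance, here is a rough sketch of the second script described in the question (report files whose names never appear in any file's contents), assuming a grep that supports the long options named above, as GNU grep does:

# Sketch, not a tested solution: for each file, search the whole tree for its
# bare name and print the path if no file's contents reference it.
find scriptTest -name \*.txt | while read -r path; do
  name=$(basename "$path")
  if ! grep --recursive --files-with-matches "$name" scriptTest >/dev/null; then
    echo "unreferenced: $path"
  fi
done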

Passing Find Results to a Bash Loop as an Array

Alternatively, if you know for sure that you won't have spaces in the file names, you can also use a Bash array to store the files in a variable you can iterate over in a Bash for-loop. For example:

files=( $(find scriptTest -name \*.txt) )
for file in "${files[@]}"; do
  : # do something with each "$file"
done

Looping like this is often slower, but may provide you with the additional flexibility you need if you're doing something complicated. YMMV.
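
If you can't rule out spaces in the file names, a common alternative to the array approach (a standard shell pattern rather than anything specific to this answer) is to have find emit NUL-separated paths and read them back in a while loop:

# Space-safe variant -- assumes find supports -print0 and the shell's read
# supports -d '' (both true for GNU find and bash; verify under qsh first).
find scriptTest -name \*.txt -print0 | while IFS= read -r -d '' file; do
  : # do something with each "$file"
done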