I'm trying to write two (edit: shell) scripts and am having some difficulty. I'll explain the purpose and then provide the script and current output.
1: get a list of every file name in a directory recursively. Then search the contents of all files in that directory for each file name. Should return the path, filename, and line number of each occurrence of the particular file name.
2: get a list of every file name in a directory recursively. Then search the contents of all files in the directory for each file name. Should return the path and filename of each file which is NOT found in any of the files in the directories.
I ultimately want to use script 2 to find and delete (actually move them to another directory for archiving) unused files in a website. Then I would want to use script 1 to see each occurrence and filter through any duplicate filenames.
I know I can make script 2 move each file as it is running rather than as a second step, but I want to confirm the script functions correctly before I do any of that. I would modify it after I confirm it is functioning correctly.
I'm currently testing this on an IBM i system in strqsh.
My test folder structure is:
scriptTest
---subDir1
------file4.txt
------file5.txt
------file6.txt
---subDir2
------file1.txt
------file7.txt
------file8.txt
------file9.txt
---file1.txt
---file2.txt
---file3.txt
I have text in some of those files which contains existing file names.
This is my current script 1:
#!/bin/bash
files=`find /www/Test/htdocs/DLTest/scriptTest/ ! -type d -exec basename {} \;`
for i in $files
do
grep -rin $i "/www/Test/htdocs/DLTest/scriptTest" >> testReport.txt;
done
Right now it functions correctly, with the exception of providing the path to the file that had a match. Doesn't grep return the file path by default?
I'm a little further away with script 2:
#!/bin/bash
files=`find /www/Test/htdocs/DLTest/scriptTest/ ! -type d`
for i in $files
do
#split $i on '/' and store into an array
IFS='/' read -a array <<< "$i"
#get last element of the array
echo "${array[-1]}"
#perform a grep similar to script 1 and store it into a variable
filename="grep -rin $i "/www/Test/htdocs/DLTest/scriptTest" >> testReport.txt;"
#Check if the variable has anything in it
if [ $filename = "" ]
#if not then output $i for the full path of the current needle.
then echo $i;
fi
done
I don't know how to split the string $i into an array. I keep getting an error on line 6:
001-0059 Syntax error on line 6: token redirection not expected.
I'm planning on trying this on an actual linux distro to see if I get different results.
I appreciate any insight in advance.
Introduction
This isn't really a full solution, as I'm not 100% sure I understand what you're trying to do. However, the following contain pieces of a solution that you may be able to stitch together to do what you want.
Create Test Harness
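A minimal sketch of such a harness, assuming the directory layout from the question and zero-length files. The scratch location created with mktemp is a hypothetical stand-in for the real /www/Test/htdocs/DLTest path, so nothing on a live site is touched:

```shell
# Hypothetical scratch location; not the real web root.
root=$(mktemp -d)/scriptTest

# Recreate the directory layout shown in the question.
mkdir -p "$root/subDir1" "$root/subDir2"

# Create zero-length placeholder files.
touch "$root/file1.txt" "$root/file2.txt" "$root/file3.txt"
touch "$root/subDir1/file4.txt" "$root/subDir1/file5.txt" "$root/subDir1/file6.txt"
touch "$root/subDir2/file1.txt" "$root/subDir2/file7.txt" "$root/subDir2/file8.txt" "$root/subDir2/file9.txt"

# Show what was created.
find "$root" -type f | sort
```

Note that file1.txt exists both at the top level and in subDir2, which is the duplicate-name case the question mentions.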
Finding and Deleting Duplicates
In the most general sense, you could use find's -exec flag or a Bash loop to run grep or another comparison on your files. However, if all you're trying to do is remove duplicates, then you might simply be better off using the fdupes or duff utilities to identify (and optionally remove) files with duplicate contents.
For example, given that all the .txt files in the test corpus are zero-length duplicates, consider the following duff and fdupes examples.
duff
Duff has more options, but won't delete files for you directly. You'll likely need to use a command like
duff -e0 * | xargs -0 rm
to delete duplicates. To find duplicates using the default comparisons:
fdupes
This utility offers the ability to delete duplicates directly in various ways. One such way is to invoke
fdupes . --delete --noprompt
once you're confident that you're ready to proceed. However, to find the list of duplicates:
Get a List of All Files, Including Non-Duplicates
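A minimal sketch of building such a list with find, assuming a hypothetical scratch directory rather than the real site. The variable names root and file_list are illustrative only:

```shell
# Hypothetical scratch tree so the example is self-contained.
root=$(mktemp -d)
mkdir -p "$root/subDir1"
touch "$root/file1.txt" "$root/subDir1/file4.txt"

# List every regular file under $root, one path per line.
# -type f skips directories, much like "! -type d" in the question.
file_list=$(find "$root" -type f | sort)
printf '%s\n' "$file_list"
```

Capturing the output in a variable like this is only safe when no file name contains a newline.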
You could then act on each file with find's -exec {} + feature, or simply use a grep that supports the --recursive --files-with-matches flags to find files with matching content.
Passing Find Results to a Bash Loop as an Array
Alternatively, if you know for sure that you won't have spaces in the file names, you can also use a Bash array to store the files into a variable you can iterate over in a Bash for-loop. For example:
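A minimal sketch of that approach, assuming Bash (arrays are not available in every shell), file names without whitespace or glob characters, and a hypothetical scratch directory in place of the real tree:

```shell
#!/bin/bash
# Hypothetical scratch tree; the names contain no whitespace, as assumed above.
root=$(mktemp -d)
mkdir -p "$root/subDir1"
touch "$root/file1.txt" "$root/file2.txt" "$root/subDir1/file4.txt"

# The unquoted command substitution is split on whitespace, which is
# exactly why this only works for names without spaces.
files=( $(find "$root" -type f | sort) )

for f in "${files[@]}"; do
    # ${f##*/} strips everything up to the last slash, leaving the base name.
    printf '%s -> %s\n' "$f" "${f##*/}"
done
```

The ${f##*/} expansion gives the last path component directly, so there is no need to split the path on '/' into a second array.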
Looping like this is often slower, but may provide you with the additional flexibility you need if you're doing something complicated. YMMV.
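As a rough sketch of how these pieces might be stitched together for the second script in the question (listing files whose names are not mentioned anywhere in the tree), consider the following. It assumes a grep that supports -r, and the scratch tree, the report file, and the variable names are all hypothetical. It uses a while-read loop instead of an array, which avoids Bash-only features such as read -a and here-strings:

```shell
# Hypothetical scratch tree standing in for the website root.
root=$(mktemp -d)
mkdir -p "$root/subDir1"
echo 'see file4.txt for details' > "$root/file1.txt"
touch "$root/file2.txt" "$root/subDir1/file4.txt"

# Keep the report outside $root so grep never searches it.
report=$(mktemp)

find "$root" -type f | while IFS= read -r path; do
    name=${path##*/}    # base name, without calling basename
    # -r recurse, -q quiet, -F treat the name as a fixed string, not a regex
    if ! grep -rqF -- "$name" "$root"; then
        # No file mentions this name, so report its full path.
        printf '%s\n' "$path" >> "$report"
    fi
done

sort "$report"
```

Two caveats apply to this sketch: the match is a plain substring test, so a reference to myfile2.txt would also count as a reference to file2.txt, and a file that mentions its own name counts as referenced. Reviewing the report before moving anything to the archive directory is therefore still advisable.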