Unexpected Error while Executing Simple grep Script

I'm trying to collect a line from a series of very long files. Unfortunately, I need to extract the same line from an identically named file in 1600 distinct directories. The directory structure is like this.

Directory jan10 contains both the executed bash script, and directories named 18-109. The directories 18-109 each contain directories named 18A, 18B, ..., 18H. Inside each of these directories is the file "target.out" that we want the information from. Here is the code that I wrote to access this information:

for i in $(cat  ~/jan10/list.txt);
do
    cd $i
    cd *A

    grep E-SUM-OVERALL target.out | cut -c  17-24 > ../overallenergy.out

    cd ../*B
    grep E-SUM-OVERALL target.out | cut -c  17-24 >> ../overallenergy.out

    cd ../*C
    grep E-SUM-OVERALL target.out | cut -c  17-24 >> ../overallenergy.out

    cd ../*D
    grep E-SUM-OVERALL target.out | cut -c  17-24 >> ../overallenergy.out

    cd ../*E
    grep E-SUM-OVERALL target.out | cut -c  17-24 >> ../overallenergy.out

    cd ../*F
    grep E-SUM-OVERALL target.out | cut -c  17-24 >> ../overallenergy.out

    cd ../*G
    grep E-SUM-OVERALL target.out | cut -c  17-24 >> ../overallenergy.out

    cd ../*H

done

In this example, list.txt contains the numbers 18-109 each on a different line. An example of the "list.txt" is shown below:

17
18
19
20
21
22
23
24
25

Unexpectedly, this code simply won't work, it returns the error:

./testscript.sh: line 8: cd: 18: No such file or directory
./testscript.sh: line 11: cd: *A: No such file or directory

It returns this error for every numbered directory and every lettered sub-directory. Does anyone have any insight on what I've done wrong? I'll answer any questions, and I apologize again if this is unclear. The grep command by itself does work, so I imagine it's a problem with one of the "cd" commands, but I'm unsure. The code is being executed in the jan10 directory.

There are 4 answers

shellter

Now that I understand your requirement better (my fault), here's a more fleshed-out solution.

prompt$ cat simpleGrepScript.sh
#!/bin/bash
if ${testMode:-true} ; then
   echo "processing file $1 into outfile ${1%/*}/../overallenergy.out" 1>&2
elif [[ -f "$1" ]] ; then
   # Append (>>) so the results from sibling directories (18A, 18B, ...) accumulate
   grep 'E-SUM-OVERALL' "$1" >> "${1%/*}/../overallenergy.out"
else
   echo "no file \"$1\" found" 1>&2
fi

Run

prompt$ find /starting/path -name target.out | xargs -n 1 /path/to/simpleGrepScript.sh

If the output from testMode,

 "processing file $1 into outfile ${1%/*}/../overallenergy.out"

looks OK, then change the default in the script to ${testMode:-false}.
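As a side note, the ${testMode:-true} form only supplies a default, so the switch can also be flipped from outside without editing the script. A minimal sketch, where run_mode is a hypothetical stand-in for the script body:

```shell
# Hypothetical stand-in for the script body; only the testMode switch matters here.
run_mode() {
    if ${testMode:-true} ; then
        echo "dry run"
    else
        echo "real run"
    fi
}

unset testMode
default_mode=$(run_mode)       # variable unset, so the default "true" is used

testMode=false
override_mode=$(run_mode)      # variable set, so the default is ignored

echo "$default_mode / $override_mode"
```

With the real script, the same effect should be available by putting testMode=false directly in front of the xargs command, so that the variable reaches the script's environment.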

If it doesn't look right, post the minimum error examples as a comment and I'll see if I can fix it.

If there are spaces in your path name, we'll have to circle back and add some more options to find and xargs.
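For reference, the usual extra options are -print0 on find and -0 on xargs (available in GNU and BSD versions, as far as I know). A rough sketch follows; the temporary tree and its directory names are invented for the demonstration and merely stand in for /starting/path:

```shell
# Sketch only: a throwaway tree stands in for /starting/path, and the
# directory names below are made up so that they contain spaces.
demo=$(mktemp -d)
mkdir -p "$demo/run 18/18 A" "$demo/run 18/18 B"
touch "$demo/run 18/18 A/target.out" "$demo/run 18/18 B/target.out"

# -print0 ends each name with a NUL byte and xargs -0 splits on NUL,
# so embedded spaces survive; -n 1 passes one path per invocation.
count=$(find "$demo" -name target.out -print0 |
        xargs -0 -n 1 sh -c 'test -f "$1" && echo ok' sh | wc -l)

echo "paths handled intact: $count"
rm -rf "$demo"
```

The inline sh -c command is only a placeholder that checks each path arrives in one piece; in practice it would be replaced by the path to simpleGrepScript.sh.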

IHTH.

Jdamian

for Dir in $(cat  ~/jan10/list.txt)
do
     find "$Dir" -type f -name target.out |
     while IFS= read -r File
     do
          # Append (>>): every target.out under one numbered directory shares one output file
          grep E-SUM-OVERALL "$File" >> "${File%/*/target.out}"/overallenergy.out
     done
done

gboffi

Define a shell function that, for a given directory, finds all the underlying targets and for each target outputs, on stdout, a suitable command.

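% gen_greps () {
    find "$1" -name target.out | while read -r fname ; do
        # Pass names as printf arguments so a % in a path is not read as a format directive
        printf 'grep E-SUM-OVERALL %s | ' "$fname"
        printf 'cut -c 17-24 > '
        printf '%s/overallenergy.out\n' "$(dirname "$fname")"
    done
}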
%

Make a dry run:

% gen_greps jan10
...
grep E-SUM-OVERALL jan10/29/29H/target.out | cut -c 17-24 > jan10/29/29H/overallenergy.out
...
% 

If what we see is what we want, pass the commands to a shell for execution:

% gen_greps jan10 | sh
% 

That's all (?)

David W.

Don't use for in this way. Before the loop can start, the shell has to run the cat command and expand its entire output, and that output is split on white space, so any name containing a space is broken into separate words. With a very long list, the whole expansion also has to be held in memory at once.

Instead, use a while read loop, which is more efficient and more tolerant of file name issues:

while read dir
do
    ....
done < ~/jan10/list.txt
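To make the difference concrete, here is a small comparison of the two loop styles. The list file is a hypothetical one created just for the test, with one entry that contains a space:

```shell
# Hypothetical list file with one entry that contains a space.
list=$(mktemp)
printf '%s\n' '18' 'run 19' > "$list"

for_count=0
for d in $(cat "$list"); do          # word splitting: "run" and "19" arrive separately
    for_count=$((for_count + 1))
done

read_count=0
while IFS= read -r d; do             # one iteration per line, spaces preserved
    read_count=$((read_count + 1))
done < "$list"

echo "for: $for_count items, while read: $read_count items"
rm -f "$list"
```

The for loop sees three items for a two-line file, while the while read loop sees one item per line.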

It is also very dangerous to use glob patterns in the cd command because more than one name could match that pattern, and that could cause cd to fail.
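One way to guard against that, sketched here with invented directory names chosen so that *A matches twice, is to expand the pattern first and only change directory when exactly one directory comes back:

```shell
# Sketch: the directory names are invented so that the pattern *A matches twice.
demo=$(mktemp -d)
mkdir "$demo/18A" "$demo/28A"

set -- "$demo"/*A/          # positional parameters now hold every match
match_count=$#
if [ "$match_count" -eq 1 ] && [ -d "$1" ]; then
    status="unique"          # safe: cd "$1" would have exactly one target here
else
    status="not unique"
    echo "pattern did not resolve to exactly one directory; not changing directory" >&2
fi

echo "$status"
rm -rf "$demo"
```

The -d test also covers the case where nothing matches, because the shell then leaves the pattern as a literal string.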

Also, if you find yourself piping to a series of grep, cut, sed commands, you can usually replace that with a single awk command.
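As an illustration, the grep and cut pair from the question can be compared with a single awk call. The sample line below is invented, since I have not seen a real target.out, so the column positions are an assumption:

```shell
# The sample line is invented; real target.out files may be laid out differently.
line='E-SUM-OVERALL   -1234.56 kcal/mol'

# Original style: grep selects the line, cut keeps characters 17 through 24.
with_pipe=$(printf '%s\n' "$line" | grep E-SUM-OVERALL | cut -c 17-24)

# Single awk: the pattern selects the line, substr takes 8 characters from position 17.
with_awk=$(printf '%s\n' "$line" | awk '/E-SUM-OVERALL/ {print substr($0, 17, 8)}')

echo "$with_pipe"
echo "$with_awk"
```

Both commands should print the same eight characters, because cut -c 17-24 and substr($0, 17, 8) describe the same range.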

If all of the files you need are called target.out and there are no other files called target.out that you want to skip, you can use find to locate them without changing into each directory. Note how much shorter and simpler the entire program is:

while read dir
do
    find "$dir" -name "target.out" -type f \
        -exec awk '/E-SUM-OVERALL/ {print substr($0, 17, 8)}' {} \;
done < ~/jan10/list.txt > overallenergy.out

I don't have any data, so it's hard to actually test this. It may be possible to simply use a field in my awk rather than substr, or my substr arguments could be off.