Protected characters in a shell function


Many years ago, I wrote a shell function to search for a text string within PDF files. For example, I would be in a directory with hundreds or thousands of PDF files and call this function to search for a particular phrase. This is the function:

function pdfsearch() {
    local searchStr=${1:?"The string to search must be the argument"}
    find . -iname "*.pdf" | while read fname
    do
        pdftotext -q -enc ASCII7 "$fname" ".$fname~"; grep -s -H --color=always -i $searchStr ".$fname~"
        rm ".$fname~"
    done
}

Although ugly, this works fine. The -enc option of pdftotext is there to remove the character ligatures found in certain documents (the double f in "stuff" was being interpreted as the single character "ff").

Now I'm trying to modify it so that it searches only certain PDFs, selected by filename, so it takes an extra argument. My attempt was this:

function pdfsearch() {
    local searchStr=${1:?"The string to search must be the argument"}
    local fileStr=${2:-"*.pdf"}
    find . -iname $fileStr | while read fname
    do
        pdftotext -q -enc ASCII7 "$fname" ".$fname~"; grep -s -H --color=always -i $searchStr ".$fname~"
        rm ".$fname~"
    done
}

However, this does not work. I always get

pdfsearch:3 *.pdf not found

with *.pdf replaced by whatever I pass as the second argument.

I believe the problem is the quotes I'm putting within fileStr. I tried single quotes, escaping them with \, and writing "$fileStr", all without success.

I'm almost sure this is pretty basic character expansion syntax. Also, in case it matters, I'm using zsh as my shell. This function used to live in my .bashrc and is now in my .zshrc.

Any suggestions?

Thanks


There are 2 answers

9
Kurtis Rader On

POSIX shell behavior regarding variable expansion is awful. I strongly recommend switching to a modern, sane shell like Fish or Elvish. Having said that, your problem is the unquoted $fileStr in find . -iname $fileStr | while read fname. The shell replaces $fileStr with its value, *.pdf, and then attempts to expand the glob before running the find command. Since you don't have any files with a pdf extension in the CWD, the glob expansion fails with the error you're seeing. Had the glob expanded to two or more files, you would get a different error:

bash> touch x.pdf y.pdf
bash> x='*.pdf'
bash> find . -iname $x
find: y.pdf: unknown primary or operator

The solution is to quote the variable expansion:

bash> find . -iname "$x"
./x.pdf
./y.pdf

In fact, you should almost always double-quote variable expansion in POSIX shells.
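
To make the difference visible without involving find, here is a minimal sketch that counts how many arguments a command receives in each case. It assumes bash or a POSIX sh; count_args and the scratch directory are made up for the illustration. Note that zsh, by default, does not glob the result of an unquoted parameter expansion (that requires the GLOB_SUBST option), so under zsh both counts would normally be 1.

```shell
# Hypothetical helper: print the number of arguments it was called with.
count_args() { echo "$#"; }

# Work in a scratch directory that contains exactly two PDF names.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch x.pdf y.pdf

pattern='*.pdf'
unquoted=$(count_args $pattern)    # the shell expands the glob to x.pdf y.pdf
quoted=$(count_args "$pattern")    # the literal string *.pdf is passed through

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

With the quoted form, find receives the pattern itself and does the matching, which is what -iname expects.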

3
phollox On

Thanks for the comments. I used the https://www.shellcheck.net/ tool as recommended, and found a couple of things to fix.

Also, in my actual code there was a space that should not have been there. I didn't include it in the question because I thought it wasn't relevant, but it turns out it is. The code I was trying to run was:

function pdfsearch() {
    local searchStr=${1:?"The string to search must be the argument"}
    local fileStr  =${2:-"*.pdf"}
    find . -iname $fileStr | while read fname
    do
        pdftotext -q -enc ASCII7 "$fname" ".$fname~"; grep -s -H --color=always -i $searchStr ".$fname~"
        rm ".$fname~"
    done
}

Apparently, I tried to align the = for some reason.
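
For anyone wondering why a stray space matters: whitespace separates arguments, so local sees two words, the bare name fileStr and a second word that starts with =. The sketch below only shows that split; count_args is a made-up helper, and the = word is quoted so the demo itself triggers no expansion. My reading, which is an assumption and not something confirmed in this thread, is that zsh then applies its = expansion to the second word (a word beginning with an unquoted = is looked up as a command name), which would fit the "*.pdf not found" message.

```shell
# Hypothetical helper: print the number of arguments it was called with.
count_args() { echo "$#"; }

# With the alignment space, the name and the "=value" part are two separate words.
aligned=$(count_args fileStr  "=*.pdf")

# Without the space, the whole assignment is a single word.
tight=$(count_args "fileStr=*.pdf")

echo "with space: $aligned argument(s), without space: $tight argument(s)"
```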

The fixed version:

function pdfsearch() {
    local searchStr=${1:?"The string to search must be the argument"}
    local fileStr=${2:-"*.pdf"}
    find . -iname "$fileStr" | while read -r fname
    do
        pdftotext -q -enc ASCII7 "$fname" ".$fname~"; grep -s -H --color=always -i "$searchStr" ".$fname~"
        rm ".$fname~"
    done
}

And this one works as intended.
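
As a possible follow-up, here is a sketch of a variant that avoids the temporary files altogether. It is not the version I actually run: pdfsearch_stream is a made-up name, it relies on pdftotext writing to stdout when the output file is given as -, and on grep supporting --label (GNU and BSD grep do, but it is not POSIX). I also switched to --color=auto so that no escape codes end up in the output when it is piped.

```shell
# Hypothetical variant of pdfsearch that streams the text instead of using temp files.
pdfsearch_stream() {
    local searchStr="${1:?The string to search must be the first argument}"
    local fileStr="${2:-*.pdf}"    # quoted, so the pattern reaches find untouched

    find . -iname "$fileStr" | while IFS= read -r fname
    do
        # "-" tells pdftotext to write the extracted text to stdout.
        # --label makes grep -H print the PDF name instead of "(standard input)".
        pdftotext -q -enc ASCII7 "$fname" - |
            grep -H --label="$fname" --color=auto -i -- "$searchStr"
    done
}
```

Usage would be the same as before, for example pdfsearch_stream "some phrase" "report*.pdf". File names containing newlines would still break the read loop.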