Piping ripgrep's output to Python for filtering (separate filename from match)


I need to use ripgrep to find a certain pattern. This will be a string describing a chemical reaction. The output of ripgrep looks something like this:

~ rg -U --only-matching --vimgrep --replace='$1' '```smiles\n(.+)\n```'

Testing Smiles.md:5:1:OC(=O)CCC(=O)O>CCO.[H+]>CCOC(=O)CCC(=O)OCC
Another Smiles.md:5:1:CO>BrP(Br)Br>CBr

Cool! But now I need to filter these results with a Python script, so I can pipe them to Python and read from stdin. But there's a problem: how can I guarantee the delimiter? If I write the Python script to treat everything after the third colon as the input string, how can I guarantee that the filename itself doesn't contain a colon? How can I properly separate the filename from the match when I pipe to Python?

Thanks,


There is 1 answer

tshiono

How about adding a pre-check stage before running ripgrep, something like this:

dir="."        # assign to your target directory
badlist=()     # paths that contain a colon
for f in "$dir"/*.md; do
    if [[ $f = *:* ]]; then             # if the path contains ":"
        badlist+=("$f")                 # then add it to the badlist
    fi
done
if (( ${#badlist[@]} > 0 )); then       # if the badlist is not empty...
    echo "These file(s) contain a colon character. Rename them and run again." >&2
    printf "    %s\n" "${badlist[@]}" >&2
    exit 1                              # non-zero status signals the failure
fi

rg -U --only-matching --vimgrep --replace='$1' '```smiles\n(.+)\n```' "$dir"/*.md | python-script

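The code above stops before the main ripgrep stage if any of the filenames contain a colon. If any are found, you can rename them and run it again. Once no filename contains a colon, the first three colons on each output line always separate the filename, line number, and column, so everything after the third colon is the match.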
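On the Python side, the split could look something like the sketch below. It assumes the pre-check has already passed, so no path contains a colon, and that every line has the `path:line:column:match` shape shown in the question. The names `Match` and `parse_vimgrep` are made up for illustration. The split is limited to the first three colons so that any colon inside the matched string itself stays intact.

```python
from typing import Iterable, Iterator, NamedTuple


class Match(NamedTuple):
    """One ripgrep --vimgrep result (hypothetical helper type)."""
    path: str
    line: int
    column: int
    text: str


def parse_vimgrep(lines: Iterable[str]) -> Iterator[Match]:
    """Split 'path:line:column:match' lines into their four fields.

    Assumes the path contains no colon (guaranteed by the pre-check).
    """
    for raw in lines:
        raw = raw.rstrip("\n")
        # Split on the first three colons only; the match keeps its own colons.
        parts = raw.split(":", 3)
        if len(parts) != 4:
            continue  # not a path:line:column:match line
        path, line, column, text = parts
        if not (line.isdigit() and column.isdigit()):
            continue  # line and column fields must be numeric
        yield Match(path, int(line), int(column), text)


# Sample lines copied from the question's ripgrep output.
SAMPLE = [
    "Testing Smiles.md:5:1:OC(=O)CCC(=O)O>CCO.[H+]>CCOC(=O)CCC(=O)OCC\n",
    "Another Smiles.md:5:1:CO>BrP(Br)Br>CBr\n",
]

for m in parse_vimgrep(SAMPLE):
    print(m.path, m.text, sep="\t")
```

To read from the pipe instead of the sample list, pass `sys.stdin` to `parse_vimgrep` (after `import sys`).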