According to the manual, the -b option gives the byte offset of each occurrence, but the offset seems to be counted from the beginning of the whole input rather than from the beginning of each line.

I need to retrieve the position, within its line, of each match returned by grep. I used this line, but it's quite ugly:

grep '<REGEXP>' | while IFS= read -r line ; do printf '%s\n' "$line" | grep -bo '<REGEXP>' ; done

Is there a more elegant way to do this, with a more efficient use of GNU utils?


$ echo "abcdefg abcdefg" > test.txt
$ grep 'efg' test.txt | while IFS= read -r line ; do printf '%s\n' "$line" | grep -bo 'efg' ; done

(Admittedly, this command line doesn't output the line number, but that is not difficult to add.)
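A minimal sketch of the behaviour described in the question, assuming GNU grep (the -b and -o options are not required by POSIX) and a made-up two-line sample that is not part of the original post. It shows that the offset of a match on the second line is counted from the start of the whole input:

```shell
# Made-up sample: line 1 is 15 bytes plus a newline, so line 2 starts at byte 16.
sample=$(printf 'abcdefg abcdefg\nxxefg\n')

# -o prints each match on its own line; -b prefixes it with the byte offset
# counted from the start of the whole input, not from the start of its line.
offsets=$(printf '%s\n' "$sample" | grep -bo 'efg')
printf '%s\n' "$offsets"
```

The third match is reported at offset 18 (16 bytes for the first line including its newline, plus 2), which is why the loop in the question re-runs grep on every matching line separately.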

2 Answers

choroba On Best Solutions

Perl is not a GNU util, but can solve your problem nicely:

perl -nle 'print "$.:$-[0]" while /efg/g'
Ed Morton On

With any awk (GNU or otherwise) in any shell on any UNIX box:

$ awk -v re='efg' -v OFS=':' '{
    end = 0
    while( match(substr($0,end+1),re) ) {
        print NR, end+=RSTART, substr($0,end,RLENGTH)
        end+=RLENGTH-1
    }
}' test.txt
1:5:efg
1:13:efg

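Strings, fields, and array indices in awk all start at 1, not 0, which is why the output does not look like yours: the offsets are one higher than those from grep -b. To awk, your input string is indexed as:

123456789012345
abcdefg abcdefg

rather than:

012345678901234
abcdefg abcdefg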

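Feel free to change the code above to end+=RSTART-1 and end+=RLENGTH if you prefer 0-indexed strings; in that case the matched text is substr($0,end+1,RLENGTH), since substr() itself still counts from 1.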
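
A possible 0-indexed variant of the awk approach, wrapped in a shell function, might look like the sketch below. The function name match_offsets and the variable start are hypothetical names chosen for this illustration. It assumes the regexp never matches the empty string (otherwise the loop would not advance) and does not rely on anchors such as ^, which would match at the start of every remaining substring:

```shell
# match_offsets is a hypothetical helper name, not part of any tool.
# Usage: match_offsets REGEX [FILE...]   (reads stdin when no FILE is given)
match_offsets() {
    re=$1
    shift
    awk -v re="$re" -v OFS=':' '{
        end = 0                                  # characters of $0 already consumed
        while (match(substr($0, end + 1), re)) {
            start = end + RSTART - 1             # 0-based offset within the line
            print NR, start, substr($0, start + 1, RLENGTH)
            end = start + RLENGTH                # resume right after this match
        }
    }' "$@"
}

printf 'abcdefg abcdefg\n' | match_offsets 'efg'
```

Computing start in its own statement avoids depending on the order in which awk evaluates the arguments of print. Note that awk counts characters, whereas grep -b counts bytes, so the two can differ on multibyte input.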