According to the manual, the -b option prints the byte offset of each occurrence, but the offset seems to be counted from the beginning of the whole input rather than from the start of the matching line.

I need to retrieve the position of each match within its line. I used this command, but it's quite ugly:

grep '<REGEXP>' | while IFS= read -r line ; do echo "$line" | grep -bo '<REGEXP>' ; done

How can I do this more elegantly, with a more efficient use of GNU utilities?

Example:

$ echo "abcdefg abcdefg" > test.txt
$ grep 'efg' test.txt | while IFS= read -r line ; do echo "$line" | grep -bo 'efg' ; done
4:efg
12:efg

(Admittedly, this command doesn't output the line number, but that is not difficult to add.)
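To make the behaviour described above concrete, here is a small sketch assuming GNU grep and a hypothetical two-line file named two-lines.txt (not part of the original example). The first command shows -b counting from the start of the whole input; the loop is one possible way to add the line number to the per-line approach, using grep -n and sed.

```shell
# Hypothetical sample file: line 1 is 15 characters plus a newline (16 bytes).
printf 'abcdefg abcdefg\nxxefg\n' > two-lines.txt

# -b with -o reports offsets from the start of the input, so the match
# on line 2 is reported at 16 + 2 = 18 rather than at 2.
grep -bo 'efg' two-lines.txt
# 4:efg
# 12:efg
# 18:efg

# Per-line offsets with the line number prepended. IFS=: splits only at the
# first colon that grep -n adds, so spaces inside the line are preserved.
grep -n 'efg' two-lines.txt | while IFS=: read -r num line; do
    printf '%s\n' "$line" | grep -bo 'efg' | sed "s/^/$num:/"
done
# 1:4:efg
# 1:12:efg
# 2:2:efg
```

This still starts one grep and one sed per matching line, so it only illustrates the problem; it is not the efficient solution being asked for.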

2 Answers

1
choroba

Perl is not a GNU utility, but it can solve your problem nicely. This prints the line number and the 0-based offset of each match within its line:

perl -nle 'print "$.:$-[0]" while /efg/g' test.txt
2
Ed Morton

With any awk (GNU or otherwise) in any shell on any UNIX box:

$ awk -v re='efg' -v OFS=':' '{
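    end = 0                                 # characters of $0 consumed so far
    while( match(substr($0,end+1),re) ) {
        end+=RSTART                         # 1-based position of this match in $0
        print NR, end, substr($0,end,RLENGTH)
        end+=RLENGTH-1                      # move to the last character of the match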
    }
}' test.txt
1:5:efg
1:13:efg

All strings, fields, and array indices in awk start at 1, not 0, which is why the output differs from yours. To awk, your input string is:

123456789012345
abcdefg abcdefg

rather than:

012345678901234
abcdefg abcdefg

If you prefer 0-based offsets, change the code above to use end+=RSTART-1, substr($0,end+1,RLENGTH), and end+=RLENGTH.
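
For reference, a 0-based variant could look like the sketch below. It is an adaptation of the script above rather than the answer's original code, and the variable name `start` is introduced here only for readability.

```shell
# Recreate the sample file from the question.
echo "abcdefg abcdefg" > test.txt

awk -v re='efg' -v OFS=':' '{
    end = 0                                   # characters of $0 consumed so far
    while ( match(substr($0, end+1), re) ) {
        start = end + RSTART - 1              # 0-based offset of this match
        print NR, start, substr($0, start+1, RLENGTH)
        end = start + RLENGTH                 # resume right after the match
    }
}' test.txt
# 1:4:efg
# 1:12:efg
```

Two caveats: a regexp that can match the empty string would make this loop run forever, and awk counts characters while grep -b counts bytes, so the numbers may differ on multibyte input depending on the awk implementation and locale.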