Make sed regex alternations follow left to right precedence?


I'm trying to use a regex to format some binary from xxd -b, but to demonstrate this simply I'll show you what I expect to happen:

Regex to delete: /1x|1.*/

Text: 1x21y3333333313333 -> 2

Where all occurrences of 1x are deleted, then everything starting at the first 1 that shows up should be deleted. It should be immediately obvious what's going on, but if it's not, play with this. The key is that if 1x is matched, the rest of the pattern should be aborted.
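For anyone who wants to reproduce the toy case, here is a minimal sketch, assuming a POSIX-style sed that accepts -E (GNU sed, for instance). The variable names are only illustrative.

```shell
#!/bin/sh
# Toy input from the question.
text='1x21y3333333313333'

# One pass with an alternation: at position 0 both "1x" and "1.*" can match,
# and a POSIX engine keeps the longer candidate, so the whole line goes.
combined=$(printf '%s\n' "$text" | sed -E 's/1x|1.*//g')

# Two passes: delete every "1x" first, then everything from the first "1".
sequential=$(printf '%s\n' "$text" | sed -E 's/1x//g; s/1.*//')

printf 'combined:   "%s"\n' "$combined"
printf 'sequential: "%s"\n' "$sequential"
```

Under those assumptions the combined form should leave an empty line, while the two-pass form should leave the expected 2.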

Here is the output from echo "AA" | xxd -b (the bindump of AA\n):

0000000: 01000001 01000001 00001010                             AA.

My goal is to (1) delete the leading 0 of every byte (ASCII is 7 bits) and (2) delete the rest of the line so that only the actual binary is kept. So I piped it into sed 's/ 0//g':

0000000:100000110000010001010                             AA.

Adding the second step, sed -E 's/ 0| .*//g':

0000000:

Obviously, I expect to instead get:

0000000:100000110000010001010

Things I've tried that haven't done the job:

  • xxd can take -g0 to merge the columns, but it retains the first zero in every byte (characters each take up a byte, not 7 bits)
  • -r

I will use perl instead in the meantime, but this behaviour baffles me and maybe there's a reason (lesson) here?


There are 3 answers

John1024 (best answer)

If I understand your question correctly, this produces what you want:

$ echo "AA" | xxd -b | sed -E 's/ 0|  .*//g'
00000000:100000110000010001010

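The key change here is the use of two blanks in front of .*. The bytes are separated by single blanks, so this alternative can only match at the wider gap before the ASCII column, which is the part that you want to remove.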

Alternatively, we can remove blank-zero first:

$ echo "AA" | xxd -b | sed -E 's/ 0//g; s/ .*//'
00000000:100000110000010001010

revo

Try the following:

 s/ 0| [^0].*//g

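The reason for the behavior you see is that POSIX requires regex engines to return the leftmost-longest match. When both sides of an alternation can match at the same starting position, the longer match wins, regardless of the order in which the alternatives are written. In your pattern, " .*" always matches more than " 0" at the first blank, so it consumes the rest of the line.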
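A small sketch of that rule, run against the line quoted in the question instead of live xxd output. It assumes a sed with POSIX leftmost-longest matching and -E support; the variable names are made up for the illustration.

```shell
#!/bin/sh
# Sample line copied from the question (not regenerated with xxd).
line='0000000: 01000001 01000001 00001010                             AA.'

# Original order: " .*" is longer than " 0" at the first blank, so it wins.
short_first=$(printf '%s\n' "$line" | sed -E 's/ 0| .*//g')

# Swapping the alternatives changes nothing under leftmost-longest rules.
long_first=$(printf '%s\n' "$line" | sed -E 's/ .*| 0//g')

# With [^0] the second alternative cannot start in front of a byte,
# so " 0" is the only candidate there.
fixed=$(printf '%s\n' "$line" | sed -E 's/ 0| [^0].*//g')

printf '%s\n' "$short_first" "$long_first" "$fixed"
```

The first two results should be identical (just the offset and colon), which is the point: the order of the alternatives is not what decides the match. Backtracking engines such as Perl's take the first alternative that succeeds instead, which is why the same pattern behaves differently there.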

AudioBubble

Tested with GNU sed (\s and the I flag are GNU extensions):

sed -E 's/\s+(0|[a-z.]+)//ig'
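One caveat with this pattern, sketched below: it appears to rely on the ASCII column containing only letters and dots. The two input lines are hand-written in the layout xxd -b prints (an assumption, not captured output), and strip_columns is just a hypothetical wrapper name. GNU sed is required.

```shell
#!/bin/sh
# Hand-written lines in xxd -b layout (assumed): "AA\n" and "A1\n".
letters='00000000: 01000001 01000001 00001010                             AA.'
digit='00000000: 01000001 00110001 00001010                             A1.'

strip_columns() {
    # The pattern from the answer above; needs GNU sed for \s and I.
    sed -E 's/\s+(0|[a-z.]+)//ig'
}

out_letters=$(printf '%s\n' "$letters" | strip_columns)
out_digit=$(printf '%s\n' "$digit" | strip_columns)

printf '%s\n' "$out_letters" "$out_digit"
```

For the AA line the result should be the clean bit string. For the A1 line, [a-z.]+ stops at the digit, so the tail of the ASCII column (1.) should be left glued to the bits.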