I wonder whether the GNU and BusyBox implementations of "sed" may be broken.
My default sed implementation is the one from GNU.
POSIX says:
An editing command with two addresses shall select the inclusive range from the first pattern space that matches the first address through the next pattern space that matches the second.
But then why does this command output
$ { echo ha; echo ha; echo ha; } | sed '0,/ha/ !d'
ha
instead of
ha
ha
? Clearly the 2nd "ha" here is the "next" pattern space which matches, so it should be output as well!
Even stranger,
$ { echo ha; echo ha; echo ha; } | busybox sed '0,/ha/ !d'
does not output anything at all!
But even if sed did what the POSIX definition says, it would still be unclear what happens when a range expression is actually checked.
Does every range condition have its own internal state? Or is there a single global state for all range conditions in a sed script?
Obviously, a range condition needs at least to remember whether it is currently in the "search for a match of the first address"-state or in the "search for a match of the second address"-state. Perhaps it even needs to remember a third state "I have already processed the range and will not match again, no matter what".
It certainly matters when those conditions are updated: Every time a new pattern space is read? Every time the pattern space is modified, say by an s-command? Or only when the control flow reaches a range condition?
So, what is it?
Until I know better, I will avoid range conditions in my sed-scripts and consider them to be a dubious feature.
Two answers:
- 0 is not a valid POSIX address (lines count from 1)
- 0,/re/ is a GNU extension

The GNU sed man page includes:
Perhaps this will help clarify:
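A minimal sketch comparing the two forms, assuming the `sed` on your PATH is GNU sed (the variable names are only illustrative, and the input lines are numbered so they can be told apart). With `1,/ha/` the first address matches line 1 and the end pattern is first tried on line 2, which is what POSIX describes; with `0,/ha/` the end pattern may already match on line 1:

```shell
# Assumes `sed` is GNU sed; the 0,/re/ form is a GNU extension.
input='ha 1
ha 2
ha 3'

# 1,/ha/: line 1 opens the range; the end pattern is first tried on line 2.
from_one=$(printf '%s\n' "$input" | sed '1,/ha/!d')

# 0,/ha/: the range counts as already open, so line 1 itself can close it.
from_zero=$(printf '%s\n' "$input" | sed '0,/ha/!d')

printf '1,/ha/ keeps:\n%s\n' "$from_one"
printf '0,/ha/ keeps:\n%s\n' "$from_zero"
```

With GNU sed, the first command should keep "ha 1" and "ha 2", and the second only "ha 1".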
The busybox code explicitly checks that addr1 is greater than 0, and so never enters the matching state. See the busybox source code, line 1121:
POSIX says:
The test happens each time it is encountered:
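This can be observed from the shell. The following is only a sketch with made-up input: the first line would match the start address `/x/`, but a `b` command branches to the end of the script before the range command is reached, so the range should not open on that line.

```shell
# Line 1 matches /x/, but `b` (no label) jumps to the end of the script
# before the range command is reached, so the range does not open there.
reached=$(printf 'x skip\n1\nx\n2\ny\n3\n' |
  sed -e '/skip/b' -e '/x/,/y/s/^/> /')
printf '%s\n' "$reached"
```

If the range state were updated every time a pattern space is read, the line "1" would be prefixed with "> " as well; instead only "x", "2" and "y" should be marked.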
This is also demonstrated by, for example, the busybox source code; see the sed_cmd_s typedef.
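That each range command keeps its own state can also be checked without reading any source. A minimal sketch, assuming any POSIX-style sed and made-up input: two overlapping ranges in one script each tag the lines they select.

```shell
# Two range commands in one script; each appends its own tag.
# Range A covers lines 2-3, range B covers lines 3-4.
both=$(printf '1\n2\n3\n4\n5\n' |
  sed -e '/2/,/3/s/$/ A/' -e '/3/,/4/s/$/ B/')
printf '%s\n' "$both"
```

Line 3 should carry both tags ("3 A B") and line 4 only "B", which would not be possible if all ranges shared a single global state.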