Which characters combined with ^ don't need to be escaped in sed?

I have checked that ^* and ^& match lines beginning with * and &, which I didn't expect since they are special characters. But ^[ doesn't work. Is this "standard" behavior? Is there any rationale behind this?

sed version used was "GNU sed 4.4".

There are 2 answers

kvantour (best answer)

From POSIX.1-2017:

The sed utility shall support the BREs described in XBD Basic Regular Expressions, ... [sed]

Reading the POSIX section on BREs, we read:

A BRE special character has special properties in certain contexts. Outside those contexts, or when preceded by a <backslash>, such a character is a BRE that matches the special character itself. The BRE special characters and the contexts in which they have their special meaning are as follows:

  • .[\: The <period>, <left-square-bracket>, and <backslash> shall be special except when used in a bracket expression (see RE Bracket Expression). An expression containing a '[' that is unescaped and is not part of a bracket expression produces undefined results.
  • *: The <asterisk> shall be special except when used:
    • In a bracket expression
    • As the first character of an entire BRE (after an initial '^', if any)
    • As the first character of a subexpression (after an initial '^', if any); see BREs Matching Multiple Characters
  • ^: The <circumflex> shall be special when used as an anchor (see BRE Expression Anchoring). The <circumflex> shall signify a non-matching list expression when it occurs first in a list, immediately following a <left-square-bracket> (see RE Bracket Expression).
  • $: The <dollar-sign> shall be special when used as an anchor.

source: Basic Regular Expressions, Special characters

So to answer the OP's question using the above:

  • & is not a special character, so ^& is expected to work
  • [ should always be escaped if it does not start a bracket expression; an unescaped, unpaired [ produces undefined results.
  • * is not special after an initial ^ when the latter is an anchor.

So everything the OP observed is consistent with the standard.
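
The three cases can be checked with a short sketch like the one below. The helper name sample_lines and its input lines are made up for illustration; the last command shows the bracket-expression alternative to \[.

```shell
# Hypothetical helper: four test lines, three of which start with
# one of the characters discussed above.
sample_lines() { printf '%s\n' '*star' '&amp' '[bracket' 'plain'; }

# * directly after an anchoring ^ is an ordinary character
sample_lines | sed -n '/^*/p'     # prints: *star
# & is never special in the pattern
sample_lines | sed -n '/^&/p'     # prints: &amp
# [ must be escaped ...
sample_lines | sed -n '/^\[/p'    # prints: [bracket
# ... or placed inside a bracket expression of its own
sample_lines | sed -n '/^[[]/p'   # prints: [bracket
```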

There is however still an interesting paragraph in RE Bracket Expression:

A bracket expression is either a matching list expression or a non-matching list expression. It consists of one or more expressions: ordinary characters, collating elements, collating symbols, equivalence classes, character classes, or range expressions. The <right-square-bracket> ( ] ) shall lose its special meaning and represent itself in a bracket expression if it occurs first in the list (after an initial <circumflex>( ^ ), if any). Otherwise, it shall terminate the bracket expression, unless it appears in a collating symbol (such as [.].] ) or is the ending <right-square-bracket> for a collating symbol, equivalence class, or character class. The special characters ., *, [, and \\ ( <period>, <asterisk>, <left-square-bracket>, and <backslash>, respectively) shall lose their special meaning within a bracket expression.

source: Basic Regular Expressions, RE Bracket Expression

This implies that ] cannot be escaped with a backslash in a bracket expression, because \ is itself an ordinary character there. This means:

The following work:

$ echo '[]' | sed 's/[^]x]/a/'
a]
$ echo '[]' | sed 's/[^x[.].]]/a/'
a]

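but this does not work as expected, because [^x\] is the complete bracket expression (anything except x or \) and the final ] is an ordinary character, so the pattern matches both input characters:

$ echo '[]' | sed 's/[^x\]]/a/'
a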

So in a bracket expression, don't escape ]: put it first in the list or write it as the collating symbol [.].].
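
As a further sketch of the "] first in the list" rule, the following commands use made-up input strings; the last one is the usual way to match either square bracket without any backslash.

```shell
# ] placed first in the list is literal, so no backslash is needed
printf '%s\n' 'a]b' | sed 's/[]]/X/'      # prints: aXb
# directly after the negating ^, ] is still literal
printf '%s\n' 'a]b' | sed 's/[^]]/X/g'    # prints: X]X
# ] first, then [: the list matches either square bracket
printf '%s\n' '[x]' | sed 's/[][]//g'     # prints: x
```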

Wiktor Stribiżew

See sed "3.3 Overview of Regular Expression Syntax" documentation.

The & char is not a special regex char, so it does not need escaping in a regex pattern. Note that & is a special construct in the replacement pattern, where it refers to the whole match.
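
A small sketch of the difference between & in the pattern and & in the replacement (the input strings are made up for illustration):

```shell
# In the pattern, & is an ordinary character
printf '%s\n' 'R&D' | sed 's/&/ and /'    # prints: R and D
# In the replacement, an unescaped & stands for the whole match
printf '%s\n' 'cat' | sed 's/cat/[&]/'    # prints: [cat]
# A backslash makes & literal in the replacement
printf '%s\n' 'cat' | sed 's/cat/\&/'     # prints: &
```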

The * is not special when it is at the start in GNU sed (^* is a pattern that matches a * at the start of the string):

POSIX 1003.1-2001 says that * stands for itself when it appears at the start of a regular expression or subexpression, but many nonGNU implementations do not support this and portable scripts should instead use \* in these contexts.
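
Following that advice, a portable script would spell the leading asterisk as \*. A minimal sketch, with a made-up input line:

```shell
# \* is a literal asterisk anywhere in a BRE, so this form does not
# depend on how an implementation treats a leading *
printf '%s\n' '* item' | sed 's/^\*/-/'   # prints: - item
```

With GNU sed the unescaped s/^*/-/ should give the same result, but per the quoted documentation other implementations may not agree.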

The [ starts a bracket expression and must have a paired ] to close it, so ^[ on its own is an error. To match a literal [ at the start of a line, use ^\[ instead.