regex: negative lookahead with multiline text


I have several files with text blocks like this:

  maxlength:
    maxlength_js: 500
    maxlength_js_label: 'Inhalt auf @limit Zeichen begrenzt, verbleibend: <strong>@remaining</strong>'
    maxlength_js_enforce: true
    maxlength_js_truncate_html: false

Sometimes maxlength_js_enforce: true is present, sometimes not. I want to add maxlength_js_enforce: true to every maxlength: block that lacks it, using find and replace.

I tried to find all maxlength: blocks without maxlength_js_enforce using a regex with a negative lookahead, but I can't get it to work across multiple lines when the end of the block has no clear delimiter.

I tried a pattern like (maxlength:(\s*.*))(?!enforce). Here is an example using a whole file with multiple occurrences of such a block: https://regexr.com/7mjh1 Make sure to open the whole file example to understand the problem.

Either it matches too many lines or not enough, and it still matches even when "enforce" is present. I think I have misunderstood the concept and have trouble specifying the boundaries of such a block in the file. I guess the solution is simple, but I can't see it.


There are 2 answers

Wiktor Stribiżew On BEST ANSWER

You can use

(?m)^(\h*)maxlength:(?:\R\1(?!\h*maxlength_js_enforce:)(\h+).*)*$(?!\R\1\h)

Then, you need to replace with

$0\n$1$2maxlength_js_enforce: true

See the regex demo.

Details:

  • (?m) - a Pattern.MULTILINE inline option
  • ^ - start of a line
  • (\h*) - Group 1: zero or more horizontal whitespaces
  • maxlength: - a text
  • (?:\R\1(?!\h*maxlength_js_enforce:)(\h+).*)* - zero or more sequences of
    • \R - any line break sequence
    • \1 - Same value as in Group 1 (the indentation whitespace sequence)
    • (?!\h*maxlength_js_enforce:) - not followed with zero or more horizontal whitespace and then maxlength_js_enforce: text
    • (\h+) - Group 2: one or more horizontal whitespaces (the extra indentation of the child lines)
    • .* - the rest of the line
  • $ - end of the line...
  • (?!\R\1\h) - that is not immediately followed with a line break, the same value as captured in Group 1 and then a single horizontal whitespace.

The $0\n$1$2maxlength_js_enforce: true replacement pattern replaces the matched text with itself ($0) and then adds a newline (\n), Group 1 value ($1), Group 2 value ($2) and then the maxlength_js_enforce: true text.
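As a rough way to try this outside a regex tester, here is a minimal Python sketch of the same idea. Python's built-in re module does not support \h or \R, so this translation assumes indentation made of spaces or tabs ([ \t]) and "\n" line endings. The add_enforce helper and the field_a / field_b sample are hypothetical names made up for illustration, not taken from the question's file.

```python
import re

# Translation of the pattern above for Python's re module.
# Assumptions: \h is approximated by [ \t] and \R by "\n".
PATTERN = re.compile(
    r"^([ \t]*)maxlength:"                                 # Group 1: indentation of the block header
    r"(?:\n\1(?![ \t]*maxlength_js_enforce:)([ \t]+).*)*"  # child lines that are not the enforce key
    r"$(?!\n\1[ \t])",                                     # stop only where no deeper-indented line follows
    re.MULTILINE,
)

# \g<0> is the whole match; \1 and \2 rebuild the child indentation.
REPLACEMENT = r"\g<0>\n\1\2maxlength_js_enforce: true"


def add_enforce(text: str) -> str:
    """Append the enforce key to every maxlength: block that lacks it."""
    return PATTERN.sub(REPLACEMENT, text)


# Hypothetical sample: the first block already has the key, the second does not.
SAMPLE = "\n".join([
    "field_a:",
    "  maxlength:",
    "    maxlength_js: 500",
    "    maxlength_js_enforce: true",
    "    maxlength_js_truncate_html: false",
    "field_b:",
    "  maxlength:",
    "    maxlength_js: 200",
    "    maxlength_js_truncate_html: false",
])

if __name__ == "__main__":
    print(add_enforce(SAMPLE))
```

A block that already contains the key cannot match, because the repetition stops at the enforce line and the final lookahead then sees another indented child line. Running the replacement a second time should therefore leave the text unchanged.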

Tenatus On

I added the multiline flag to the pattern so that ^ matches at the beginning of each line. The pattern matches each maxlength: block up to and including the existing maxlength_js_enforce key, if there is one:

^((\x20*)maxlength:.*([\r\n]+\2\x20{2}(?!\x20*maxlength_js_enforce:).*)+)([\r\n]+\2\x20*maxlength_js_enforce:.*)?

You can replace with this to add/replace the setting you want:

$1\n$2  maxlength_js_enforce: true

This will also work to replace maxlength_js_enforce: false (although there isn't an example of this in the demo)
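For a quick check of that claim, here is a small Python sketch using the same pattern. The only changes are Python's replacement syntax (\1 and \2 instead of $1 and $2) and passing the multiline flag as re.MULTILINE. The set_enforce helper and the sample text are hypothetical and only serve as an illustration.

```python
import re

# Same pattern as above; re.MULTILINE lets ^ match at each line start.
PATTERN = re.compile(
    r"^((\x20*)maxlength:.*"
    r"([\r\n]+\2\x20{2}(?!\x20*maxlength_js_enforce:).*)+)"  # child lines before the enforce key
    r"([\r\n]+\2\x20*maxlength_js_enforce:.*)?",             # optional existing enforce line
    re.MULTILINE,
)

# Group 1 is the block without the enforce line, Group 2 the header indentation.
REPLACEMENT = r"\1\n\2  maxlength_js_enforce: true"


def set_enforce(text: str) -> str:
    """Add maxlength_js_enforce: true, or overwrite an existing value."""
    return PATTERN.sub(REPLACEMENT, text)


# Hypothetical sample: one block with enforce set to false, one without the key.
SAMPLE = "\n".join([
    "field_a:",
    "  maxlength:",
    "    maxlength_js: 500",
    "    maxlength_js_enforce: false",
    "    maxlength_js_truncate_html: false",
    "field_b:",
    "  maxlength:",
    "    maxlength_js: 200",
])

if __name__ == "__main__":
    print(set_enforce(SAMPLE))
```

Note that the pattern requires at least one child line before the enforce key, so a block whose first child is maxlength_js_enforce would not be matched by this sketch.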