Regex-like find and replace that can deal with nested brackets

I am trying to scan a text and replace every \command{<more_text>} with <more_text>, where <more_text> can itself contain { and }. I assume that the curly brackets in <more_text> are balanced.

I guess I could write a script from scratch that scans every character: check whether it is \, then whether the next characters are c, o, and so on until I reach {. From there I would add 1 to a counter for every { and subtract 1 for every }, and stop once the counter becomes negative.
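The counter idea described above can be sketched in Python roughly as follows. This is only a minimal sketch under the question's assumption that braces are balanced; the function name `strip_command` and its `name` parameter are made up for illustration. It starts the depth at 1 for the opening brace and stops when the depth returns to 0, which is equivalent to the "counter becomes negative" rule. Nested occurrences inside the extracted body are left untouched.

```python
def strip_command(text: str, name: str = "command") -> str:
    """Replace every \\name{...} in text with the text between its braces."""
    opener = "\\" + name + "{"      # e.g. the literal string \command{
    out = []
    i = 0
    while i < len(text):
        if text.startswith(opener, i):
            depth = 1               # we are inside the opening brace
            j = i + len(opener)
            while j < len(text) and depth > 0:
                if text[j] == "{":
                    depth += 1
                elif text[j] == "}":
                    depth -= 1
                j += 1
            if depth == 0:
                # text[j - 1] is the matching "}", so keep only the body
                out.append(text[i + len(opener):j - 1])
                i = j
                continue
            # Unbalanced input: leave the remainder as it is
            out.append(text[i:])
            break
        out.append(text[i])
        i += 1
    return "".join(out)


sample = (r"some text \command{here more text \othercommand{dummy {text} lala} "
          r"further text {dummy text 2} and done}")
print(strip_command(sample))
```

Because the scan continues after the matching brace, several \command{...} occurrences in one string are handled, as is text that follows the closing brace.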

However, I guess there are already more sophisticated solutions for this problem. I was thinking about regex, but I could not find a way to make the pattern detect the nested curly brackets. There are several questions (and solutions) about this on this website, but none of them gave me a clue how to adjust them for my problem, since it is not really a recursion. ChatGPT was also only hallucinating answers, at least with regex.

I don't necessarily need to use regex. I am fine with anything that works, preferably with Emacs or Python 3.

Test strings that should be detected

The following text

some text \command{here more text \othercommand{dummy {text} lala} further text {dummy text 2} and done}

should be converted into

some text here more text \othercommand{dummy {text} lala} further text {dummy text 2} and done

There is 1 answer

SVBazuev

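This code handles the test strings below. It relies on each string containing a single \command{ and on the closing brace of that command being the last character of the string.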

import re


input_strings = (
    r"\command{<more_text>}",
    r"some text \command{<mor{e_text>}",
    r"{{{{<some text>}}}\command{<mo{re<_}te}xt>}",
    r"some text \command{here more text \othercommand{dummy {text} lala} "
    r"further text {dummy text 2} and done}"
)

# Matches either the literal "\command{" or a "}" at the very end of the string.
pattern = re.compile(r"(\\command\{)|(\}{1}$)")

# re.sub replaces every match of the pattern with an empty string.
output_strings = [pattern.sub("", st) for st in input_strings]

print(*output_strings, sep="\n")

How it works:

In the test strings, you need to remove the substring "\command{" and the final character "}".

  1. The substring "\command{" contains "\" and "{", which must be escaped in a regular expression, so the first group is (\\command\{)
  2. The group for the final character is (\}{1}$); the $ anchors it to the end of the string, and the {1} quantifier is redundant, so (\}$) is equivalent
  3. Combine the two groups with the alternation operator | (or)
    to get (\\command\{)|(\}{1}$)

That's it, the template is ready to use!
Check it out on regex101.com
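To see what each of the two groups matches, here is a small sketch using the same pattern. The helper `show_matches` and the two sample strings are made up for illustration. It also shows a limitation: since the second group is anchored with $, a closing brace that is followed by more text is not removed.

```python
import re

pattern = re.compile(r"(\\command\{)|(\}{1}$)")


def show_matches(text):
    """Return (matched text, start index) for every match of the pattern."""
    return [(m.group(0), m.start()) for m in pattern.finditer(text)]


# The closing brace is the last character: both groups match.
simple = r"some text \command{a {b} c}"
print(show_matches(simple))
print(pattern.sub("", simple))

# Text follows the closing brace: only the first group matches,
# so a stray "}" stays in the result.
trailing = r"\command{a {b} c} and more"
print(show_matches(trailing))
print(pattern.sub("", trailing))
```

If the command can appear in the middle of a line or more than once, a brace-counting scan is probably the safer route.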