Perl matching text using recursions spanning multiple lines


In the text below, I am trying to match \rem{lp}{some text with nested curly braces}. The regex101.com example works for the regex:

([^\s]*\s*)\\rem\{(?:lp|db)\}(\{((?>[^{}]|\n|(?2))*)\})(\s?[^\s]*)

I would like it to work in perl as well:

/([^\s]*\s*)\\rem\{(?:lp|db)\}(\{((?>[^{}]|\n|(?2))*)\})(\s?[^\s]*)/g

However, the Perl version fails (the file test.txt contains the text below). It seems to be due to the line breaks. How do I correct that? I also tried adding the m modifier, without success.

perl -ne 'while(/([^\s]*\s*)\\rem\{(?:lp|db)\}(\{((?>[^{}]|\n|(?2))*)\})(\s?[^\s]*)/g){print ++$a."$&\n";}' test.txt

test.txt

Some text etc word\rem{lp}{ more text {\ce{O2}} further text {\ce{H2O}} . some other text the $m = 48$ ({\ce{O2}}) ps}.

\rem{lp}{ more text {\ce{O2}} further text {\ce{H2O}} . some other text the $m = 48$ ({\ce{O2}}) ps 
where  more text {\ce{O2}} further text {\ce{H2O}} . some other text the $m = 48$ ({\ce{O2}}) ps from 
who  more text {\ce{O2}} further text {\ce{H2O}} . some other text the $m = 48$ ({\ce{O2}}) ps.}

\rem{lp}{ more text {\ce{O2}} further text {\ce{H2O}} . some other text the $m = 48$ ({\ce{O2}}) ps. 
 more text {\ce{O2}} further text {\ce{H2O}} . some other text the $m = 48$ ({\ce{O2}}) ps.}

\subsection{other}
Our bla bla bla

There is 1 answer

atapaka On

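Just in case someone stumbles upon this: with -n, Perl reads the input one line at a time rather than slurping the whole file, so the while loop only ever sees a single line and the regex cannot match across line breaks. The m modifier does not help, because it only changes how ^ and $ behave. To get around that, run Perl with -0777, which reads the whole file as a single record: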

perl -0777 -ne 'while(/([^\s]*\s*)\\rem\{(?:lp|db)\}(\{((?>[^{}]|\n|(?2))*)\})(\s?[^\s]*)/g){print ++$a."$&\n";}' test.txt