Parse a LaTeX macro through sed


General overview

The goal is to match the content of all \foo occurrences to transform it into <p>content of \foo</p>.

The details

The goal is to match a LaTeX macro’s content from its opening bracket to its closing bracket.

But two problems can occur. With a greedy match, if there is another closing bracket later on the same line, after the end of the macro, as in lorem ipsum \foo{dolor} sit amet et consectetur \bar{}, then s/\\foo{\(.*\)}/\1/ will capture dolor} sit amet et consectetur \bar{.

With a non-greedy match, on the other hand, I may stop at the closing bracket of a second macro nested inside \foo. For example, on lorem ipsum \foo{dolor \bar{sit amet} et consecteur} quia adipt, s/\\foo{\(.\{-}\)}/\1/ will capture dolor \bar{sit amet.

In both the non-greedy and the greedy case, I fail to match the macro content and only the macro content.
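
The greedy failure described above can be reproduced with a small sketch. The function name greedy_foo is hypothetical and only serves the illustration; the capture group is added so that \1 has something to refer to, and the <p> wrapper follows the goal stated in the overview.

```shell
# Hypothetical helper: naive greedy substitution in POSIX sed (BRE syntax).
# .* runs to the LAST closing brace on the line, not to the matching one.
greedy_foo() {
    sed 's/\\foo{\(.*\)}/<p>\1<\/p>/'
}

printf '%s\n' 'lorem ipsum \foo{dolor} sit amet et consectetur \bar{}' | greedy_foo
# prints: lorem ipsum <p>dolor} sit amet et consectetur \bar{</p>
```

The captured text swallows everything up to the brace that closes \bar, which is the over-match the question describes.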

The Question

So, how can I match the macro content from the opening bracket to the corresponding closing bracket?

Alternative question: am I wrong to use sed, and should I use a dedicated LaTeX parsing tool instead?


There are 2 answers

0
helper On

Perl provides this functionality through the Text::Balanced module.

I used Perl to fix LaTeX output of the following form:

From:

\noindent {\tt// substitute into diffEQ }
\begin{dmath} \label{eq:3}
b \frac{d^{}\left(\text{a*(1-exp(-c*t))}\right)}{\mathrm{dt^{}}}+k a \left(1-\mathrm{e}^{-c t}\right)=F
\end{dmath}

To:

\noindent {\tt// substitute into diffEQ }
\begin{dmath} \label{eq:3}
b \frac{d^{}\left({a (1-\mathrm{e}^{-c t})}\right)}{\mathrm{dt^{}}}+k a \left(1-\mathrm{e}^{-c t}\right)=F
\end{dmath}

which yields:

[Image: rendered LaTeX]

with the following code:

perl -MText::Balanced -MData::Dumper -nlE '
    # Extract the balanced {...} group that follows the last "\left(\text" on the line.
    # In list context this returns (extracted, remainder, skipped prefix).
    @brac = Text::Balanced::extract_bracketed($_, "{}", "^.*\\\\left\\(\\\\text");
    while ( defined($brac[0]) ) {
#print Data::Dumper::Dumper(\@brac)."\n";
#print $brac[2], $brac[0], $brac[1];
        modify_exp();
        # Reassemble the line: prefix, rewritten group, remainder.
        $_ = $brac[2] . $brac[0] . $brac[1];
        @brac = Text::Balanced::extract_bracketed($_, "{}", "^.*\\\\left\\(\\\\text");
    }
    print $_;

sub modify_exp {
    $brac[0] =~ s/\*/ /g;                              # a*b -> a b
    # Extract the balanced (...) argument of the last exp(...) in the group.
    my @sub_brac = Text::Balanced::extract_bracketed($brac[0], "()", "^.*exp");
#print Data::Dumper::Dumper(\@sub_brac)."\n";
    $sub_brac[0] =~ s/\((.*)\)$/\\mathrm{e}^{$1}/g;    # (x) -> \mathrm{e}^{x}
    $sub_brac[2] =~ s/exp//;                           # drop exp from the skipped prefix
    $brac[0] = $sub_brac[2] . $sub_brac[0] . $sub_brac[1];
#print $brac[0];
#   $brac[0] =~ s/^{//;
#   $brac[0] =~ s/}$//;
    $brac[2] =~ s/\\text$//;                           # remove the \text wrapper from the prefix
}
' "$1"

The commented lines were used to debug the code. The following link describes the package:

https://metacpan.org/pod/Text::Balanced

0
helper On

sed can sometimes be used with a trick: write a regular expression that starts at the opening bracket and then matches as many characters as possible that are NOT the closing bracket, "{[^}]*" for example. Finding the matching bracket can still be a problem when macros are nested, though. As in this related question:

Remove all occurrence of a command, preserving command argument

if the matching bracket is always followed by a space, or by some other known character, sed can work.
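
A small sketch of the negated-class trick, applied to the two sample lines from the question. The function name flat_foo_to_p is hypothetical. POSIX sed has no non-greedy quantifier, so [^}]* is the usual substitute; it is only correct when the argument of \foo contains no nested braces.

```shell
# Hypothetical helper: stop at the FIRST closing brace after \foo{.
flat_foo_to_p() {
    sed 's/\\foo{\([^}]*\)}/<p>\1<\/p>/g'
}

# Flat argument: the later \bar{} on the same line is left alone.
printf '%s\n' 'lorem ipsum \foo{dolor} sit amet et consectetur \bar{}' | flat_foo_to_p
# prints: lorem ipsum <p>dolor</p> sit amet et consectetur \bar{}

# Nested argument: the match ends at the brace that closes \bar, which is wrong.
printf '%s\n' 'lorem ipsum \foo{dolor \bar{sit amet} et consecteur} quia adipt' | flat_foo_to_p
# prints: lorem ipsum <p>dolor \bar{sit amet</p> et consecteur} quia adipt
```

The second call shows the limit of the trick: once the argument itself contains braces, a tool that counts nesting is needed.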