How can I get all the `\begin{definition}...\end{definition}` blocks in a LaTeX file?


I have just finished writing my calculus summary in LaTeX.

The main problem now is that the files contain many things I don't need right now.

The .tex files contain many definitions and theorems that I need to learn by heart.

Each definition is wrapped in its own environment in the .tex file, so every definition starts with:

\begin{definition}

and ends with

\end{definition}

And the same for theorems.

I need to write something that extracts each \begin{}...\end{} block.

For example, a list called A built from:

\begin{document}

\begin{center}
\begin{definition} Hello WOrld! \end{definition}
\begin{example}A+B \end{example}
\begin{theorem} Tre Capre \end{theorem}
\begin{definition} Hello WOrld2! \end{definition}
\end{center}
\end{document}

should contain: [[\begin{definition} Hello WOrld! \end{definition}], [\begin{theorem} Tre Capre \end{theorem}], [\begin{definition} Hello WOrld2! \end{definition}]]

Looking on this site, I found that I can use regular expressions:

for i in range(5):
    x = i+1
    raw = open('tex/chapter' + str(x) + '.tex')
    A = []
    for line in raw:
        A.append(re.match(r'(\begin{definition})://.*\.(\end{definition})$', line))
print(A)

but the output is just None and I don't really know why.

Edit:

import re


for i in range(5):
    x = i+1
    raw = open('tex/chapter' + str(x) + '.tex')
    A = re.findall(r'\\begin{definition}(.*?)\\end{definition}', raw.read())
    print(A)

The output is the following:

[]
[]
[]
[]
[]

There are 3 answers

5
sgp On BEST ANSWER

As I understand the question, you just want the definitions from the LaTeX file. You can use findall to get them directly:

A = re.findall(r'{definition}(.*?)\\end{definition}', raw.read())

Note the use of the non-greedy .*? so that each match stops at the first \end{definition} instead of running on to the last one.
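As a hedged extension of the idea above, here is a minimal sketch that keeps the \begin/\end delimiters and collects both definitions and theorems, as the question asks. One plausible reason for the empty lists in the question's edit is that the real definitions span several lines: by default "." does not match a newline, so the sketch passes re.DOTALL. The names ENV_PATTERN, extract_environments and sample are made up for illustration, and the sample text is the question's example with one theorem split over two lines.

```python
import re

# Match \begin{definition}...\end{definition} or \begin{theorem}...\end{theorem}.
# (?P=env) forces \end to name the same environment that \begin opened.
ENV_PATTERN = re.compile(
    r'\\begin\{(?P<env>definition|theorem)\}.*?\\end\{(?P=env)\}',
    re.DOTALL,  # let "." cross newlines, since real blocks often span several lines
)


def extract_environments(tex):
    """Return every definition/theorem block, delimiters included, in document order."""
    return [m.group(0) for m in ENV_PATTERN.finditer(tex)]


# Illustrative input based on the question's example (hypothetical content).
sample = r"""\begin{document}
\begin{center}
\begin{definition} Hello WOrld! \end{definition}
\begin{example}A+B \end{example}
\begin{theorem} Tre
Capre \end{theorem}
\begin{definition} Hello WOrld2! \end{definition}
\end{center}
\end{document}"""

blocks = extract_environments(sample)
for block in blocks:
    print(block)
```

To run it on the chapter files, pass the result of open(...).read() to extract_environments instead of sample. This remains a regex approximation: it does not handle an environment nested inside another of the same name, and it will also pick up blocks that are commented out with %.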

2
samcarter_is_at_topanswers.xyz On

You can let LaTeX do the job; there is no need for external workarounds in Python. With the extract package, you can specify which environments you would like to extract, and it will produce a second .tex file containing only that content.

0
user202729 On

While in this case the regular expression works well for most LaTeX files, for more complex tasks you should use a LaTeX parser library.

This one can be solved with pylatexenc like this:

from pylatexenc import latexwalker
from pylatexenc.latexwalker import LatexWalker

data = "(LaTeX source code)"

def traverse(node: latexwalker.LatexNode) -> None:
    # Only environment nodes are inspected and descended into
    if node.isNodeType(latexwalker.LatexEnvironmentNode):
        if node.environmentname == "definition":
            # Print the raw LaTeX of the whole node including the \begin{definition}
            print(node.latex_verbatim())
            # ... or without
            print("".join(child.latex_verbatim() for child in node.nodelist))
        for child in node.nodelist:
            traverse(child)

for node in LatexWalker(data).get_latex_nodes()[0]:
    traverse(node)