I have a list of strings that follow a relatively similar format. Here are two examples:
text_1 = '<abstract lang="en" source="my_source" format="org"><p id="A-0001" num="none">My text is here </p><img file="Uxx.md" /></abstract>'
text_2 = '<abstract lang="db" source="abs" format="hrw" abstract-source="my_source"><p>Another text.</p></abstract>'
I can't vouch for other variations since it's a large collection of strings, but the format is evidently XML, and my only goal is to extract the text from each of these strings. What do you suggest for this?
Use the xml package. It's part of stdlib and easy to use. Plus, it provides a nice tutorial. You can access the data:
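A minimal sketch of what that access could look like, assuming the xml.etree.ElementTree module from the xml package and the text_1 string from the question (the names root and child are the ones used in this answer):

```python
import xml.etree.ElementTree as ET

text_1 = '<abstract lang="en" source="my_source" format="org"><p id="A-0001" num="none">My text is here </p><img file="Uxx.md" /></abstract>'

# Parse the string; fromstring returns the root Element (<abstract> here).
root = ET.fromstring(text_1)

# The tag name and the attributes of the root element.
print(root.tag)
print(root.attrib)

# Iterating over an Element yields its direct children (<p> and <img> here).
for child in root:
    print(child.tag, child.attrib)
```

Since the collection may contain variations, it might be worth wrapping ET.fromstring in a try/except for ET.ParseError so that a single malformed string does not stop the whole run.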
Edit: To view the text of the <p> element:
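Something along these lines should print the text, again assuming xml.etree.ElementTree. The helper paragraph_texts is a hypothetical name added for illustration, not part of the library; it collects the text of every <p> element, which may be handy when the strings vary in structure:

```python
import xml.etree.ElementTree as ET

text_1 = '<abstract lang="en" source="my_source" format="org"><p id="A-0001" num="none">My text is here </p><img file="Uxx.md" /></abstract>'
text_2 = '<abstract lang="db" source="abs" format="hrw" abstract-source="my_source"><p>Another text.</p></abstract>'

root = ET.fromstring(text_1)

# The text directly inside each <p> child of the root element.
for child in root:
    if child.tag == "p":
        print(child.text)


def paragraph_texts(xml_string):
    """Hypothetical helper: return the stripped text of every <p> element."""
    root = ET.fromstring(xml_string)
    # iter("p") finds <p> at any depth; itertext() also picks up text
    # inside nested tags, which a plain .text lookup would miss.
    return ["".join(p.itertext()).strip() for p in root.iter("p")]


for text in (text_1, text_2):
    print(paragraph_texts(text))
```

If the text can also appear outside <p> elements, "".join(root.itertext()) gathers all text in the document instead.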
root and child (both are Elements) with help().