I'm trying to figure out a way to parse a html file with custom tags in the form:
[custom tag="id"]
Here's an example of a file I'm working with:
<p>This is an <em>amazing</em> example. </p>
<p>Such amazement, <span>many wow.</span> </p>
<p>Oh look, a wild [custom tag="amaze"] appears.</p>
We need maor embeds <a href="http://youtu.be/F5nLu232KRo"> bro
What I would like (in an ideal world) is to get back is a list of elements):
List foundElements = [text, custom tag, text, link, text]
Where the element in the above list contains:
Text:
<p>This is an <em>amazing</em> example. </p>
<p>Such amazement, <span>many wow.</span> </p>
<p>Oh look, a wild [custom tag="amaze"] appears.</p>
We need maor embeds
Custom tag:
[custom tag="amaze"]
Link:
<a href="http://youtu.be/F5nLu232KRo">
Text:
appears.</p>We need maor embeds
What I've tried:
- Jsoup
Jsoup is great, it works perfectly for HTML. The issue is I can't define custom tags with opening "[" and closing "]". Correct me if I'm wrong? - Jericho
Again like Jsoup, Jericho works great..except for defining custom tags. You're required to use "<". - Java Regex
This is the option I really don't want to go for. It's not reliable and there's a lot of string manipulation that is brittle, especially when you're matching against a lot of regexes.
Last but not least, I'm looking for a performance orientated solution as this is done on an Android client.
All suggestions welcome!