I have an HTML page with a structure that I want to turn into a Clojure data structure. I’m hitting a mental block on how to approach this in an idiomatic way.
This is the structure I have:
<div class="group">
  <h2>title1</h2>
  <div class="subgroup">
    <p>unused</p>
    <h3>subheading1</h3>
    <a href="path1" />
  </div>
  <div class="subgroup">
    <p>unused</p>
    <h3>subheading2</h3>
    <a href="path2" />
  </div>
</div>
<div class="group">
  <h2>title2</h2>
  <div class="subgroup">
    <p>unused</p>
    <h3>subheading3</h3>
    <a href="path3" />
  </div>
</div>
Structure I want:
'(
["Title1" "subhead1" "path1"]
["Title1" "subhead2" "path2"]
["Title2" "subhead3" "path3"]
["Title3" "subhead4" "path4"]
["Title3" "subhead5" "path5"]
["Title3" "subhead6" "path6"]
)
The repetition of titles is intentional.
I’ve read David Nolen’s enlive tutorial. It offers a good solution when there is a one-to-one correspondence between groups and subgroups, but in this case the number of subgroups per group can vary.
Thanks for any advice.
You can use Hickory for parsing, and then Clojure has some very nice tools for transforming the parsed HTML to the form you want:
Example:
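A minimal sketch of that approach, assuming the Hickory library is on the classpath. The helper names (text-of, first-match, group->rows, html->rows) are made up for illustration, and the sample input is a cleaned-up copy of the markup from the question with explicit closing tags, so the parser does not have to guess how to repair it.

```clojure
(require '[hickory.core :as h]
         '[hickory.select :as s]
         '[clojure.string :as str])

;; Cleaned-up copy of the question's markup (assumption: tags are closed explicitly).
(def html
  "<div class=\"group\">
     <h2>title1</h2>
     <div class=\"subgroup\"><p>unused</p><h3>subheading1</h3><a href=\"path1\"></a></div>
     <div class=\"subgroup\"><p>unused</p><h3>subheading2</h3><a href=\"path2\"></a></div>
   </div>
   <div class=\"group\">
     <h2>title2</h2>
     <div class=\"subgroup\"><p>unused</p><h3>subheading3</h3><a href=\"path3\"></a></div>
   </div>")

(defn text-of
  "Joins the direct string children of a Hickory node and trims the result."
  [node]
  (->> (:content node) (filter string?) (apply str) str/trim))

(defn first-match
  "Returns the first node under `node` that matches `selector`, or nil."
  [selector node]
  (first (s/select selector node)))

(defn group->rows
  "Turns one group node into a seq of [title subheading href] vectors,
   one per subgroup, however many subgroups there are."
  [group]
  (let [title (text-of (first-match (s/tag :h2) group))]
    (for [sub (s/select (s/class "subgroup") group)]
      [title
       (text-of (first-match (s/tag :h3) sub))
       (get-in (first-match (s/tag :a) sub) [:attrs :href])])))

(defn html->rows
  "Parses an HTML string and flattens all groups into one seq of rows."
  [html]
  (let [tree (h/as-hickory (h/parse html))]
    (mapcat group->rows (s/select (s/class "group") tree))))

(html->rows html)
```

The title is looked up once per group and reused for every subgroup inside the for, so it does not matter how many subgroups each group has. The result is a lazy sequence of vectors; wrap the call in vec if you need a realized collection.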