Suppose there is an XML file:
<span id="assignee-val">
<span class="user-hover" id="issue_summary_assignee_m" rel="m">
<span class="aui-avatar aui-avatar-small"><div class="aui-avatar-inner"><img src="/secure/useravatar?size=small&avatarId=10222" /></div></span>
This Value!
</span>
</span>
The question is how to get "This Value!" out of this XML.
This is what I've got :(
> :m + Control.Applicative Data.ByteString.Lazy Text.HTML.DOM Text.XML.Cursor
> Prelude.map content . (element "span" >=> "id" `attributeIs` "assignee-val" >=> child >=> element "span" >=> "class" `attributeIs` "user-hover" >=> child) . fromDocument . parseLBS <$> Data.ByteString.Lazy.readFile "xmlfile"
[["\n "],[],["\n This Value!\n "]]
- Why are there 3 answers? What query will select the content inside the <span class="user-hover"> tag more precisely?
- How can I remove the indentation spaces and newline characters automatically?
UPD: in other words, the question is how to drop all nested tags (however many there are) and get only the first-level content, which is "This Value!" (plus spaces and newlines).
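The data you have navigated to holds the children of the "user-hover" span tag. Pulling out the unimportant stuff, your node is a span that contains some leading whitespace, the nested avatar span, and then the text "This Value!" with whitespace around it.
An XML parser sees this as a text node, an element node, and another text node.
So, the "user-hover" element does in fact have 3 children.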
You then apply "content" to each of these values. Since the nested span is an element rather than a text node, "content" returns an empty list for it, and you get the three results from your output: [["\n "],[],["\n This Value!\n "]].
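The effect described above can be reproduced with a toy model. This is only a sketch: the Node type and the toyContent function are hypothetical stand-ins invented for illustration, not the real Text.XML.Cursor types, and the children are simplified from the sample document.

```haskell
-- Toy stand-in for a parsed node; NOT the xml-conduit types.
data Node
  = TextNode String            -- character data
  | ElementNode String [Node]  -- tag name and child nodes
  deriving (Show, Eq)

-- Toy analogue of "content": a text node yields its text,
-- anything else yields an empty list.
toyContent :: Node -> [String]
toyContent (TextNode t) = [t]
toyContent _            = []

-- The three children of the "user-hover" span, simplified.
userHoverChildren :: [Node]
userHoverChildren =
  [ TextNode "\n "
  , ElementNode "span" []      -- the avatar span (its own children omitted)
  , TextNode "\n This Value!\n "
  ]

main :: IO ()
main = print (map toyContent userHoverChildren)
-- prints [["\n "],[],["\n This Value!\n "]]
```

Mapping over the children keeps one result list per child, which is why the element shows up as an empty list in the middle.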
According to the XML spec, an XML parser must preserve whitespace. There might be tools in the XML cursor lib to strip this whitespace for you (some XML processing libraries give you options to turn on automatic post-processing whitespace stripping), but I am not aware of any. Just strip the whitespace in another call after the query.
You can use the Data.Text.strip function to do the whitespace stripping for you.
To get the value you want, you need more information in the query. Will the data always be in the third position of the "user-hover" span element? Will it always be after a <span class="aui-avatar aui-avatar-small" /> element? Will it be all the content in the user-hover element concatenated with spaces stripped? Once you answer this, the solution should be obvious.
Updated answer-
With the extra info you supplied, I can add more info to the answer.
The short answer is: remove the "Prelude.map content", add a ">=> content" to the pipeline, and then apply Data.Text.concat to the final output.
Here are the details of why.
Almost all the functions in Text.XML.Cursor are roughly of the form a -> [b] (most are Cursor -> [Cursor]), where the idea is to apply each filter to a list of nodes, then concat the results. This very closely resembles what happens in XPath, and was clearly modeled after that.
The nice thing is, the pattern I just described is exactly how the list monad works. If you chain together a bunch of a -> [b] functions using bind (>>=), or its composition form (>=>) as in your query, the pipeline will basically do a concat . map f at each stage.
When you added the map content to the front, it worked, but it only did half of the job that the library intends in a full XPath-like tool. It pulled out the text content, but never concatenated the result. When used this way, content returns a list of only the text in text nodes inside an element. You still need one last concat to join those text items together.
When I used the pipeline with these changes, I got the text back as a single value.
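The concat . map f behaviour of the list monad can be seen with plain lists. The following is a minimal sketch using only base; Node, named, children, textOf, and sampleDoc are hypothetical names made up for this illustration and only imitate the shape of the real axes.

```haskell
import Control.Monad ((>=>))

-- Toy node type; NOT the xml-conduit types.
data Node = TextNode String | ElementNode String [Node]
  deriving (Show, Eq)

-- Each "axis" takes one node and returns a list of results.
children :: Node -> [Node]
children (ElementNode _ cs) = cs
children _                  = []

named :: String -> Node -> [Node]
named n node@(ElementNode m _) | n == m = [node]
named _ _                               = []

textOf :: Node -> [String]
textOf (TextNode t) = [t]
textOf _            = []

sampleDoc :: Node
sampleDoc =
  ElementNode "outer"
    [ElementNode "inner" [TextNode "a", ElementNode "img" [], TextNode "b"]]

-- Kleisli composition in the list monad.
query :: Node -> [String]
query = named "outer" >=> children >=> named "inner" >=> children >=> textOf

-- The same query with the per-stage concat . map written out.
queryByHand :: Node -> [String]
queryByHand =
  concatMap textOf . concatMap children . concatMap (named "inner")
    . concatMap children . named "outer"

main :: IO ()
main = do
  print (query sampleDoc)           -- ["a","b"]
  print (concat (query sampleDoc))  -- "ab"
```

Because textOf sits inside the chain, the element contributes nothing and the results arrive as one flat list, so a single concat is enough to join them.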
You can still strip the final result with Data.Text.strip if you want to.
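As a rough sketch of that last step, assuming the text package and assuming the query returned the two text fragments seen in the question's output (the fragments list below is typed in by hand, not produced by a real query):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T

-- Text fragments as a query ending in ">=> content" might return them
-- for the sample document (assumed values, copied from the question).
fragments :: [T.Text]
fragments = ["\n ", "\n This Value!\n "]

-- Join the fragments, then drop leading and trailing whitespace.
cleaned :: T.Text
cleaned = T.strip (T.concat fragments)

main :: IO ()
main = print cleaned  -- "This Value!"
```

Note that Data.Text.strip only trims the ends of the joined text; whitespace between fragments in the middle would be kept.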