I am searching an XML file for words that come directly after the word "has", and I want to count the frequency of each of those words. So far I can find every word that follows "has", but the result contains duplicates. How can I group the 'successor' words and get a count for each?
I am using XQuery 1.0.
Snippet of the XML file:
<s n="2">
<w pos="CONJ" hw="that" c5="CJT-DT0">That </w>
<w pos="PRON" hw="you" c5="PNP">you</w>
<w pos="VERB" hw="be" c5="VBB">'re </w>
<w pos="VERB" hw="greet" c5="VVN">greeted </w>
<w pos="PREP" hw="in" c5="PRP">in </w>
<w pos="ART" hw="the" c5="AT0">the </w>
<w pos="ADJ" hw="first" c5="ORD">first </w>
<w pos="SUBST" hw="place" c5="NN1-VVB">place </w>
<w pos="PREP" hw="with" c5="PRP">with </w>
<w pos="UNC" hw="erm" c5="UNC">erm </w>
<w pos="ADV" hw="either" c5="AV0">either </w>
<w pos="SUBST" hw="silence" c5="NN1-VVB">silence </w>
<w pos="CONJ" hw="or" c5="CJC">or </w>
<w pos="ADJ" hw="some" c5="DT0">some </w>
<w pos="ADJ" hw="vague" c5="AJ0">vague </w>
<w pos="CONJ" hw="and" c5="CJC">and </w>
<w pos="ADV" hw="not" c5="XX0">not </w>
<w pos="ADV" hw="singularly" c5="AV0">singularly </w>
<w pos="ADJ" hw="hopeful" c5="AJ0">hopeful </w>
<w pos="SUBST" hw="mutter" c5="NN1-VVB">mutter</w>
<c c5="PUN">, </c>
<w pos="CONJ" hw="but" c5="CJC">but </w>
<w pos="ADV" hw="more" c5="AV0">more </w>
<w pos="ADV" hw="importantly" c5="AV0">importantly </w>
<w pos="PREP" hw="with" c5="PRP">with </w>
<w pos="ART" hw="a" c5="AT0">a </w>
<w pos="ADJ" hw="curious" c5="AJ0">curious </w>
<w pos="SUBST" hw="facial" c5="NN1-AJ0">facial </w>
<w pos="SUBST" hw="expression" c5="NN1">expression </w>
<w pos="VERB" hw="mingle" c5="VVD-VVN">mingled </w>
<w pos="PREP" hw="between" c5="PRP">between </w>
<w pos="UNC" hw="erm" c5="UNC">erm </w>
<w pos="SUBST" hw="dread" c5="NN1">dread </w>
<w pos="CONJ" hw="and" c5="CJC">and </w>
<w pos="SUBST" hw="contempt" c5="NN1">contempt</w>
<c c5="PUN">, </c>
<w pos="SUBST" hw="sort" c5="NN1">sort </w>
<w pos="PREP" hw="of" c5="PRF">of </w>
<w pos="SUBST" hw="thing" c5="NN1">thing </w>
<w pos="PRON" hw="you" c5="PNP">you</w>
<w pos="VERB" hw="would" c5="VM0">'d </w>
<w pos="VERB" hw="expect" c5="VVI">expect </w>
<mw c5="CJS">
<w pos="PREP" hw="as" c5="PRP">as </w>
<w pos="CONJ" hw="if" c5="CJS">if </w>
</mw>
<w pos="PRON" hw="you" c5="PNP">you</w>
<w pos="VERB" hw="have" c5="VHD">'d </w>
<w pos="VERB" hw="say" c5="VVN">said </w>
<w pos="PRON" hw="you" c5="PNP">you </w>
<w pos="VERB" hw="be" c5="VBD">were </w>
<w pos="ART" hw="a" c5="AT0">a </w>
<w pos="SUBST" hw="sorcerer" c5="NN1">sorcerer</w>
<c c5="PUN">.</c>
</s>
<s n="3">
<vocal desc="laugh"/>
<w pos="PRON" hw="i" c5="PNP">I </w>
<w pos="VERB" hw="find" c5="VVB">find </w>
<w pos="PRON" hw="myself" c5="PNX">myself </w>
<w pos="ART" hw="the" c5="AT0">the </w>
<w pos="ADJ" hw="only" c5="AJ0">only </w>
<w pos="SUBST" hw="thing" c5="NN1">thing </w>
<w pos="VERB" hw="be" c5="VBZ">is </w>
<w pos="PREP" hw="to" c5="TO0">to </w>
<w pos="VERB" hw="change" c5="VVI">change </w>
<w pos="ART" hw="the" c5="AT0">the </w>
<w pos="SUBST" hw="subject" c5="NN1">subject</w>
<c c5="PUN">.</c>
</s>
<s n="4">
<w pos="ADJ" hw="this" c5="DT0">This </w>
<w pos="UNC" hw="erm" c5="UNC">erm </w>
<w pos="SUBST" hw="reaction" c5="NN1">reaction </w>
<w pos="PREP" hw="to" c5="PRP">to </w>
<w pos="ART" hw="the" c5="AT0">the </w>
<w pos="SUBST" hw="disclosure" c5="NN1">disclosure </w>
<w pos="PRON" hw="i" c5="PNP">I </w>
<w pos="VERB" hw="think" c5="VVB">think</w>
<w pos="VERB" hw="be" c5="VBZ">'s </w>
<w pos="ADJ" hw="exaggerated" c5="AJ0-VVN">exaggerated </w>
<w pos="CONJ" hw="but" c5="CJC">but </w>
<w pos="PREP" hw="on" c5="PRP">on </w>
<w pos="ART" hw="the" c5="AT0">the </w>
<w pos="ADJ" hw="other" c5="AJ0">other </w>
<w pos="SUBST" hw="hand" c5="NN1">hand </w>
<w pos="PRON" hw="there" c5="EX0">there</w>
<w pos="VERB" hw="be" c5="VBZ">'s </w>
<w pos="PRON" hw="something" c5="PNI">something </w>
<w pos="PREP" hw="in" c5="PRP">in </w>
<w pos="PRON" hw="it" c5="PNP">it</w>
<c c5="PUN">.</c>
</s>
My current code for getting all the words after the target word 'has':
<html>
<body>
<table border='1'>
<tr><td>Target</td><td>Successor</td></tr>
{
for $targetword in (collection("./?select=*xml"))//s//w
where lower-case(normalize-space($targetword))="has"
let $successor := lower-case(normalize-space($targetword/following-sibling::w[1]))
return <tr><td>{data($targetword)}</td><td>{$successor}</td></tr>
}
</table>
</body>
</html>
Any help will be appreciated
I am using BaseX.
You need to add grouping to the FLWOR expression.
XQuery
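A minimal sketch of what the grouped query could look like, assuming the processor accepts the XQuery 3.0 group by clause (BaseX does, but strict XQuery 1.0 has no group by). The Frequency column and the descending sort are additions for illustration; the collection URI and the rest of the FLWOR expression are copied from the question.

```xquery
xquery version "3.0";

<html>
<body>
<table border='1'>
<tr><td>Target</td><td>Successor</td><td>Frequency</td></tr>
{
  (: same selection as in the question :)
  for $targetword in collection("./?select=*xml")//s//w
  where lower-case(normalize-space($targetword)) = "has"
  let $successor := lower-case(normalize-space($targetword/following-sibling::w[1]))
  (: one group per distinct successor string :)
  group by $successor
  (: most frequent successors first, ties in alphabetical order :)
  order by count($targetword) descending, $successor
  return <tr><td>has</td><td>{$successor}</td><td>{count($targetword)}</td></tr>
}
</table>
</body>
</html>
```

After the group by clause, $targetword is bound to all of the matching w elements in the group rather than to a single element, so count($targetword) is the number of times that successor occurred. If "has" can be the last w inside its parent, $successor is the empty string for those hits; adding where $successor != "" before the group by would drop them. On a processor limited to XQuery 1.0, iterating over distinct-values() of the successor strings and counting the matches for each should give the same table.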
Output