XQuery: Why am I still getting spaces when I use the normalize-space function?

I am writing code that searches a collection of XML files for a target word, finds each word that follows it (the successor), and calculates the probability of the two words appearing together across all of the documents. Even though I call normalize-space(), the $successor values in the output still show a trailing space after the word. Below are my code and the output I get.

Code:

<html>
<body>
<table border='1'>
<tr><td>Target</td><td>Successor</td><td>Probability</td></tr>
{
let $targetword := "has"
let $t_word_occ := collection("./?select=*xml")//s//w[lower-case(normalize-space()) = $targetword]
let $totalwords := collection("./?select=*xml")//s//w[lower-case(normalize-space())]
for $successor in distinct-values($t_word_occ/following-sibling::w[1])
let $freq := count($t_word_occ/following-sibling::w[1][. = $successor])
let $dwtw := count($totalwords[. = $successor])
let $prob := $freq div $dwtw
order by ($prob) descending
return <tr><td>{$targetword}</td><td>{$successor}</td><td>{$prob}</td>
       </tr>
}
</table>
</body>
</html>

Sample output:

 <tr>
            <td>Target</td>
            <td>Successor</td>
            <td>Probability</td>
         </tr>
         <tr>
            <td>has</td>
            <td>intentions </td>
            <td>1</td>
         </tr>
         <tr>
            <td>has</td>
            <td>drifted </td>
            <td>1</td>
         </tr>
         <tr>
            <td>has</td>
            <td>eluded </td>
            <td>1</td>
         </tr>
         <tr>
            <td>has</td>
            <td>won</td>
            <td>1</td>
         </tr>

In the output, some successors keep a trailing space, for example "drifted " and "eluded ", while others, such as "won", come out without one.

How would I go about fixing this?

I am using XQuery 1.0.

There is 1 answer

Answer by Yitzhak Khabinsky

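normalize-space() inside the predicate only affects which w elements are selected; the values returned by distinct-values() still carry the original whitespace. Casting a value to xs:token collapses that whitespace. You can apply the cast either where the value is output or in the for clause: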

<td>{$successor cast as xs:token?}</td>

for $successor in distinct-values($t_word_occ/(following-sibling::w[1] cast as xs:token?))

Or, equivalently, using the constructor function:

for $successor in distinct-values($t_word_occ/xs:token(following-sibling::w[1]))
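
One thing to watch when the values are normalized in the for clause: the later comparisons `. = $successor` are still made against the raw text of each w element, so they would likely stop matching (and `$dwtw` could end up as zero) unless they are normalized the same way. Below is a minimal sketch of the whole loop under that assumption. The inline `$doc` is hypothetical sample data standing in for the collection, and normalize-space() is used in place of the xs:token cast, which should collapse whitespace the same way for this data.

```xquery
xquery version "1.0";

(: Hypothetical sample data standing in for collection("./?select=*xml");
   the trailing space inside some <w> elements is deliberate. :)
let $doc := document {
  <text>
    <s><w>He </w><w>has </w><w>drifted </w><w>away</w></s>
    <s><w>She </w><w>has </w><w>won</w></s>
    <s><w>It </w><w>has </w><w>drifted </w></s>
  </text>
}
let $targetword := "has"
let $t_word_occ := $doc//s//w[lower-case(normalize-space(.)) = $targetword]
let $totalwords := $doc//s//w
(: normalize the successor values themselves, not just the predicate :)
for $successor in distinct-values($t_word_occ/following-sibling::w[1]/normalize-space(.))
(: compare normalized text against the normalized successor :)
let $freq := count($t_word_occ/following-sibling::w[1][normalize-space(.) = $successor])
let $dwtw := count($totalwords[normalize-space(.) = $successor])
let $prob := $freq div $dwtw
order by $prob descending
return <tr><td>{$targetword}</td><td>{$successor}</td><td>{$prob}</td></tr>
```

With this sample data the query should produce one row each for "drifted" and "won", with no trailing space in either.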

A full repro

xquery version "1.0";

(: "declare context item" is XQuery 3.0 syntax, so the document is bound with let to stay within 1.0 :)
let $doc := document {
<root>
     <column id="1" isok="true">OK</column>
     <column id="2" isok="false">NOT OK</column>
     <column id="3" isok="   TRUE   ">OK</column>
     <column id="4" isok=" false">NOT OK</column>
 </root>
}
return

<root>
{
  (: two versions, both working :)
  (: for $x in distinct-values($doc/root/column/(@isok cast as xs:token?)) :)
  for $x in distinct-values($doc/root/column/xs:token(@isok))
  return <r>{$x}</r>
}
</root>
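
Note that xs:token only collapses whitespace; it does not fold case, so "   TRUE   " and "true" in the repro above remain distinct values. If case-insensitive grouping is wanted, as in the question's lower-case() predicate, a sketch along these lines may help. It reuses a subset of the same sample data and swaps lower-case(normalize-space()) in for the cast; the `$doc` variable name is an assumption for illustration.

```xquery
xquery version "1.0";

(: Subset of the repro data; $doc is a hypothetical variable name. :)
let $doc := document {
  <root>
    <column id="1" isok="true">OK</column>
    <column id="3" isok="   TRUE   ">OK</column>
    <column id="4" isok=" false">NOT OK</column>
  </root>
}
return
<root>
{
  (: collapse whitespace first, then fold case, before de-duplicating :)
  for $x in distinct-values($doc/root/column/lower-case(normalize-space(@isok)))
  return <r>{$x}</r>
}
</root>
```

Here "true" and "   TRUE   " should collapse into a single value, leaving only "true" and "false" in the result.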