To test a method that would transform Text elements in an XML document, I wrote two very simple Selectors and applied `map` with `toUpperCase` to the resulting Zipper. I expected every Text element except those excluded by the first Selector to be converted to upper case, but it only works for the most deeply nested Text elements. Here's the code:
scala> import com.codecommit.antixml._
import com.codecommit.antixml._
scala> val elemSelector = Selector({case x:Elem if x.name != "note" => x})
elemSelector: com.codecommit.antixml.Selector[com.codecommit.antixml.Elem] = <function1>
scala> val textSelector = Selector({case x:Text => x})
textSelector: com.codecommit.antixml.Selector[com.codecommit.antixml.Text] = <function1>
scala> val xml = XML.fromString("<tei><div><p>this<note>not<foreign lang=\"greek\">that</foreign>not</note></p><p>those<hi>these</hi></p></div></tei>")
xml: com.codecommit.antixml.Elem = <tei><div><p>this<note>not<foreign lang="greek">that</foreign>not</note></p><p>those<hi>these</hi></p></div></tei>
scala> val zipper = xml \\ elemSelector \ textSelector
zipper: com.codecommit.antixml.Zipper[com.codecommit.antixml.Text] = thisthatthosethese
scala> val modified = zipper.map(t => new Text(t.text.toUpperCase))
modified: com.codecommit.antixml.Zipper[com.codecommit.antixml.Text] = THISTHATTHOSETHESE
scala> val result = modified.unselect.unselect
result: com.codecommit.antixml.Zipper[com.codecommit.antixml.Node] = <tei><div><p>this<note>not<foreign lang="greek">THAT</foreign>not</note></p><p>those<hi>THESE</hi></p></div></tei>
So in the second-to-last command, upper case is applied to all targeted Text elements, but after stepping back out of the zipper, only two of the four are transformed. If I use `<hi/>` instead of `<hi>these</hi>`, then `those` does get capitalized. Any idea what the problem is here?
I am using the arktekk.no fork for Scala 2.10.3.
The problem you have comes from a merge conflict in the unselection process.
To simplify your problem a little, I'll use the document `<a>foo<b>bar</b></a>`.

When you select all the elements in the tree with the `*` selector, you get the `a` and `b` `Elem`s in your result set. The second, shallow selector looks only at the direct children of either `a` or `b` and takes the `Text` values, so we get `foo` from `a` and `bar` from `b`. After the modification, the first `unselect` contains the individual `Elem`s with their updates: `<a>FOO<b>bar</b></a>` and `<b>BAR</b>`.

Now the next `unselect` needs to merge `b` back into `a` to form a new version of `a`. The updated `b` implies a new `a` of the form `<a>foo<b>BAR</b></a>`.

And there's your conflict: you can either have `a` with the children `List(FOO, <b>bar</b>)` or with the children `List(foo, <b>BAR</b>)`. As there is no generic way to determine which list is better (both were updated at the same time), the choice is implementation-dependent. In this case, it takes the modification that came from the deeper level in the tree.

You can solve this by not selecting `Elem`s at all and instead selecting and modifying the `Text` nodes directly, which avoids any possible conflicts (they can only occur on `Elem`s).

If that's not an option for your use case, it may be possible to define a custom merging strategy for `unselect` to use in this specific case: one that manages to disambiguate the different parts of the children lists. Even if it is possible, I doubt it would be worth the effort.
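To make the conflict described above concrete without depending on anti-xml, here is a small self-contained Scala model. `TextNode`, `ElemNode`, `render`, `textPaths` and `updateAt` are hypothetical names invented for this sketch, and the "deeper modification wins" step simply hard-codes the behaviour described above; it does not reproduce anti-xml's actual `unselect` code. The last part shows why updating only the `Text` leaves (each addressed by its own path) cannot produce two competing versions of an `Elem`.

```scala
// Hypothetical stand-ins for XML nodes; these are NOT anti-xml's classes.
sealed trait Node
final case class TextNode(text: String) extends Node
final case class ElemNode(name: String, children: List[Node]) extends Node

def render(n: Node): String = n match {
  case TextNode(t)        => t
  case ElemNode(name, cs) => s"<$name>${cs.map(render).mkString}</$name>"
}

// Upper-cases a Text node and leaves any other node untouched.
val upcase: Node => Node = {
  case TextNode(t) => TextNode(t.toUpperCase)
  case other       => other
}

// The example document <a>foo<b>bar</b></a>.
val b = ElemNode("b", List(TextNode("bar")))
val a = ElemNode("a", List(TextNode("foo"), b))

// Elem-level selection: each selected Elem is updated independently and
// only its own direct Text children change.
val newA = a.copy(children = a.children.map(upcase)) // <a>FOO<b>bar</b></a>
val newB = b.copy(children = b.children.map(upcase)) // <b>BAR</b>

// Putting newB back into the original `a` yields a second, competing `a`.
val aFromB = a.copy(children = a.children.updated(1, newB)) // <a>foo<b>BAR</b></a>

// A "deeper modification wins" policy keeps aFromB, so FOO is lost.
val deeperWins: Node = aFromB

// Leaf-level selection: every Text node gets a path of child indices and is
// replaced in place, so only leaves change and no Elem has two versions.
def textPaths(n: Node, here: List[Int] = Nil): List[List[Int]] = n match {
  case TextNode(_)     => List(here)
  case ElemNode(_, cs) => cs.zipWithIndex.flatMap { case (c, i) => textPaths(c, here :+ i) }
}

// Rebuilds the tree with `f` applied to the node found at `path`.
def updateAt(n: Node, path: List[Int])(f: Node => Node): Node = (n, path) match {
  case (_, Nil)                        => f(n)
  case (ElemNode(name, cs), i :: rest) => ElemNode(name, cs.updated(i, updateAt(cs(i), rest)(f)))
  case (TextNode(_), _ :: _)           => throw new IllegalArgumentException("path continues below a Text node")
}

val leafFixed: Node = textPaths(a).foldLeft(a: Node)((acc, p) => updateAt(acc, p)(upcase))

println(render(deeperWins)) // <a>foo<b>BAR</b></a>
println(render(leafFixed))  // <a>FOO<b>BAR</b></a>
```

In the model, the two candidate versions of `a` each differ from the original in a different child, which is exactly the ambiguity the deeper-wins rule has to break.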
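One possible shape for the custom merging strategy mentioned above, sketched outside anti-xml: a position-wise three-way merge that compares each edited children list with the original and, for every position, keeps whichever side changed it. This assumes neither side inserted or removed children. `mergeChildren` and the node classes are hypothetical, and wiring such a function into `unselect` would depend on anti-xml's own merge hooks, which this sketch does not use.

```scala
// Hypothetical stand-ins for XML nodes; these are NOT anti-xml's classes.
sealed trait Node
final case class TextNode(text: String) extends Node
final case class ElemNode(name: String, children: List[Node]) extends Node

def render(n: Node): String = n match {
  case TextNode(t)        => t
  case ElemNode(name, cs) => s"<$name>${cs.map(render).mkString}</$name>"
}

// Position-wise three-way merge of two edited child lists against their
// common base. Returns None if the lengths differ or if both sides changed
// the same position in different ways (a conflict this strategy cannot solve).
def mergeChildren[A](base: List[A], left: List[A], right: List[A]): Option[List[A]] =
  if (left.length != base.length || right.length != base.length) None
  else {
    val picks: List[Option[A]] = base.indices.toList.map { i =>
      if (left(i) == base(i)) Some(right(i))      // left untouched: take right
      else if (right(i) == base(i)) Some(left(i)) // right untouched: take left
      else if (left(i) == right(i)) Some(left(i)) // identical change on both sides
      else None                                   // genuine conflict
    }
    if (picks.forall(_.isDefined)) Some(picks.flatten) else None
  }

// Children of `a`: original, Elem-level update of `a`, and `a` rebuilt from `b`.
val base  = List[Node](TextNode("foo"), ElemNode("b", List(TextNode("bar"))))
val left  = List[Node](TextNode("FOO"), ElemNode("b", List(TextNode("bar"))))
val right = List[Node](TextNode("foo"), ElemNode("b", List(TextNode("BAR"))))

val merged = mergeChildren(base, left, right).map(cs => render(ElemNode("a", cs)))
println(merged) // Some(<a>FOO<b>BAR</b></a>)
```

The strategy only works because the two updates touch different children; as soon as both sides edit the same child differently, or the lists change length, it has to give up, which is part of why a general-purpose version is hard.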