Muenchian grouping


I was wondering why the predicate [1] is always hardcoded as 1 in Muenchian grouping. Even after a lot of searching, the concept is not clear to me. It is explained as comparing the current node with the first node returned by the key. Why is the comparison always made with the first node that matches the key? Also, in contact[count(. | key('contacts-by-surname', surname)[1]) = 1], why do we add the = 1 part? Again, 1 is hardcoded. I referred to the link below:

http://www.jenitennison.com/xslt/grouping/muenchian.html


There are 3 answers

Dimitre Novatchev (best answer)

"I was wondering how this predicate ([1]) is hardcoded as 1 always in the Muenchian grouping."

This is simple:

The key() function produces all nodes for a given group, and we want to take just one node from any group.

It isn't guaranteed that all groups will have two or more nodes in them -- some might have just one node.

This is why it is safe and convenient to take the first (and possibly the only) node from each group.

We could equally well do the grouping taking the last node from each group (but this will be less efficient):

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:key name="kNumByMod3" match="num"
  use=". mod 3"/>

 <xsl:template match=
  "num[generate-id()
      =
       generate-id(key('kNumByMod3', . mod 3)[last()])
      ]
  ">


  3k + <xsl:value-of select=". mod 3"/>:
<xsl:text/>
  <xsl:copy-of select="key('kNumByMod3', . mod 3)"/>
 </xsl:template>
 <xsl:template match="text()"/>
</xsl:stylesheet>

when applied to this XML document:

<nums>
  <num>01</num>
  <num>02</num>
  <num>03</num>
  <num>04</num>
  <num>05</num>
  <num>06</num>
  <num>07</num>
  <num>08</num>
  <num>09</num>
  <num>10</num>
</nums>

produces the wanted, correctly grouped result:

  3k + 2:
<num>02</num>
<num>05</num>
<num>08</num>


  3k + 0:
<num>03</num>
<num>06</num>
<num>09</num>


  3k + 1:
<num>01</num>
<num>04</num>
<num>07</num>
<num>10</num>
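
For comparison, the more common first-node form of the same transformation can be sketched as follows. This is only an illustration: the key and the input document are the ones used above, while the wrapper element group and its remainder attribute are hypothetical names added here to make the output easier to read.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:key name="kNumByMod3" match="num" use=". mod 3"/>

 <!-- Match only the first num of each group -->
 <xsl:template match="num[generate-id() =
      generate-id(key('kNumByMod3', . mod 3)[1])]">
  <!-- "group" and "remainder" are illustrative names, not part of the answer -->
  <group remainder="{. mod 3}">
   <!-- Copy every num that shares this remainder -->
   <xsl:copy-of select="key('kNumByMod3', . mod 3)"/>
  </group>
 </xsl:template>
 <xsl:template match="text()"/>
</xsl:stylesheet>
```

With the same input, the groups should appear in the order in which their first members occur (remainders 1, 2, 0), rather than in the order 2, 0, 1 produced by the [last()] version.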

Michael Kay

The basic algorithm is that there are two nested loops. The outer loop selects one representative node from each group, and the inner loop selects all the nodes in that group (including the one chosen as representative). The easiest way of selecting one representative node from a group is to select the first, hence the predicate [1].
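
A rough sketch of those two loops in XSLT 1.0 might look like the following. It assumes the key contacts-by-surname from the question and a hypothetical input whose root element contacts holds contact children, each with a surname and a forename. The names contacts, forename, groups and group are assumptions made for illustration only.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output indent="yes"/>

 <!-- Index every contact by its surname -->
 <xsl:key name="contacts-by-surname" match="contact" use="surname"/>

 <!-- "contacts", "forename", "groups" and "group" are assumed names -->
 <xsl:template match="/contacts">
  <groups>
   <!-- Outer loop: one representative (the first) contact per surname -->
   <xsl:for-each select="contact[generate-id() =
        generate-id(key('contacts-by-surname', surname)[1])]">
    <group surname="{surname}">
     <!-- Inner loop: every contact in that group, representative included -->
     <xsl:for-each select="key('contacts-by-surname', surname)">
      <forename><xsl:value-of select="forename"/></forename>
     </xsl:for-each>
    </group>
   </xsl:for-each>
  </groups>
 </xsl:template>
</xsl:stylesheet>
```

The outer select keeps a contact only if it is the first node that the key returns for its own surname, so each surname is visited exactly once. The inner loop then asks the key for the whole group.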

Martin Honnen

Let's say we have the key definition <xsl:key name="contacts-by-surname" match="contact" use="surname"/>. Then the expression key('contacts-by-surname', 'Doe') gives you a node set with all contact elements whose surname is Doe. The expression key('contacts-by-surname', 'Doe')[1] gives you the first contact in that "group".

Now when processing all contact elements with for-each or apply-templates we usually want a way to identify the first contact element in each group. This can be achieved with <xsl:for-each select="contact[count(. | key('contacts-by-surname', surname)[1]) = 1]"> or <xsl:for-each select="contact[generate-id() = generate-id(key('contacts-by-surname', surname)[1])]">.
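
The = 1 in the first form is not a group index. The | operator forms the union of two node sets, and a node set never contains the same node twice, so count(. | X[1]) is 1 when the current node and X[1] are the same node and 2 when they are different nodes. A small sketch that prints this count for every contact, assuming a hypothetical input with a contacts root element and that every contact has a surname:

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>

 <xsl:key name="contacts-by-surname" match="contact" use="surname"/>

 <!-- "contacts" is an assumed root element name -->
 <xsl:template match="/contacts">
  <xsl:for-each select="contact">
   <xsl:value-of select="surname"/>
   <xsl:text>: </xsl:text>
   <!-- Size of the union of this contact and the first contact with its surname -->
   <xsl:value-of select="count(. | key('contacts-by-surname', surname)[1])"/>
   <xsl:text>&#10;</xsl:text>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>
```

For an input whose contacts have the surnames Doe, Smith, Doe in that order, this should print 1, 1 and 2: only the first Doe and the only Smith pass the = 1 test, which is why that predicate selects one contact per group.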

If your requirement is different and you wanted, for instance, to identify the last item in each group, you could of course use a different predicate, as in <xsl:for-each select="contact[count(. | key('contacts-by-surname', surname)[last()]) = 1]"> or <xsl:for-each select="contact[generate-id() = generate-id(key('contacts-by-surname', surname)[last()])]">.