Using XSLT to transform XML into "Boolean" English sentence with nested AND/OR

I need to turn XML into something resembling an English sentence. For example the following XML:

<event>
<criteria>
    <and>A</and>
    <and>B</and>
    <and>
        <or>
            <and>C</and>
            <and>D</and>
        </or>
        <or>E</or>
    </and>
</criteria>
</event>

must be turned into something like:

To meet the criteria event must have A and B and either C and D or E.

This is one example, but the "and" and "or" conditions can nest further.

The rules seem to be:

  • If an element has no following siblings or children, then nothing is output and you are done.
  • If "and" or "or" has a following sibling with no children, then the type of the following sibling ("and" or "or") is output (e.g., A and B; C and D; D or E).
  • If "and" has a following "and" sibling with an "or" child, then "and either" is output (e.g., and either C).
  • Elements with no text are not output.

I've tried a few approaches to generating this output, but haven't succeeded. One issue is not getting the recursion right. I've seen lots of examples of XSLT processing where one element is nested (e.g., an Item can be composed of other Items that are composed of other Items, etc.), but no examples where two elements like "and" and "or" can be siblings and/or nested within each other. I've tried using xsl:template match="and | or" and then testing for "and" or "or", but I'm either not getting down to the leaf level or having things come out in the wrong order.

I'd like to know if anyone can point me in the right direction for processing a structure like this, and/or if anyone could suggest a better structure to represent the "Boolean" sentence, since the XML is not yet finalized and can be modified if it would make processing easier.

Note: I'm using Saxon 9 and can use an XSLT 2.0 solution.

More Info:

Thanks again to @g-ken-holman. I like the top-down approach suggested, but I'm having some problems. I'm not sure why the and/or sequence was changed to or/and in Ken's example. The and/or sequence seems correct. Anyway, I ran the example and it worked. However, I have been given 5 cases in total. It worked for the first two simple cases, which contain only "and"s or only "or"s, and for case 5, which is the case above. But cases 3 and 4 didn't work. Here is the XML and the results.

 <event>
<example>3</example>
<criteria>
    <or>
        <op>A</op>
        <op>B</op>
    </or>
    <and>
        <op>C</op>
    </and>
</criteria>
</event>

Result: To meet the criteria, event must have either A or B C
Expected: To meet the criteria, event must have either A or B and C

And example 4:

<event>
  <example>4</example>
  <criteria>
<and>
    <op>A</op>
    <op>B</op>
</and>
<and>
    <or>
        <op>C</op>
        <op>D</op>
        <op>E</op>
    </or>
</and>
  </criteria>
</event>

Result: To meet the criteria, event must have A and B C or D or E
Expected: To meet the criteria, event must have A and B and either C or D or E

I think the reason is that the "and" or "or" is only output when the position()>1 test passes, i.e. when there is more than one operand. But this will not cover all the cases. Maybe if position()>1 or node count = 1?

An "either" element could be added if that would make it easier.

Note On Answer:

This is too long for the comments section so I am adding it here. I believe @Ken has provided the answer and that the second approach he suggests is best.

If I understand the processing, we are matching all nodes in the document. We match on "event" and that executes first since it is nested outside the other nodes. Then, if an "and" node is encountered we get a match on "and" and we iterate (for-each) through all the "and" siblings at that level. We will not output the word "and" for the first node, since the test "position() > 1" fails. We always output a blank space using xsl:text. Next we apply templates from the current (context) node (<xsl:apply-templates select="."/>). This starts to walk us down the tree since we are now matching only on child nodes of the "and". If we match an "and" next we repeat what we did so far.

If we match an "or" next, we do the match="or" template, which is almost identical to the "and" except it outputs the word "or". However, there are two possible templates that match on "or": <xsl:template match="or"> and <xsl:template match="or[count(*) > 1]" priority="1">. The priority="1" sets the priority of that match higher than the other "or" match, because a plain element-name pattern such as match="or" with no priority specified has a default priority of 0 (a pattern with a predicate would default to 0.5). Therefore if the current "or" node has 2 children (or[count(*) > 1]), we output "either" and then invoke <xsl:next-match/>, which will allow the lower priority "or" match to run.

I think this is correct, but I have one question. How does the text for the operands get put to the output?

There are 2 answers

G. Ken Holman (best answer)

I suggest you always approach your data "top-down" rather than try to deal with siblings.

Below is a solution:

t:\ftemp>type boolean1.xml 
<event>
<criteria>
    <and>A</and>
    <and>B</and>
    <and>
        <or>
            <and>C</and>
            <and>D</and>
        </or>
        <or>E</or>
    </and>
</criteria>
</event>
t:\ftemp>call xslt2 boolean1.xml boolean1.xsl 

To meet the criteria, event must have A and B and  either  C and D or E
t:\ftemp>type boolean1.xsl 
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

<xsl:output method="text"/>

<!--eat white-space-->
<xsl:template match="text()[not(normalize-space())]"/>

<!--start result-->
<xsl:template match="event">
To meet the criteria, event must have<xsl:apply-templates/>
</xsl:template>

<!--handle conjunction-->
<xsl:template match="*[child::and]">
  <xsl:for-each select="child::and">
    <xsl:if test="position()>1"> and</xsl:if>
    <xsl:text> </xsl:text>
    <xsl:apply-templates select="."/>
  </xsl:for-each>
</xsl:template>

<!--handle alternation-->
<xsl:template match="*[child::or]">
  <xsl:for-each select="child::or">
    <xsl:if test="position()>1"> or</xsl:if>
    <xsl:text> </xsl:text>
    <xsl:apply-templates select="."/>
  </xsl:for-each>
</xsl:template>

<!--special grammar case for alternation between 2 operands-->
<xsl:template match="*[count(child::or) = 2]" priority="1">
  <xsl:text> either</xsl:text>
  <xsl:next-match/>
</xsl:template>

<!--don't allow a mixture-->
<xsl:template match="*[child::and and child::or]" priority="2">
  <xsl:message terminate="yes">
    <xsl:text>A mixture of ands and ors is not allowed.</xsl:text>
  </xsl:message>
</xsl:template>

</xsl:stylesheet>
t:\ftemp>rem Done! 

As for suggestions for changing your XML, I suggest using a structure that doesn't allow for unexpected combinations, such as "what to do when both ands and ors are siblings". Consider the following:

t:\ftemp>type boolean2.xml 
<event>
<criteria>
  <and>
    <op>A</op>
    <op>B</op>
    <or>
      <and>
        <op>C</op>
        <op>D</op>
      </and>
      <op>E</op>
    </or>
  </and>
</criteria>
</event>
t:\ftemp>call xslt2 boolean2.xml boolean2.xsl 

To meet the criteria, event must have A and B and  either  C and D or E
t:\ftemp>type boolean2.xsl 
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

<xsl:output method="text"/>

<!--eat white-space-->
<xsl:template match="text()[not(normalize-space())]"/>

<!--start result-->
<xsl:template match="event">
To meet the criteria, event must have<xsl:apply-templates/>
</xsl:template>

<!--handle conjunction-->
<xsl:template match="and">
  <xsl:for-each select="*">
    <xsl:if test="position()>1"> and</xsl:if>
    <xsl:text> </xsl:text>
    <xsl:apply-templates select="."/>
  </xsl:for-each>
</xsl:template>

<!--handle alternation-->
<xsl:template match="or">
  <xsl:for-each select="*">
    <xsl:if test="position()>1"> or</xsl:if>
    <xsl:text> </xsl:text>
    <xsl:apply-templates select="."/>
  </xsl:for-each>
</xsl:template>

<!--special grammar case for alternation between 2 operands-->
<xsl:template match="or[count(*) = 2]" priority="1">
  <xsl:text> either</xsl:text>
  <xsl:next-match/>
</xsl:template>

</xsl:stylesheet>
t:\ftemp>rem Done! 

In this second approach, the "action" is triggered by the "and" or "or" element itself, not by its child operand elements. I think this is more direct.
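
The operands need no template of their own in this stylesheet. Their text reaches the output through XSLT's built-in template rules: an element with no matching template (such as op) simply has its children processed, and a text node with no matching template is copied to the result. A sketch of what those built-in rules amount to if written out explicitly; adding these two templates to boolean2.xsl should leave the output unchanged:

```xml
<!--explicit equivalent of the built-in rule for an element such as op:
    emit nothing for the element itself, just process its children-->
<xsl:template match="op">
  <xsl:apply-templates/>
</xsl:template>

<!--explicit equivalent of the built-in rule for text nodes:
    copy the string value (A, B, ...) to the output-->
<xsl:template match="text()">
  <xsl:value-of select="."/>
</xsl:template>
```

The white-space-eating template still wins for white-space-only text nodes, because a pattern with a predicate has a higher default priority (0.5) than a plain text() pattern (-0.5).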

Note that for the English reader there may be some grammatical challenges when nesting ands and ors deeply without some punctuation somewhere.
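
One possible way to make the grouping explicit, offered only as a sketch and not as part of the solution itself: wrap every group that is nested directly inside another group in parentheses. The template below is an assumption layered on boolean2.xsl. Its priority of 2 places it ahead of the "either" template, and xsl:next-match then hands over to the existing logic.

```xml
<!--hypothetical addition: parenthesize any and/or group nested in another group-->
<xsl:template match="and/or | or/and | and/and | or/or" priority="2">
  <xsl:text>(</xsl:text>
  <!--continue with the "either" template (if it matches) and then and/or handling-->
  <xsl:next-match/>
  <xsl:text>)</xsl:text>
</xsl:template>
```

The spacing around the parentheses is left untidy here; the point is only that the brackets remove the ambiguity about which operands belong together.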

G. Ken Holman

This alternate answer has the same stylesheet logic (with the only change being the exposition of the example number), but is posted to address the edited question for examples 3 and 4.

Where you have:

<event>
<example>3</example>
<criteria>
    <or>
        <op>A</op>
        <op>B</op>
    </or>
    <and>
        <op>C</op>
    </and>
</criteria>
</event>

I would have written the same as the following, which gives you the result you want using my original logic:

t:\ftemp>type boolean3.xml
<event>
<example>3</example>
<criteria>
  <and>
    <or>
        <op>A</op>
        <op>B</op>
    </or>
    <op>C</op>
  </and>
</criteria>
</event>
t:\ftemp>xslt2 boolean3.xml boolean2.xsl
3 To meet the criteria, event must have  either A or B and C

Similarly for example 4, where you have:

<event>
  <example>4</example>
  <criteria>
<and>
    <op>A</op>
    <op>B</op>
</and>
<and>
    <or>
        <op>C</op>
        <op>D</op>
        <op>E</op>
    </or>
</and>
  </criteria>
</event>

I would have written it as follows:

t:\ftemp>type boolean4.xml
<event>
  <example>4</example>
  <criteria>
<and>
    <op>A</op>
    <op>B</op>
    <or>
        <op>C</op>
        <op>D</op>
        <op>E</op>
    </or>
</and>
  </criteria>
</event>
t:\ftemp>xslt2 boolean4.xml boolean2.xsl
4 To meet the criteria, event must have A and B and  C or D or E

In my code I only used the word "either" when there were exactly two "or" operands ... I suppose it also works when there are more than two operands, so you would add that to the "or" handling logic.
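
A minimal sketch of that change, assuming "either" should introduce any list of two or more alternatives: the special-case template in boolean2.xsl is swapped for the one below. Only the predicate in the match pattern differs from the original.

```xml
<!--special grammar case: prefix "either" whenever there are two or more alternatives-->
<xsl:template match="or[count(*) > 1]" priority="1">
  <xsl:text> either</xsl:text>
  <!--fall through to the ordinary match="or" template-->
  <xsl:next-match/>
</xsl:template>
```

With this replacement, boolean4.xml should produce the expected "A and B and either C or D or E" wording, apart from the doubled spaces already visible in the other outputs.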

Here is the stylesheet modified to accommodate the example number:

t:\ftemp>type boolean2.xsl
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

<xsl:output method="text"/>

<!--eat white-space-->
<xsl:template match="text()[not(normalize-space())]"/>

<!--start result-->
<xsl:template match="event">
  <xsl:value-of select="example"/>
  <xsl:text> To meet the criteria, event must have</xsl:text>
  <xsl:apply-templates select="criteria"/>
</xsl:template>

<!--handle conjunction-->
<xsl:template match="and">
  <xsl:for-each select="*">
    <xsl:if test="position()>1"> and</xsl:if>
    <xsl:text> </xsl:text>
    <xsl:apply-templates select="."/>
  </xsl:for-each>
</xsl:template>

<!--handle alternation-->
<xsl:template match="or">
  <xsl:for-each select="*">
    <xsl:if test="position()>1"> or</xsl:if>
    <xsl:text> </xsl:text>
    <xsl:apply-templates select="."/>
  </xsl:for-each>
</xsl:template>

<!--special grammar case for alternation between 2 operands-->
<xsl:template match="or[count(*) = 2]" priority="1">
  <xsl:text> either</xsl:text>
  <xsl:next-match/>
</xsl:template>

</xsl:stylesheet>
t:\ftemp>

So, it all depends on how you write the XML. Check how I re-wrote what you did into how the operands work, and ask if you need more clarification.