Splitting an XML document and removing empty attributes using XSLT 2.0


I want to remove empty attributes from an XML document and also split it based on a particular element. I created two separate stylesheets, one for splitting and one for removing empty attributes, and each works fine on its own. But I need to combine the two so that, after the empty attributes are removed, the XML is split based on a particular element.

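RemoveAttribute XSLT:

 <xsl:stylesheet version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <!-- Identity template: copy every node and attribute as-is -->
 <xsl:template match="node()|@*">
   <xsl:copy>
     <xsl:apply-templates select="node()|@*"/>
   </xsl:copy>
 </xsl:template>

 <!-- Empty or whitespace-only attributes are dropped: an attribute has no
      descendants, so the test is never true and nothing is copied -->
 <xsl:template match="@*[not(normalize-space(.))]">
   <xsl:if test="descendant::*/@*[not(normalize-space(.))]">
     <xsl:copy />
   </xsl:if>
 </xsl:template>
 </xsl:stylesheet>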

Splitting XSLT:

 <?xml version="1.0" encoding="ISO-8859-1"?>
 <xsl:stylesheet version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

   <xsl:output omit-xml-declaration="yes" indent="yes"/>

   <xsl:template match="/*">
     <xsl:result-document href="ure.xml">
       <xsl:element name="Employee">
         <xsl:attribute name="xsi:schemaLocation">sample.xsd</xsl:attribute>
         <xsl:copy-of select="/Employee/*"/>
       </xsl:element>
     </xsl:result-document>
   </xsl:template>

 </xsl:stylesheet>

Input XML :

  <?xml version="1.0" encoding="UTF-8"?>
  <Enroll>
   <Department id="x1" name="">
      <members id ="" name="lio">ds</members>
   </Department>
    <Employee>
    <address id="s1" no=""></address>
    <domain id="" no="34"></domain>
    </Employee>
  </Enroll> 

output_one.xml:

   <Department id="x1" name="">
      <members id ="" name="lio">ds</members>
   </Department>

output_two.xml:

   <Employee>
    <address id="s1" ></address>
    <domain  no="34"></domain>
   </Employee>

The output should be two separate XML files, each containing its part of the split document, with the empty attributes removed.

I have tried using apply-templates, include, and XML pipelining, but I couldn't get it working.

Any help would be very much appreciated.


There are 2 answers

5
JLRishe On BEST ANSWER

This should accomplish what you are describing:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  exclude-result-prefixes="xsi">

  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="Employee//@*[not(normalize-space())]" />

  <xsl:template match="/*" >

    <xsl:result-document href="output_one.xml">
      <xsl:apply-templates select="Department" />
    </xsl:result-document>

    <xsl:result-document href="output_two.xml">
      <xsl:apply-templates select="Employee" />
    </xsl:result-document>

  </xsl:template>

</xsl:stylesheet>

When run on your provided input, the result is:

output_one.xml:

<Department id="x1" name="">
    <members id="" name="lio">ds</members>
</Department>

output_two.xml:

<Employee>
  <address id="s1" />
  <domain no="34" />
</Employee>
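
If the input could contain more than one Employee element, writing them all into a single result document would produce a file with several root elements. A minimal sketch of one way around that, assuming each Employee should go to its own file, is shown here. The employee_N.xml naming scheme is a hypothetical choice for illustration and is not part of the original question:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <!-- Identity template: copy everything by default -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop empty or whitespace-only attributes on Employee and its descendants -->
  <xsl:template match="Employee//@*[not(normalize-space())]" />

  <xsl:template match="/*">
    <xsl:for-each select="Employee">
      <!-- Hypothetical naming scheme: employee_1.xml, employee_2.xml, ... -->
      <xsl:result-document href="employee_{position()}.xml">
        <xsl:apply-templates select="." />
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>

The attribute value template in href is evaluated once per Employee, so every result document gets a distinct URI, which xsl:result-document requires.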
0
grtjn On

I'll provide an XProc alternative since you tagged the question with XProc. The following preserves Employee elements:

<p:declare-step version="1.0" xmlns:p="http://www.w3.org/ns/xproc">
    <p:input port="source"/>
    <p:output port="result"/>

    <p:delete match="@*[normalize-space(.) = '']"/>
    <p:filter select="//Employee"/>
</p:declare-step>

You can execute it with XMLCalabash using a command line like:

calabash --input source=in.xml --output result=employee.xml test.xpl

It does assume that there is only one Employee element in your input. Otherwise, it will try to write multiple Employee elements into a single file. It would first complain that the result output port doesn't accept sequences.

If you add sequence="true" to that port without any further changes, you will end up with non-well-formed XML, as with the XSLT approach from JLRishe. In that case you would need to wrap the sequence of Employee elements with p:wrap-sequence, or use p:for-each together with something like p:store to write the individual employees to disk.
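
As a rough sketch of the p:for-each plus p:store option, assuming each Employee should end up in its own file, something like the following might work. The employee-N.xml file names are a hypothetical choice, and the pipeline declares no output port because p:store writes to disk itself:

<p:declare-step version="1.0" xmlns:p="http://www.w3.org/ns/xproc">
    <p:input port="source"/>

    <!-- Remove empty or whitespace-only attributes first -->
    <p:delete match="@*[normalize-space(.) = '']"/>

    <!-- Treat each Employee element as a separate document -->
    <p:for-each>
        <p:iteration-source select="//Employee"/>
        <p:store>
            <!-- Hypothetical naming scheme: employee-1.xml, employee-2.xml, ... -->
            <p:with-option name="href"
                select="concat('employee-', p:iteration-position(), '.xml')"/>
        </p:store>
    </p:for-each>
</p:declare-step>

Here p:iteration-position() gives the index of the current Employee within the loop, so each one is stored under a different name.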

Note: the last paragraph might be a bit terse if you are new to XProc. Let me know if I need to elaborate.

ADDED:

If you want to save both Department, and Employee elements with XProc, you can use the following:

<p:declare-step version="1.0" xmlns:p="http://www.w3.org/ns/xproc">

    <p:input port="source"/>
    <p:output port="employees">
        <p:pipe step="employees" port="result"/>
    </p:output>
    <p:output port="departments" primary="true"/>

    <p:delete match="@*[normalize-space(.) = '']" name="cleaned"/>

    <p:filter select="//Employee"/>
    <p:wrap-sequence wrapper="Employees" name="employees"/>

    <p:filter select="//Department">
        <p:input port="source">
            <p:pipe step="cleaned" port="result"/>
        </p:input>
    </p:filter>
    <p:wrap-sequence wrapper="Departments"/>

</p:declare-step>

You can execute it with XMLCalabash using a command line like:

calabash --input source=in.xml --output employees=employees.xml --output departments.xml test2.xpl

The code still follows the same flow, but the employees output port is non-primary and has to be bound explicitly to the result of the employees step. The Department filter uses an explicit input port binding so that it takes the result of the 'cleaned' step as input rather than the result of 'employees'. All other inputs and outputs are bound automatically based on conventions.

Note: I added p:wrap-sequence to make it more robust. You can remove both of them, provided you move the name attribute of employees from p:wrap-sequence to the p:filter in front of it.
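
For illustration, the variant without p:wrap-sequence might look roughly like this. It is a sketch that assumes exactly one Employee and one Department in the input, since neither output port is declared with sequence="true":

<p:declare-step version="1.0" xmlns:p="http://www.w3.org/ns/xproc">

    <p:input port="source"/>
    <p:output port="employees">
        <p:pipe step="employees" port="result"/>
    </p:output>
    <p:output port="departments" primary="true"/>

    <p:delete match="@*[normalize-space(.) = '']" name="cleaned"/>

    <!-- The name attribute now sits on the filter itself -->
    <p:filter select="//Employee" name="employees"/>

    <!-- Read from the cleaned document, not from the employees step -->
    <p:filter select="//Department">
        <p:input port="source">
            <p:pipe step="cleaned" port="result"/>
        </p:input>
    </p:filter>

</p:declare-step>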

HTH!