How to define the loop xpath in Pentaho

38 views Asked by At
<?xml version="1.0" encoding="ISO-8859-1"?>
<ONIXmessage
    xmlns="http://www.editeur.org/onix/3.0/short/onix-international.dtd" release="3.0">
    <header>
        <sender>
            <x298>Ingram Book Company</x298>
            <x299>Andy Stone</x299>
            <j272>[email protected]</j272>
        </sender>
        <x307>20230924</x307>
        <m184>eng</m184>
        <x310>01</x310>
        <m186>USD</m186>
    </header>
    <product>
        <a001>015864289</a001>
        <a002>03</a002>
        <productidentifier>
            <b221>02</b221>
            <b244>000200853X</b244>
        </productidentifier>
        <descriptivedetail>
            <x314>00</x314>
            <b012>BC</b012>
            <b333>B102</b333>
            <b333>B305</b333>
            <measure>
                <x315>01</x315>
                <c094>8.50</c094>
                <c095>in</c095>
            </measure>
            <measure>
                <x315>02</x315>
                <c094>5.50</c094>
                <c095>in</c095>
            </measure>
            <measure>
                <x315>03</x315>
                <c094>0.45</c094>
                <c095>in</c095>
            </measure>
            <measure>
                <x315>08</x315>
                <c094>0.4200</c094>
                <c095>lb</c095>
            </measure>
            <x316>CA</x316>
        </descriptivedetail>
    </product>
</ONIXmessage>

I need to load this xml file into Pentaho and convert this into csv format. Measure is having multiple loop and its considering only first loop. I need column name like x315, x315_1, x315_2,x315_3,x315_4...

1

There are 1 answers

0
Ana GH On

OK, following this blog entry: https://dwbi.org/pages/169, you'll probably need to read the file several times in chunks, and afterwards merge the streams to generate only one file.

So you read the attributes that constitute an unique file (content of header, and most of product attributes in your example: a001, a002, b221...) and at the same time, in a separate stream, you read the attributes in measure, and some attributes above that constituting an unique identifier, in my example, I simply read the x314 attribute again to use it to merge afterwards, but with your real file you'll probably need to add more information or use different attributes as unique key for the join.

So my transformation looks like this: Transformation steps To get this result you can put in a CSV file: |x298|x299|...|x315|c094|c095| |---|---|---|---|---|---| |Ingram|Andy Stone|...|01|8.50|in| |Ingram|Andy Stone|...|02|5.50|in| |Ingram|Andy Stone|...|03|0.45|in| |Ingram|Andy Stone|...|04|0.4200|lb|

So how do you configure the Get data from XML step in the Measure_content? In the Content tab when you configure the step, in the Loop XPath configuration, you ask to Get XPath Nodes and select the "measure" path: Ask for XPath Nodes Select measure path

Once you have the path selected, you go to the next tab, Fields, and apart of getting the attributes in the measure node, you add manually the x314 field and put the path manually, going up one level in the selected node (../*[name()='x314']): Adding the x314 field