I am new to the coding world, and I am stuck on an issue for quite some time now.
I want to convert an excel file to an XML for certain analysis. The XML file however has a complex schema, and I am unable to replicate the Sample XML.
I have tried using Python with some of it's libraries, but I am unable to do so.
This is the data in XLSX file:
and the expected XML output is as follows:
<REP019-GIValueMeasuresReport
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="urn:fsa-gov-uk:MER:REP019:1"
xsi:schemaLocation="urn:XXXXXXX:REP019:1
http://XXXXXXXXX/REP019/v1/REP019-Schema.xsd"
currency = "GBP"
units= "single">
<Declaration>
<Return>no</Return>
</Declaration>
<ReturnDetails>
<Consent>yes</Consent>
<Informed>yes</Informed>
</ReturnDetails>
<ProductReport>
<Product>After the event legal expenses (All)</Product>
<FiveLargest>Intermediary Z</FiveLargest>
<FiveLargest>Intermediary X</FiveLargest>
<FiveLargest>Intermediary Y</FiveLargest>
<FiveLargest>Brand A</FiveLargest>
<FiveLargest>Brand B</FiveLargest>
<Sales>5000</Sales>
<CostToState>500000</CostToState>
<NumberofClaims>375</NumberofClaims>
<Average>6000.00</Average>
<Frequency>6.25</Frequency>
<NoofParts>320</NoofParts>
<Nulled>25</Nulled>
<Accepted>93.33</Accepted>
<Paid>32000</Paid>
<Avg_C>100</Avg_C>
<Highest>1200</Highest>
<MEH>25</MEH>
<Complaints>5</Complaints>
<Percentage>1.33</Percentage>
</ProductReport>
<ProductReport>
<Product>Alloy Wheel insurance (Add-on)</Product>
<FiveLargest>Intermediary Z</FiveLargest>
<FiveLargest>Intermediary X</FiveLargest>
<FiveLargest>Brand A</FiveLargest>
<Sales>17000</Sales>
<CostToState>765000</CostToState>
<NumberofClaims>340</NumberofClaims>
<Average>16400.00</Average>
<Frequency>2.07</Frequency>
<NoofParts>280</NoofParts>
<Nulled>78</Nulled>
<Accepted>77.06</Accepted>
<Paid>39200</Paid>
<Avg_C>140</Avg_C>
<Highest>2000</Highest>
<MEH>50</MEH>
<Complaints>20</Complaints>
<Percentage>5.88</Percentage>
</ProductReport>
<ProductReport>
<Product>Alloy Wheel insurance (Stand-alone)</Product>
<FiveLargest>Brand A</FiveLargest>
<Sales>100000</Sales>
<CostToState>7000000</CostToState>
<NumberofClaims>3500</NumberofClaims>
<Average>87000.00</Average>
<Frequency>4.02</Frequency>
<NoofParts>3500</NoofParts>
<Nulled>100</Nulled>
<Accepted>97.14</Accepted>
<Paid>2275000</Paid>
<Avg_C>650</Avg_C>
<Highest>10500</Highest>
<MEH>325</MEH>
<Complaints>170</Complaints>
<Percentage>4.86</Percentage>
</ProductReport>
<ProductReport>
<Product>Before the event legal expenses - home (All)</Product>
<FiveLargest>Intermediary Z</FiveLargest>
<FiveLargest>Intermediary X</FiveLargest>
<FiveLargest>Intermediary Y</FiveLargest>
<FiveLargest>Brand A</FiveLargest>
<FiveLargest>Brand B</FiveLargest>
<Sales>5000</Sales>
<CostToState>500000</CostToState>
<NumberofClaims>375</NumberofClaims>
<Average>6000.00</Average>
<Frequency>6.25</Frequency>
<NoofParts>320</NoofParts>
<Nulled>25</Nulled>
<Accepted>93.33</Accepted>
<Paid>32000</Paid>
<Avg_C>100</Avg_C>
<Highest>1200</Highest>
<MEH>25</MEH>
<Complaints>5</Complaints>
<Percentage>1.33</Percentage>
</ProductReport>
<ProductReport>
<Product>Before the event legal expenses - motor (All)</Product>
<FiveLargest>Intermediary Z</FiveLargest>
<FiveLargest>Intermediary X</FiveLargest>
<FiveLargest>Brand A</FiveLargest>
<Sales>17000</Sales>
<CostToState>765000</CostToState>
<NumberofClaims>340</NumberofClaims>
<Average>16400.00</Average>
<Frequency>2.07</Frequency>
<NoofParts>280</NoofParts>
<Nulled>78</Nulled>
<Accepted>77.06</Accepted>
<Paid>39200</Paid>
<Avg_C>140</Avg_C>
<Highest>2000</Highest>
<MEH>50</MEH>
<Complaints>20</Complaints>
<Percentage>5.88</Percentage>
</ProductReport>
<ProductReport>
<Product>Before the event legal expenses - other (All)</Product>
<FiveLargest>Brand A</FiveLargest>
<Sales>100000</Sales>
<CostToState>7000000</CostToState>
<NumberofClaims>3500</NumberofClaims>
<Average>87000.00</Average>
<Frequency>4.02</Frequency>
<NoofParts>3500</NoofParts>
<Nulled>100</Nulled>
<Accepted>97.14</Accepted>
<Paid>2275000</Paid>
<Avg_C>650</Avg_C>
<Highest>10500</Highest>
<MEH>325</MEH>
<Complaints>170</Complaints>
<Percentage>4.86</Percentage>
</ProductReport>
</REP019-GIValueMeasuresReport>
Should be easy enough, read a tutorial like the GeekforGeeks on doing such a conversion,
https://www.geeksforgeeks.org/how-to-convert-excel-to-xml-format-in-python/
Since currency, units and the <Declaration> and <ReturnDetails> tags are common to all in the xml file and are all the same in the spreadsheet, I presume these can be taken from any row so no need to iterate for those just select a cell to use, like
Currency and units can be added to the 'xml_schema =' text along with the rest of your schema text formatted as you like. <Declaration> and <ReturnDetails> can be achieved using the 'with tag' syntax without the iterations, e.g. this for <ReturnDetails>.
Do same for <Declaration> pointing to the correct cell value. Then for each of the rest of the rows follow the example in the tutorial to iterate (from columns 6 to 21) using outer tag <ProductReport> and inner tags per the column headers.
The part I'm not sure about is you show in the example xml the first <ProductReport> with 5 <FiveLargest> tags, the second has 3 etc. Do not know how this is determined as each row appears to have only one entry for this column, are there are multiple lines in these cells? hidden in the screen shot
If that is the case you'll need read the cell contents, split on the new line and iterate the list for the same tag name.
- -----------------Add Code example---------------
Below is an example of how to code this. If the <FiveLargest> are multilined this code will manage that.