I've been working on CSV to XML conversions using this style-sheet: XSLT 2.0 to convert CSV to XML format
I needed to account for both comma and pipe delimiters so I changed the style sheet to this:
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:param name="csv-uri" as="xs:string" select="'file:///c:/test.csv'"/>
<xsl:template match="/" name="csv2xml">
<Entity>
<Rows>
<xsl:choose>
<xsl:when test="unparsed-text-available($csv-uri)">
<xsl:variable name="csv" select="unparsed-text($csv-uri)" />
<xsl:variable name="pipe" select="'\|'"/>
<xsl:analyze-string select="replace($csv,$pipe,',')" regex='\r\n?|\n' >
<xsl:non-matching-substring>
<xsl:if test="not(position()=0)" >
<Row>
<xsl:for-each select="tokenize(.,',')" >
<xsl:element name="Column_{position()}">
<xsl:value-of select="."/>
</xsl:element>
</xsl:for-each>
</Row>
</xsl:if>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:when>
<xsl:otherwise>
<xsl:variable name="error">
<xsl:text>Error reading file: "</xsl:text>
<xsl:value-of select="$csv-uri"/>
</xsl:variable>
<xsl:message>
<xsl:value-of select="$error"/>
</xsl:message>
<xsl:value-of select="$error"/>
</xsl:otherwise>
</xsl:choose>
</Rows>
</Entity>
</xsl:template>
</xsl:stylesheet>
However, I've been having some difficulty accounting for quoted values from the input.
Currently if a row in the csv is:
1,2,3,4,5,"testing,1,1,1"
red,white,blue,green
dogs|cats|rabbits
the "testing,1,1,1" gets split into 4 columns into the CSV instead of one column.
output:
<?xml version="1.0" encoding="UTF-8"?>
<Entity>
<Rows>
<Row>
<Column_1>1</Column_1>
<Column_2>2</Column_2>
<Column_3>3</Column_3>
<Column_4>4</Column_4>
<Column_5>5</Column_5>
<Column_6>"testing</Column_6>
<Column_7>1</Column_7>
<Column_8>1</Column_8>
<Column_9>1"</Column_9>
</Row>
<Row>
<Column_1>red</Column_1>
<Column_2>white</Column_2>
<Column_3>blue</Column_3>
<Column_4>green</Column_4>
</Row>
<Row>
<Column_1>dogs</Column_1>
<Column_2>cats</Column_2>
<Column_3>rabbits</Column_3>
</Row>
</Rows>
</Entity>
I've done some research and using regex='("[^"]*")+' can accomplish this. But I'm not entirely sure how to implement this without removing something I have that I need (maybe probably in the Analyze-string block?). I need help please! Its probably something simple, so please school me or point me in the right direction. Any advice would be helpful.
You can just add another
xsl:analyze-string
to process thexsl:non-matching-substring
from the firstxsl:analyze-string
...CSV Input
Modified XSLT 2.0
XML Output
The regex from the second
xsl:analyze-string
is using captured substrings to ignore the quotes and the comma. Here's an easier to read version:If you want to keep the quotes, move them inside the parens: