I have a large number of html files like the following 01.html file:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>My Title</title>
</head>
<body>
<item itemprop="itemprop1" content="content1" />
<item itemprop="itemprop2" content="content2" />
<item itemprop="itemprop3" content="content3" />
<item itemprop="itemprop4" content="content4" />
<item itemprop="itemprop5" content="content5" />
<item itemprop="itemprop6" content="content6" />
<item itemprop="itemprop7" content="content7" />
<item itemprop="itemprop8" content="content8" />
<item itemprop="itemprop9" content="content9" />
</body>
</html>
There is only one item node with itemprop="itemprop1" in each html file. Same for itemprop2, itemprop3, etc.
I would like to have the following txt file output:
content1|content5
that is, the concatenation of:
1. the value of the content attribute of the item with itemprop="itemprop1"
2. a pipe "|"
3. the value of the content attribute of the item with itemprop="itemprop5"
I run the following bash script:
xsltproc 01.xslt 01.html >> 02.txt
where 01.xslt is the following:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="body">
<xsl:value-of select="//item[@itemprop='itemprop1']/@content"/>|<xsl:value-of select="item[@itemprop='itemprop5']/@content"/>
</xsl:template>
</xsl:stylesheet>
Unfortunately it doesn't work. What is the correct xslt file?
UPDATE
This is the final working example.
01.html is the following:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>My Title</title>
</head>
<body>
<item itemprop="itemprop1" content="content1" />
<item itemprop="itemprop2" content="content2" />
<item itemprop="itemprop3" content="content3" />
<item itemprop="itemprop4" content="content4" />
<item itemprop="itemprop5" content="content5" />
<item itemprop="itemprop6" content="content6" />
<item itemprop="itemprop7" content="content7" />
<item itemprop="itemprop8" content="content8" />
<item itemprop="itemprop9" content="content9" />
</body>
</html>
01.xslt is the following:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes" method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="html">
<xsl:value-of select="//item[@itemprop='itemprop1']/@content"/>
<xsl:text>|</xsl:text>
<xsl:value-of select="//item[@itemprop='itemprop5']/@content"/>
</xsl:template>
</xsl:stylesheet>
and the output 02.txt is the following:
content1|content5
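Since there are many HTML files, the same stylesheet can be applied in a loop. The following is only a minimal sketch: the batch_demo directory, the two sample files and their content values are made up for illustration, and it assumes xsltproc is installed and that every input is well-formed XML (meta element closed). With method="text" the stylesheet emits no trailing newline, so the loop appends one after each file to keep one result per line.

```shell
#!/bin/sh
# Hypothetical scratch directory; any directory name would do.
mkdir -p batch_demo

# Two made-up sample inputs, shortened to the two items that matter.
for n in 01 02; do
cat > "batch_demo/$n.html" <<EOF
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>My Title</title>
</head>
<body>
<item itemprop="itemprop1" content="first$n" />
<item itemprop="itemprop5" content="fifth$n" />
</body>
</html>
EOF
done

# Same stylesheet as in the update above.
cat > batch_demo/01.xslt <<'EOF'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="html">
<xsl:value-of select="//item[@itemprop='itemprop1']/@content"/>
<xsl:text>|</xsl:text>
<xsl:value-of select="//item[@itemprop='itemprop5']/@content"/>
</xsl:template>
</xsl:stylesheet>
EOF

# Start with an empty output file, then append one line per input file.
: > batch_demo/out.txt
for f in batch_demo/*.html; do
  xsltproc batch_demo/01.xslt "$f" >> batch_demo/out.txt
  printf '\n' >> batch_demo/out.txt   # the text output has no newline of its own
done

cat batch_demo/out.txt
```

Without the printf line, the results of all files would end up concatenated on a single line.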
Actually, XSLT processes XML files, not HTML.
Your source HTML almost meets the requirements of well-formed XML. There is only one error: your meta element is not closed, so I changed it to:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
(adding / before the closing >). Otherwise the XSLT processor displays an error message (at least in my installation).
As far as your XSLT is concerned, I made a few corrections:
1. match="body" changed to match="html",
2. added // in the second xsl:value-of,
3. | changed to <xsl:text>|</xsl:text>, only for readability (longer lines cannot be seen on smaller monitors),
4. used <xsl:output method="text"/>, as your output does not seem to be XML.
The last 2 changes are optional; you can ignore them.
So the whole script can be like below:
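A minimal end-to-end sketch, assuming xsltproc is installed. The demo directory name and the shortened two-item input are illustrative only; the stylesheet follows the corrections listed above, and the meta element is closed so that the input is well-formed XML.

```shell
#!/bin/sh
# Hypothetical scratch directory so existing files are not overwritten.
mkdir -p demo

# Input: the meta element is closed with "/>" to make the file well-formed XML.
cat > demo/01.html <<'EOF'
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>My Title</title>
</head>
<body>
<item itemprop="itemprop1" content="content1" />
<item itemprop="itemprop5" content="content5" />
</body>
</html>
EOF

# Stylesheet: matches the root html element, so head and title never
# reach the built-in templates and their text is not copied to the output.
cat > demo/01.xslt <<'EOF'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="html">
<xsl:value-of select="//item[@itemprop='itemprop1']/@content"/>
<xsl:text>|</xsl:text>
<xsl:value-of select="//item[@itemprop='itemprop5']/@content"/>
</xsl:template>
</xsl:stylesheet>
EOF

# Run the transformation and show the result.
xsltproc demo/01.xslt demo/01.html > demo/02.txt
cat demo/02.txt
```

Whitespace-only text between the instructions inside the template is dropped when the stylesheet is loaded, so the three instructions can sit on separate lines without adding line breaks to the output.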