I would like to grab the information from each child link, but the program throws this error:

Caused by: org.xml.sax.SAXParseException; lineNumber: 6; columnNumber: 724; Element type "t.length" must be followed by either attribute specifications, ">" or "/>".

Below is my full config file.
<?xml version="1.0" encoding="UTF-8"?>
<config>
    <var-def name="webpage">
        <html-to-xml>
            <http url="http://www.thestar.com.my/business/" />
        </html-to-xml>
    </var-def>
    <loop item="TheStarBiz" index="i">
        <list>
            <xpath expression="//div[@class='nine columns mobile3']">
                <var name="webpage"></var>
            </xpath>
        </list>
        <body>
            <var-def name="title">
                <xpath expression="(//p[@class='m'])/a/text()">
                    <var name="TheStarBiz"></var>
                </xpath>
            </var-def>
            <var-def name="link">
                <xpath expression="//p[@class='m']/a/@href">
                    <var name="TheStarBiz"></var>
                </xpath>
            </var-def>
            <var-def name="new_url">
                <xquery>
                    <xq-param name="TheStarBiz"><var name="TheStarBiz"/></xq-param>
                    <xq-expression><![CDATA[
                        declare variable $TheStarBiz as node() external;
                        let $url := data($TheStarBiz//p[@class='m']/a/@href)
                        return
                            $url
                    ]]></xq-expression>
                </xquery>
            </var-def>
            <var-def name="new_page_content">
                <http url="${new_url}"/>
            </var-def>
            <var-def name="fulldesc">
                <xpath expression="//div[@class='story']">
                    <var name="new_page_content"/>
                </xpath>
            </var-def>
            <var-def name="textfile">
                <file action="append" type="text" path="C:\Users\jacey\Desktop\WebHarvest\test.txt">
                    <template>
${title} ${sys.cr}${sys.lf}
${link} ${sys.cr}${sys.lf}
${new_page_content} ${sys.cr}${sys.lf}
                    </template>
                </file>
            </var-def>
        </body>
    </loop>
</config>
For those who come after:
I had almost the same error, and it was caused by a snippet of JavaScript in the file being parsed.
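The original snippet isn't shown here, but the error message itself points at the pattern: the parser reports an element named `t.length`, which means it met the text `<t.length` and treated the `<` as the start of a tag. A hypothetical minified loop like the one below (the variable names are made up for illustration) is enough to trigger that exact message, because the character right after the name is `;`, which is neither an attribute, `>` nor `/>`.

```xml
<!-- Hypothetical page fragment; NOT well-formed XML.
     An XML parser reads "<t.length" as the start of an element named
     "t.length", then fails on the ";" that follows. -->
<script type="text/javascript">
var t=[1,2,3];for(var i=0;i<t.length;i++){console.log(t[i]);}
</script>
```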
In hindsight, I suppose it's kind of obvious. In our case, the endpoint was no longer returning XML but HTML. If the file you want to parse really does contain JavaScript, you can wrap the script in a CDATA section so that an XML parser treats it as plain character data.
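A minimal sketch of the CDATA workaround, assuming you control the page being served (the script body is a made-up example). Inside a CDATA section, `<` is ordinary character data, so the loop no longer looks like markup. The CDATA markers are prefixed with `//` so that a browser treating the page as plain HTML reads those two lines as JavaScript comments.

```xml
<script type="text/javascript">
//<![CDATA[
// Hypothetical script body: inside CDATA, "i<t.length" is just text.
var t = [1, 2, 3];
for (var i = 0; i<t.length; i++) { console.log(t[i]); }
//]]>
</script>
```

This only helps when you can edit the page. For a site you are scraping, such as the one in the question, Web-Harvest's `<html-to-xml>` processor is the usual way to turn HTML into well-formed XML before running `<xpath>` on it. The config in the question applies it to `webpage`, but `new_page_content` comes straight from `<http url="${new_url}"/>` and is then passed to `<xpath>`, so that is the likely place where the child page's script gets parsed as XML.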