Parser parser = new Parser();
parser.setInputHTML("d:/index.html");
parser.setEncoding("UTF-8");
NodeList nl = parser.parse(null);
/*
SimpleNodeIterator sNI = nl.elements();
while (sNI.hasMoreNodes()) {
    System.out.println(sNI.nextNode().getText());
}
*/
NodeList trs = nl.extractAllNodesThatMatch(new TagNameFilter("tr"), true);
for (int i = 0; i < trs.size(); i++) {
    NodeList nodes = trs.elementAt(i).getChildren();
    NodeList tds = nodes.extractAllNodesThatMatch(new TagNameFilter("td"), true);
    System.out.println(tds.toString());
}
I am not getting any output; Eclipse only shows javaw.exe as terminated.
Pass the path to the resource into the constructor.
Parse and print all the divs on this page:
parser.setInputHTML(String inputHTML) doesn't do what you think it does. It treats inputHTML as the HTML input to the parser, so "d:/index.html" is parsed as literal text rather than opened as a file. Use the constructor to point the parser at an HTML resource (file or URL). Example:
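A minimal sketch of the constructor approach applied to the tr/td loop from the question, assuming the org.htmlparser (HTMLParser 2.x) jar is on the classpath. The class name TableCellDump and the helper cellsByRow are illustrative names, not part of the library, and the default path is simply the one from the question.

```java
import java.util.ArrayList;
import java.util.List;

import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

// Illustrative class name; not part of HTMLParser.
public class TableCellDump {

    // Returns the plain text of every <td>, grouped by the <tr> that contains it.
    static List<List<String>> cellsByRow(Parser parser) throws ParserException {
        NodeList all = parser.parse(null); // null filter: all top-level nodes
        NodeList rows = all.extractAllNodesThatMatch(new TagNameFilter("tr"), true);
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            Node row = rows.elementAt(i);
            List<String> cells = new ArrayList<>();
            NodeList children = row.getChildren(); // can be null for an empty row
            if (children != null) {
                NodeList tds = children.extractAllNodesThatMatch(new TagNameFilter("td"), true);
                for (int j = 0; j < tds.size(); j++) {
                    cells.add(tds.elementAt(j).toPlainTextString().trim());
                }
            }
            result.add(cells);
        }
        return result;
    }

    public static void main(String[] args) throws ParserException {
        // The resource location (file path or URL) goes to the constructor,
        // not to setInputHTML. The default here is the path from the question.
        String resource = args.length > 0 ? args[0] : "d:/index.html";
        Parser parser = new Parser(resource);
        parser.setEncoding("UTF-8");
        System.out.println(cellsByRow(parser));
    }
}
```

If you really do have the markup in a String rather than in a file, Parser.createParser(html, "UTF-8") gives you a parser over that string, and the same helper works unchanged.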