Cannot get all matched nodes while using htmlparser to parse a website


I'm using htmlparser to parse a website, but I've run into a really weird problem. I'm trying to get all <li> nodes on a webpage, and my code is as follows:

import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;

String url = "http://s.1688.com/selloffer/offer_search.htm?keywords=%BD%A8%B2%C4&n=y&categoryId=";
Parser parser = new Parser(url);
parser.setEncoding("gb2312");

NodeList list = parser.extractAllNodesThatMatch(new TagNameFilter("li"));
// NodeList list = parser.parse(new CssSelectorNodeFilter("li[class=\"sm-offerShopwindow\"]"));
System.out.print(list.size() + "\n");
for (int i = 0; i < list.size(); i++) {
    Node li = list.elementAt(i);
    System.out.print("text:" + li.getText() + "\n");
}

But the printed list size is always 20. It seems that it doesn't traverse all the <li> nodes on that page. Why? Thanks for any advice.
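One way to narrow this down is to count how many <li tags the raw HTML actually contains, independent of any parser. Here is a minimal diagnostic sketch (the class name and the crude counting approach are my own illustration, not from the original post; readAllBytes() needs Java 9+):

import java.io.InputStream;
import java.net.URL;
import java.nio.charset.Charset;

public class RawLiCount {
    public static void main(String[] args) throws Exception {
        String url = "http://s.1688.com/selloffer/offer_search.htm?keywords=%BD%A8%B2%C4&n=y&categoryId=";
        try (InputStream in = new URL(url).openStream()) {
            // The page declares gb2312, so decode the bytes with that charset.
            String html = new String(in.readAllBytes(), Charset.forName("GB2312"));
            int count = 0;
            int idx = html.indexOf("<li");
            while (idx != -1) {
                // Only count "<li>" / "<li ...", not tags like <link>.
                char next = idx + 3 < html.length() ? html.charAt(idx + 3) : ' ';
                if (next == '>' || Character.isWhitespace(next)) count++;
                idx = html.indexOf("<li", idx + 3);
            }
            System.out.println("raw <li tags: " + count);
        }
    }
}

If this also reports 20, the page itself only ships 20 list items in its static HTML and the parser is not at fault.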


1 Answer

Harald

Even the top browsers do not always agree on how to parse all the weird stuff out there pretending to be HTML, and the web has developed a great deal since 2006. So I would not be surprised if such an old piece of software cannot cope with modern HTML.
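If the old parser really is the problem, a more modern, fault-tolerant library is worth trying. A minimal sketch using jsoup (my suggestion, assuming it is on the classpath; not part of the original answer):

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupLiDemo {
    public static void main(String[] args) throws Exception {
        String url = "http://s.1688.com/selloffer/offer_search.htm?keywords=%BD%A8%B2%C4&n=y&categoryId=";
        // jsoup detects the charset from the HTTP headers or a <meta> tag,
        // so gb2312 normally does not have to be set by hand.
        Document doc = Jsoup.connect(url).get();
        System.out.println(doc.select("li").size());
        for (Element li : doc.select("li")) {
            System.out.println("text: " + li.text());
        }
        // The commented-out CSS filter from the question maps directly to:
        // doc.select("li.sm-offerShopwindow")
    }
}

If jsoup also finds only 20 items, the remaining entries are most likely loaded by JavaScript after the initial page load, which no pure HTML parser will see.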