How to parse the HTML when using crawler4j

Question


2.4k views Asked by mly At 05 September 2013 at 14:18

Recently I had to crawl a website with the open-source project crawler4j. However, I could not find an API in crawler4j for querying the page content. How can I parse the HTML using the functions and classes that crawler4j provides, and find elements the way we do with jQuery?


There is 1 answer

**vigneshwerv** · Answer 1 · 2013-09-16T06:56:21+00:00

It's relatively simple. The following approach, which hands the page's HTML from crawler4j to jsoup, worked for me.

In MyCrawler.java:

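import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.parser.HtmlParseData;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
...
public void visit(Page page) {
    ...
    // Only HTML pages carry HtmlParseData; skip binary or plain-text content
    if (page.getParseData() instanceof HtmlParseData) {
        HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
        // Raw HTML of the fetched page, as downloaded by crawler4j
        String html = htmlParseData.getHtml();
        // Parse it with jsoup to get a Document that supports CSS selectors
        Document doc = Jsoup.parseBodyFragment(html);
        ...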
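
Once you have a jsoup Document, you can find elements with CSS selectors, much as you would with jQuery. The following is only a minimal sketch, not part of the original answer: the class name SelectorSketch, the helper selectText, and the sample HTML are made up for illustration, and it assumes jsoup is on the classpath. It uses Jsoup.parse, jsoup's entry point for complete documents, so the page title is also available. Inside visit(Page page) you would pass the string returned by htmlParseData.getHtml() instead of the hard-coded sample.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SelectorSketch {

    // Hypothetical helper: returns the text of every element matching a CSS selector.
    static List<String> selectText(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        // select() takes jQuery-style CSS selectors such as "div#main a" or "a.ext"
        for (Element element : doc.select(cssQuery)) {
            texts.add(element.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        // Sample HTML standing in for htmlParseData.getHtml()
        String html = "<html><head><title>Demo</title></head><body>"
                + "<div id=\"main\">"
                + "<a class=\"ext\" href=\"http://example.com/a\">First</a>"
                + "<a href=\"/b\">Second</a>"
                + "</div></body></html>";

        System.out.println(selectText(html, "div#main a")); // prints [First, Second]
        System.out.println(selectText(html, "a.ext"));      // prints [First]
        System.out.println(Jsoup.parse(html).title());      // prints Demo
    }
}
```

Attribute values are read in a similar way, for example element.attr("href") on each selected link.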
