How to download text contained in JavaScript files via crawler4j?

Question

Asked by aardwolf on 16 June 2015 at 00:23 · 389 views

I'm trying to use crawler4j to extract text from some websites. I have changed the `FILTERS` pattern so that URLs ending in `.js` are no longer excluded, in the following manner:

    private final static Pattern FILTERS = Pattern.compile(".*(\\.(css|gif|jpg"
            + "|png|mp3|zip|gz))$");

I do not know how to store this text in a file, or whether text in `.js` files has to be handled differently from regular page text.

There is 1 answer

**rzo1** · Answer 1 · 2015-07-20T17:09:21+00:00

`visit` is called after the crawler has successfully processed a page. The content is then contained in the `Page` object passed to it.

I suggest using the methods that object provides to write out your crawled JavaScript content, e.g. by reading its binary content.

    @Override
    public void visit(Page page) {
        // parse the binary content contained in the page object
    }

An example can be found here (it deals with images, but the approach is basically the same): https://github.com/yasserg/crawler4j/blob/master/src/test/java/edu/uci/ics/crawler4j/examples/imagecrawler/ImageCrawler.java
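
The writing step could look roughly like the sketch below. It rests on a few assumptions: as far as I recall, `page.getContentData()` returns the raw bytes of the fetched resource and `page.getWebURL().getURL()` returns its address, and the linked image example sets `config.setIncludeBinaryContentInCrawling(true)`, which is probably needed for `.js` files too if the crawler classifies them as binary. The `JsContentSaver` class, its methods and the `script.js` fallback name are hypothetical, made up for illustration and not part of crawler4j. The helper itself only uses the JDK:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of crawler4j): stores the raw bytes of a
// crawled resource in a directory, named after the last segment of its URL.
final class JsContentSaver {

    private JsContentSaver() {
    }

    // Derives a file name from a URL: drops the query string and fragment,
    // keeps the last path segment and replaces unusual characters with '_'.
    static String fileNameFor(String url) {
        String path = url;
        int cut = path.indexOf('?');
        if (cut >= 0) {
            path = path.substring(0, cut);
        }
        cut = path.indexOf('#');
        if (cut >= 0) {
            path = path.substring(0, cut);
        }
        String name = path.substring(path.lastIndexOf('/') + 1)
                .replaceAll("[^A-Za-z0-9._-]", "_");
        // Fall back to a fixed name (an arbitrary choice) when no usable segment is left.
        if (name.isEmpty() || name.equals(".") || name.equals("..")) {
            return "script.js";
        }
        return name;
    }

    // Writes the bytes unchanged, so no character set has to be guessed here.
    static Path save(byte[] contentData, String url, Path outputDir) throws IOException {
        Files.createDirectories(outputDir);
        Path target = outputDir.resolve(fileNameFor(url));
        return Files.write(target, contentData);
    }
}
```

Inside `visit` you would then call something like `JsContentSaver.save(page.getContentData(), page.getWebURL().getURL(), Paths.get("crawled-js"))`, where `crawled-js` is just an example directory, and handle the `IOException`. Note that in this sketch two scripts with the same file name from different URLs overwrite each other, so a real crawler would likely want a more unique naming scheme.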
