I'm trying to use crawler4j to extract text from some websites. I have changed the filters so that files with a js extension are no longer excluded, in the following manner:
private final static Pattern FILTERS = Pattern.compile(".*(\\.(css|gif|jpg"
        + "|png|mp3|zip|gz))$");
However, I do not know how to store this text to a file, or whether text in js files needs a different method than regular page text.
"visit" is called after the page has been successfully fetched and processed by the crawler. The content is then contained in the Page object that is passed to "visit".
I suggest you then use the methods that object provides to write out the crawled JavaScript content, e.g. getContentData(), which returns the raw bytes of the response, and write those bytes to a file.
An example (it deals with images, but the approach is basically the same) can be found here: https://github.com/yasserg/crawler4j/blob/master/src/test/java/edu/uci/ics/crawler4j/examples/imagecrawler/ImageCrawler.java
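As a rough sketch of the file-writing part, something like the helper below might do. It uses only the JDK, so it can be tried without crawler4j on the classpath. The class name JsContentSaver, the naming scheme in fileNameFor, and the "crawled-js" directory are my own assumptions for illustration, not part of crawler4j; the commented-out visit method shows how I would expect it to be wired into your crawler.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of crawler4j): stores raw page bytes on disk.
final class JsContentSaver {

    // Derives a file name from the last path segment of the URL.
    // This naming scheme is an assumption; two URLs ending in the same
    // segment would overwrite each other.
    static String fileNameFor(String url) {
        String path = url.split("[?#]", 2)[0];               // drop query string and fragment
        String name = path.substring(path.lastIndexOf('/') + 1);
        name = name.replaceAll("[^A-Za-z0-9._-]", "_");      // keep the name filesystem-safe
        return name.isEmpty() ? "unnamed.js" : name;
    }

    // Writes the bytes unchanged, so the encoding of the script is preserved.
    static Path save(Path dir, String url, byte[] content) throws IOException {
        Files.createDirectories(dir);
        return Files.write(dir.resolve(fileNameFor(url)), content);
    }
}

// Inside your WebCrawler subclass, the call could look roughly like this:
//
//   @Override
//   public void visit(Page page) {
//       String url = page.getWebURL().getURL();
//       if (url.toLowerCase().endsWith(".js")) {
//           try {
//               JsContentSaver.save(Paths.get("crawled-js"), url, page.getContentData());
//           } catch (IOException e) {
//               e.printStackTrace();
//           }
//       }
//   }
```

Writing the bytes as-is means no separate code path should be needed for js files compared with other content; you only need to decode them into a String if you want to process the text before saving it.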