Nutch 2.x response content: "doesn't work properly without JavaScript enabled. Please enable it to continue"


After crawling the URL below with Nutch 2.x, I ran solrindex to index the parsed data into Solr, but I get the JSON data below instead of the page content. I want to get the actual content of the page at this URL. How should I set up my seed URL file and regex-urlfilter.txt?

Incorrect response data:

"response":{"numFound":1,"start":0,"numFoundExact":true,"docs":[
  {
    "tstamp":"2023-04-22T09:54:19.129Z",
    "digest":"def97ee1241655c3980bba6bdde9d3ea",
    "boost":1.0177004,
    "id":"http://www.iwencai.com/unifiedwap/result?tid=stockpick&qs=box_main_ths&w=A%E8%82%A1%E4%B8%BB%E6%9D%BF%3B%28%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5ma10-%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5ma5%29%3E%28%E5%BD%93%E5%89%8Dma10-%E5%BD%93%E5%89%8Dma5%29%20%3B%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5KDJ%E5%8D%B3%E5%B0%86%E9%87%91%E5%8F%89%E6%88%96%E9%87%91%E5%8F%89%3B%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5%E6%9C%80%E9%AB%98%E4%BB%B7%3C%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5ma5%3B%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5ma5%3C%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5ma20%20%3B%E5%BD%93%E5%89%8Dma10%3Ema5%3B%E7%8E%B0%E4%BB%B7%3E%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5%E6%94%B6%E7%9B%98%E4%BB%B7",
    "title":"同花顺问财",
    "url":"http://www.iwencai.com/unifiedwap/result?tid=stockpick&qs=box_main_ths&w=A%E8%82%A1%E4%B8%BB%E6%9D%BF%3B%28%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5ma10-%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5ma5%29%3E%28%E5%BD%93%E5%89%8Dma10-%E5%BD%93%E5%89%8Dma5%29%20%3B%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5KDJ%E5%8D%B3%E5%B0%86%E9%87%91%E5%8F%89%E6%88%96%E9%87%91%E5%8F%89%3B%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5%E6%9C%80%E9%AB%98%E4%BB%B7%3C%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5ma5%3B%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5ma5%3C%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5ma20%20%3B%E5%BD%93%E5%89%8Dma10%3Ema5%3B%E7%8E%B0%E4%BB%B7%3E%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5%E6%94%B6%E7%9B%98%E4%BB%B7",
    "content":"同花顺问财\nWe're sorry but 同花顺问财选股 doesn't work properly without JavaScript enabled. Please enable it to continue..\n",
    "version":1763869842097569792}]
}}


There is 1 answer

Sebastian Nagel On

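You can use the Selenium-based protocol plugins to make Nutch properly crawl sites that do not function without JavaScript enabled. The content you indexed is the placeholder the site serves to clients that do not run JavaScript, so changing the seed URL or regex-urlfilter.txt alone will not help. See the README of protocol-selenium.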
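To check which indexed documents contain only the placeholder (for example, before and after switching the protocol plugin), something like the following Python sketch may help. It is not part of Nutch or Solr: the function names `needs_javascript` and `placeholder_urls` are hypothetical, the marker phrase is copied from the `content` field in the question, and the sample document with its `example.com` URLs is made up for illustration. It assumes the Solr response has already been parsed from JSON into a dict.

```python
# Phrase taken from the "content" field in the question. Other sites word
# their placeholder differently, so this tuple is an assumption to adjust.
JS_PLACEHOLDER_MARKERS = (
    "without JavaScript enabled",
)


def needs_javascript(doc):
    """Return True if the indexed content looks like a JavaScript-required placeholder."""
    content = doc.get("content", "")
    return any(marker in content for marker in JS_PLACEHOLDER_MARKERS)


def placeholder_urls(solr_response):
    """Collect the URLs of documents whose content matches a placeholder marker."""
    docs = solr_response.get("response", {}).get("docs", [])
    return [doc.get("url", doc.get("id")) for doc in docs if needs_javascript(doc)]


# Made-up sample mimicking the shape of the response in the question.
sample = {
    "response": {
        "numFound": 2,
        "docs": [
            {
                "url": "http://example.com/app",
                "content": "We're sorry but this app doesn't work properly "
                           "without JavaScript enabled. Please enable it to continue.",
            },
            {
                "url": "http://example.com/static",
                "content": "A normal page with real text.",
            },
        ],
    }
}

if __name__ == "__main__":
    for url in placeholder_urls(sample):
        print("Needs JavaScript rendering:", url)
```

Any URL this reports was fetched without JavaScript rendering and would need to be re-crawled once a Selenium-based protocol plugin is in place.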