Setting up and running Apache Nutch 2.2.1

Question

694 views Asked by hao At 09 December 2014 at 08:27

I am trying to set up and run Apache Nutch 2.2.1 on my Ubuntu desktop. As a newbie, I find some parts of the tutorial on the official website a bit confusing.

If I were to run it on my own desktop, is it correct to go to the
```
$NUTCH_HOME/runtime/local 
```

to run the bin/nutch command?

Where should I put the directory named urls (which contains the seed list seed.txt)? Is it under
```
$NUTCH_HOME/runtime/local
```

Assuming I am in the right directory, I get the following error when executing the command

bin/nutch crawl urls -dir crawl -depth 1

    InjectorJob: Using class org.apache.gora.memory.store.MemStore as the Gora storage class.
    InjectorJob: total number of urls rejected by filters: 0
    InjectorJob: total number of urls injected after normalization and filtering: 0
    Exception in thread "main" java.lang.RuntimeException: job failed: name=generate: null, jobid=job_local1613558008_0002
        at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
        at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:199)
        at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
        at org.apache.nutch.crawl.Crawler.run(Crawler.java:152)
        at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)

I am following the tutorial at http://wiki.apache.org/nutch/NutchTutorial up to section 3.3 and have yet to configure Gora, HBase, etc. The problem seems to arise because the injector did not get any URLs (the log reports 0 URLs injected). Does anyone know how to solve this? Thanks a lot!

There are 2 answers

user7184731 On 20 November 2016 at 08:21

In case you want to integrate with Gora and HBase, add this to nutch-site.xml:

    <property>
        <name>storage.data.store.class</name>
        <value>org.apache.gora.hbase.store.HBaseStore</value>
        <description>Default class for storing data</description>
    </property>
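
To confirm the property is actually in place, a quick check along these lines may help. This is only a sketch: the helper name `uses_hbase_store` is made up for illustration, and the `conf/nutch-site.xml` location under `$NUTCH_HOME` is an assumption about your layout.

```shell
# Hypothetical helper (not part of Nutch): succeeds if the given
# nutch-site.xml sets the Gora storage class to HBaseStore.
uses_hbase_store() {
    # $1: path to a nutch-site.xml file
    [ -f "$1" ] && grep -q '<value>org.apache.gora.hbase.store.HBaseStore</value>' "$1"
}

# Assumed location of the config file; adjust to your checkout.
CONF_FILE="${NUTCH_HOME:-.}/conf/nutch-site.xml"

if uses_hbase_store "$CONF_FILE"; then
    echo "HBaseStore is configured in $CONF_FILE"
else
    echo "HBaseStore is not configured in $CONF_FILE"
fi
```

The log in the question reports `org.apache.gora.memory.store.MemStore`, so a "not configured" result here would be consistent with that output.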

**Do Do** · Accepted Answer · 2014-12-11T16:35:58+00:00

You should go to $NUTCH_HOME/runtime/deploy to run the command.
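
Since the log in the question shows 0 URLs injected, it may also be worth checking that the seed list exists relative to the directory the command is run from. The following is a minimal sketch, not a confirmed fix: the default `NUTCH_HOME` path and the seed URL are placeholders, and `RUNTIME_DIR` defaults to the question's `runtime/local` (point it at `runtime/deploy` to follow the advice above). It assumes the `urls` argument is resolved on the local filesystem against the current directory.

```shell
# Placeholder default; set NUTCH_HOME to your own Nutch 2.2.1 directory.
NUTCH_HOME="${NUTCH_HOME:-$PWD/apache-nutch-2.2.1}"
# Assumption: run from runtime/local as in the question; override as needed.
RUNTIME_DIR="${RUNTIME_DIR:-$NUTCH_HOME/runtime/local}"

# Create the seed directory and a one-line seed list (placeholder URL).
mkdir -p "$RUNTIME_DIR/urls"
printf '%s\n' 'http://nutch.apache.org/' > "$RUNTIME_DIR/urls/seed.txt"

# Count non-empty lines so an empty seed list is noticed before crawling.
SEED_COUNT=$(grep -c . "$RUNTIME_DIR/urls/seed.txt")
echo "seed URLs: $SEED_COUNT"

# Run the command from the question only if the launcher script is present.
if [ -x "$RUNTIME_DIR/bin/nutch" ]; then
    cd "$RUNTIME_DIR" && bin/nutch crawl urls -dir crawl -depth 1
fi
```

If the count is correct and the injector still reports 0 URLs, the URL filter and normalizer configuration would be the next thing to look at.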
