Nutch + HBase: hbase versions issue and java exception


I'm trying to set up Nutch 2.2.1 with HBase 0.94.14 on Debian Squeeze. I carefully followed the Nutch 1 and 2 tutorials and various other documentation. I built HBase 0.94.14 and eventually got it to work (I can create tables, etc.). I also built Nutch without any issue (it's set to use Gora 0.3).

Now the issues: when I try to launch Nutch, I get the following trace:

./nutch inject /root/nutch/apache-nutch-2.2.1/urls/
InjectorJob: starting at 2014-11-27 09:43:53
InjectorJob: Injecting urlDir: /root/nutch/apache-nutch-2.2.1/urls
InjectorJob: java.lang.ClassNotFoundException: org.apache.gora.memory.store.HBaseStore
    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)

etc.

Using strace -f, I've figured out that "HBaseStore.class" was not found:

stat("/root/nutch/apache-nutch-2.2.1/runtime/local/org/apache/gora/memory/store/HBaseStore.class",\
  <unfinished ...>
[pid  1827] <... futex resumed> )       = -1 EAGAIN (Resource temporarily unavailable)

I tried to figure out whether there was a classpath issue, but eventually found that:
- HBaseStore.class was present neither in the Nutch directory tree nor in the HBase 0.94.4 directory tree
- the HBase jar in the Nutch tree was, surprisingly, hbase-0.90.4.jar
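
Java classes normally ship inside jar files rather than as loose .class files, so searching the directory tree for HBaseStore.class can miss them. Below is a minimal sketch of a jar scan in Python. The function name find_class_in_jars is made up for this illustration and is not part of Nutch, Gora or HBase.

```python
import zipfile
from pathlib import Path


def find_class_in_jars(root, class_name):
    """Return the jar files under `root` that contain `class_name`.

    `class_name` is a fully qualified Java class name, for example
    "org.apache.gora.hbase.store.HBaseStore".
    """
    # Inside a jar, a class is stored as a path with '/' separators.
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for jar in sorted(Path(root).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    matches.append(jar)
        except zipfile.BadZipFile:
            # Skip files named *.jar that are not valid archives.
            continue
    return matches
```

For example, find_class_in_jars("/root/nutch/apache-nutch-2.2.1/runtime/local", "org.apache.gora.hbase.store.HBaseStore") would list every jar under that directory that provides the class. An empty list means no jar there contains it.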

Following some online discussions I found, I replaced hbase-0.90.4.jar in the Nutch tree with hbase-0.94.4 from the HBase tree...

But:
- it doesn't fix the Java issue
- each time I rebuild Nutch, hbase-0.90.4.jar is back, and I can't find any source for it in the Nutch tree :-/

Note that /root/nutch/apache-nutch-2.2.1/conf/hbase-site.xml has:

<property>
<name>hbase.rootdir</name>
<value>/root/nutch/hbase-master/conf/</value>
</property>

which corresponds to HBase 0.94.4 ...

I also tried to rebuild with Gora 0.5, but that makes the Nutch build fail.

I'm not a Java expert at all, and I don't understand why Nutch is not using the correct version of HBase, or why sources and Java classes seem to be missing. At this point I'm totally stuck. What a mess.

Thanks for any tip that could help to save this situation.


There are 3 answers

Gwen Wing On

Alfonso,

I checked gora.properties; it was OK.

I also tried the latest 2.3 snapshot, but unfortunately it ran into a dependency issue at build time:

[ivy:resolve]       ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve]       ::          UNRESOLVED DEPENDENCIES         ::
[ivy:resolve]       ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve]       ::   org.restlet.jse#org.restlet.lib.org.restlet.lib.org.json;2.0:     java.text.ParseException: inconsistent module descriptor file found in 'http://maven.restlet.org/org/restlet/jse/org.restlet.lib.org.restlet.lib.org.json/2.0/org.restlet.lib.org.restlet.lib.org.json-2.0.pom': bad module name: expected='org.restlet.lib.org.restlet.lib.org.json' found='org.restlet.lib.org.json'; 
[ivy:resolve]       ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] :::: ERRORS
[ivy:resolve]       restlet: bad module name found in http://maven.restlet.org/org/restlet/jse/  org.restlet.lib.org.restlet.lib.org.json/2.0/org.restlet.lib.org.restlet.lib.org.json-2.0.pom: expected='org.restlet.lib.org.restlet.lib.org.json found='org.restlet.lib.org.json'
[ivy:resolve] 
[ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS

BUILD FAILED
/root/nutch/2.3/build.xml:467: impossible to resolve dependencies:
        resolve failed - see output for details
Alfonso Nishikawa On

Are you sure you have this line in gora.properties:

gora.datastore.default=org.apache.gora.hbase.store.HBaseStore

with special attention to the namespace:

org.apache.gora.hbase.store.HBaseStore

and not

org.apache.gora.memory.store.HBaseStore

I hope this will fix the issue :)
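
A quick way to verify that setting without reading the file by eye is sketched below in Python. It assumes gora.properties uses plain key=value lines without continuations or escapes. The helper names read_properties and check_default_store are made up for this illustration.

```python
EXPECTED_STORE = "org.apache.gora.hbase.store.HBaseStore"


def read_properties(text):
    """Parse simple `key=value` lines from a Java .properties file.

    Only the plain form is handled: no line continuations or escapes.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        # Blank lines and comment lines ('#' or '!') carry no setting.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_default_store(text):
    """Return (configured_value, is_expected) for gora.datastore.default."""
    value = read_properties(text).get("gora.datastore.default")
    return value, value == EXPECTED_STORE
```

Passing the contents of conf/gora.properties to check_default_store returns the configured class name together with a flag that is True only when it matches the gora.hbase namespace shown above.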


Edit about versions:

As for hbase-0.90.4.jar coming back: Gora 0.3 depends on HBase 0.90.4, which is incompatible with HBase 0.94.14.

To run with HBase 0.94.14 you have to use Nutch 2.3-SNAPSHOT (the "2.x" branch). There is a link in the Nutch2Tutorial, or you can svn checkout http://svn.apache.org/repos/asf/nutch/branches/2.x/

Nutch 2.3-SNAPSHOT depends on Gora 0.5, which depends on HBase 0.94.14.


This now seems to be mostly solved:

http://mail-archives.apache.org/mod_mbox/nutch-dev/201412.mbox/%[email protected]%3E

https://issues.apache.org/jira/browse/NUTCH-1899

user1337 On

Try updating the version number:

  • open ivy/ivy.xml;
  • change rev="2.2.1" to rev="2.2.3" on the org="org.restlet.jse" dependencies (it occurs 3 times).
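
If you would rather script that change than edit it by hand, a minimal sketch in Python follows. It assumes the entries in ivy/ivy.xml are ordinary <dependency .../> tags carrying org and rev attributes. The function name bump_restlet_rev is made up for this illustration.

```python
import re

# Matches one <dependency ...> tag. Quoted attribute values are consumed
# whole, so a '>' inside a value such as conf="*->default" does not end
# the match early.
DEPENDENCY_TAG = re.compile(r'<dependency\b(?:[^>"]|"[^"]*")*>')


def bump_restlet_rev(ivy_xml, old="2.2.1", new="2.2.3"):
    """Return (updated_text, count) after changing rev on org.restlet.jse deps."""
    count = 0

    def replace(match):
        nonlocal count
        tag = match.group(0)
        # Only touch tags for the restlet organisation at the old revision.
        if 'org="org.restlet.jse"' in tag and f'rev="{old}"' in tag:
            count += 1
            return tag.replace(f'rev="{old}"', f'rev="{new}"')
        return tag

    return DEPENDENCY_TAG.sub(replace, ivy_xml), count
```

Read ivy/ivy.xml into a string, call bump_restlet_rev on it, and check that the returned count is 3 before writing the result back. Any other count suggests the file differs from what this answer describes.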