Issue while ingesting a Titan graph into Faunus


I have installed both Titan and Faunus (titan-0.4.4 and faunus-0.4.4), and each seems to be working properly.

However, after ingesting a sizable graph into Titan and trying to import it into Faunus via

FaunusFactory.open(    )

I run into problems. More precisely, I do get a Faunus graph back from the call FaunusFactory.open( ),

faunusgraph[titanhbaseinputformat->titanhbaseoutputformat]

but then even a simple query such as

g.v(10)

fails with this error:

Task Id : attempt_201407181049_0009_m_000000_0, Status : FAILED
com.thinkaurelius.titan.core.TitanException: Exception in Titan
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.getAdminInterface(HBaseStoreManager.java:380)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.ensureColumnFamilyExists(HBaseStoreManager.java:275)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.openDatabase(HBaseStoreManager.java:228)

My properties file is taken straight from the Faunus documentation page for Titan-HBase input, except, of course, for the address of the Hadoop cluster:

faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
faunus.graph.input.titan.storage.backend=hbase
faunus.graph.input.titan.storage.hostname= my IP
faunus.graph.input.titan.storage.port=2181
faunus.graph.input.titan.storage.tablename=titan
faunus.graph.output.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseOutputFormat
faunus.graph.output.titan.storage.backend=hbase
faunus.graph.output.titan.storage.hostname= IP of my host
faunus.graph.output.titan.storage.port=2181
faunus.graph.output.titan.storage.tablename=titan
faunus.graph.output.titan.storage.batch-loading=true
faunus.output.location=output1
zookeeper.znode.parent=/hbase-unsecure
titan.graph.output.ids.block-size=100000
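For anyone debugging a similar setup, one quick sanity check is to load the file with java.util.Properties and list which keys sit under the Titan input prefix. The sketch below assumes (this is not stated above) that Faunus hands the keys under faunus.graph.input.titan. to Titan with that prefix stripped. The class name, the withPrefix helper and the hostname value are hypothetical; the sketch only shows how the file parses and does not contact HBase or ZooKeeper.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class TitanInputKeys {

    // Hypothetical helper: every key starting with `prefix`, with the prefix
    // removed, mapped to its trimmed value (sorted for readable output).
    static Map<String, String> withPrefix(Properties props, String prefix) {
        Map<String, String> out = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                out.put(key.substring(prefix.length()), props.getProperty(key).trim());
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // A subset of the question's file; "hbase-host.example" is a placeholder.
        String text = String.join("\n",
            "faunus.graph.input.titan.storage.backend=hbase",
            "faunus.graph.input.titan.storage.hostname=hbase-host.example",
            "faunus.graph.input.titan.storage.port=2181",
            "faunus.graph.input.titan.storage.tablename=titan",
            "zookeeper.znode.parent=/hbase-unsecure");
        Properties props = new Properties();
        props.load(new StringReader(text));

        // Print the keys as they would look with the input prefix stripped.
        Map<String, String> input = withPrefix(props, "faunus.graph.input.titan.");
        input.forEach((k, v) -> System.out.println(k + " = " + v));

        // zookeeper.znode.parent is not under the prefix, so it is not in the map.
        System.out.println("znode.parent under input prefix? "
            + input.containsKey("zookeeper.znode.parent"));
    }
}
```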

Can anyone help?

ADDENDUM:

To address the comment below, here is some context: as I mentioned, I have a graph in Titan and can run basic Gremlin queries on it.

However, I need to run a global Gremlin query which, given the size of the graph, requires Faunus and its underlying MapReduce capabilities. Hence the need to import it. The error I get does not look to me like it points to any inconsistency in the graph itself.


stephen mallette answered:

I'm not sure you have the "flow" of Faunus right. If your end goal is to run a global query over the graph, consider this approach:

  1. pull your graph to sequence file
  2. issue your global query over the sequence file

More specifically, create hbase-seq.properties:

# input graph parameters
faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
faunus.graph.input.titan.storage.backend=hbase
faunus.graph.input.titan.storage.hostname=localhost
faunus.graph.input.titan.storage.port=2181
faunus.graph.input.titan.storage.tablename=titan
# hbase.mapreduce.scan.cachedrows=1000

# output data (graph or statistic) parameters
faunus.graph.output.format=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
faunus.output.location=snapshot
faunus.output.location.overwrite=true

Then, in Faunus, do:

g = FaunusFactory.open('hbase-seq.properties')
g._()

That will read the graph from HBase and write it to a sequence file in HDFS. Next, create seq-noop.properties with these contents:

# input graph parameters
faunus.graph.input.format=org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
faunus.input.location=snapshot/job-0

# output data parameters
faunus.graph.output.format=com.thinkaurelius.faunus.formats.noop.NoOpOutputFormat
faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
faunus.output.location=analysis
faunus.output.location.overwrite=true
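If you script this two-step flow, the only coupling between the two files is that the second job's faunus.input.location points at the first job's output location plus /job-0. A minimal sketch that builds both property sets with plain java.util.Properties follows; the class and helper names are hypothetical, and the /job-0 suffix is copied from the files above rather than derived from anything in Faunus. FaunusFactory.open is still given a file path, so the sketch just prints the second file's contents.

```java
import java.util.Properties;

public class FaunusJobChain {

    // Mirrors hbase-seq.properties: read Titan/HBase, write a SequenceFile snapshot.
    static Properties hbaseToSeq(String hbaseHost, String snapshotDir) {
        Properties p = new Properties();
        p.setProperty("faunus.graph.input.format",
            "com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat");
        p.setProperty("faunus.graph.input.titan.storage.backend", "hbase");
        p.setProperty("faunus.graph.input.titan.storage.hostname", hbaseHost);
        p.setProperty("faunus.graph.input.titan.storage.port", "2181");
        p.setProperty("faunus.graph.input.titan.storage.tablename", "titan");
        p.setProperty("faunus.graph.output.format",
            "org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat");
        p.setProperty("faunus.sideeffect.output.format",
            "org.apache.hadoop.mapreduce.lib.output.TextOutputFormat");
        p.setProperty("faunus.output.location", snapshotDir);
        p.setProperty("faunus.output.location.overwrite", "true");
        return p;
    }

    // Mirrors seq-noop.properties: read the snapshot, write no graph (NoOpOutputFormat).
    static Properties seqToNoop(Properties previous, String analysisDir) {
        Properties p = new Properties();
        p.setProperty("faunus.graph.input.format",
            "org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat");
        // As in the answer: the snapshot lives under <previous output location>/job-0.
        p.setProperty("faunus.input.location",
            previous.getProperty("faunus.output.location") + "/job-0");
        p.setProperty("faunus.graph.output.format",
            "com.thinkaurelius.faunus.formats.noop.NoOpOutputFormat");
        p.setProperty("faunus.sideeffect.output.format",
            "org.apache.hadoop.mapreduce.lib.output.TextOutputFormat");
        p.setProperty("faunus.output.location", analysisDir);
        p.setProperty("faunus.output.location.overwrite", "true");
        return p;
    }

    public static void main(String[] args) throws Exception {
        Properties first = hbaseToSeq("localhost", "snapshot");
        Properties second = seqToNoop(first, "analysis");
        // Print in .properties syntax (store also emits a timestamp comment line).
        second.store(System.out, "seq-noop.properties");
    }
}
```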

The above configuration reads the sequence file produced in the previous step without rewriting the graph (that's what NoOpOutputFormat is for). Now in Faunus do:

g = FaunusFactory.open('seq-noop.properties')
g.V.sideEffect('{it.degree=it.bothE.count()}').degree.groupCount()

This computes a degree distribution and writes the results to the 'analysis' directory in HDFS. Obviously you can run whatever Faunus-flavored Gremlin you want here; I just wanted to provide an example. I think this is a pretty standard "flow" or pattern for using Faunus from a graph-analysis perspective.
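To make clear what that Gremlin line computes, here is an in-memory sketch of the same two steps on a small edge list: first each vertex's degree (the sideEffect that sets it.degree from bothE.count()), then how many vertices have each degree (groupCount). This is only an illustration of the calculation, not how Faunus runs it as MapReduce jobs; the class, method names and sample graph are hypothetical. One difference: vertices with no edges do not appear here, whereas g.V in Faunus would include them with degree 0.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DegreeDistribution {

    // Step 1, like it.degree = it.bothE.count(): count incident edges per vertex.
    // Each edge is {outVertexId, inVertexId}; both endpoints get +1
    // (so in this sketch a self-loop adds 2 to its vertex).
    static Map<Long, Long> degrees(List<long[]> edges) {
        Map<Long, Long> degree = new HashMap<>();
        for (long[] e : edges) {
            degree.merge(e[0], 1L, Long::sum);
            degree.merge(e[1], 1L, Long::sum);
        }
        return degree;
    }

    // Step 2, like .degree.groupCount(): number of vertices per degree value.
    static Map<Long, Long> distribution(Map<Long, Long> degree) {
        Map<Long, Long> counts = new TreeMap<>();
        for (long d : degree.values()) {
            counts.merge(d, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical graph with edges 1-2, 1-3, 2-3, 3-4.
        List<long[]> edges = List.of(
            new long[] {1, 2}, new long[] {1, 3},
            new long[] {2, 3}, new long[] {3, 4});
        System.out.println(distribution(degrees(edges))); // prints {1=1, 2=2, 3=1}
    }
}
```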