Loading data into Titan with bulbs and then accessing it


I am a complete novice with graph databases and the whole Titan ecosystem, so please excuse me if I sound stupid. I am also suffering from the lack of documentation -_-

I've installed the Titan server. I am using Cassandra as the backend.

I am trying to load basic Twitter data into Titan using Python, and I use the Bulbs library for this. Let's say I have the people I follow on Twitter in a list called friends.

My Python script goes like this:

from bulbs.titan import Graph
# some other imports here

# getting the *friends* list for a specified user here


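# connect to the graph using Bulbs' default configuration
g = Graph()

# create a vertex for the specified user
center = g.vertices.create(name='sergiikhomenko')

# create a vertex for each friend and a 'follows' edge from the user to it
for friend in friends:
    cur_friend = g.vertices.create(name=friend)
    g.edges.create(center, 'follows', cur_friend)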

From what I understand, the code above should create a graph in Titan with one vertex for me and one for each friend, with each friend vertex connected to mine by a follows edge.
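As a reference point for later queries, here is a pure-Python model of the graph the loop is meant to produce: one vertex per distinct name and one follows edge from the center to each friend. The function name is hypothetical and nothing here talks to Titan. Note that the loop above calls create for every name, so a repeated name, or rerunning the script, would as far as I can tell add extra vertices rather than reuse existing ones; the model below deduplicates so you can compare counts.

```python
def expected_follow_graph(center, friends):
    """Hypothetical helper: the vertices and edges the Bulbs loop is meant to build."""
    # dict.fromkeys keeps first-seen order while dropping duplicate names
    vertices = list(dict.fromkeys([center, *friends]))
    # one (out, label, in) triple per distinct friend; skip a self-follow
    edges = list(dict.fromkeys(
        (center, "follows", friend) for friend in friends if friend != center
    ))
    return vertices, edges


vertices, edges = expected_follow_graph("sergiikhomenko", ["alice", "bob", "alice"])
print(len(vertices), "vertices,", len(edges), "edges")
```

If a later vertex count in Titan is larger than len(vertices), duplicates are the first thing to check.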

My questions are:

How do I save it in Titan (like a commit in SQL)?

How do I access it later? Should I be able to access it through the Gremlin shell? If so, how?

My next question would be about visualizing the data, but I am very far from there :)

Please help :) I am completely lost in all this Titan, Gremlin, Rexster, etc. :)

Update: One of the requirements of our POC project is... Python :), which is why I jumped straight into Bulbs. I'll definitely follow the advice below, though :)

Best answer, by stephen mallette:

My answer will be somewhat incomplete because I can't really supply answers about Bulbs, but you do ask some specific questions that I can try to answer:

How do I save it in Titan (like a commit in SQL)?

It's just g.commit() in Java/Groovy.
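To show where the commit goes, here is a hypothetical Python helper that writes the same load as a Groovy script you could paste into the Gremlin shell. It assumes g is already an open TitanGraph in that shell (Titan 0.5-era TinkerPop2/Blueprints API), uses only the basic Blueprints calls addVertex(null), setProperty and addEdge(null, out, in, label), and ends with g.commit(). It only builds the script text and has not been run against a live Titan instance.

```python
def groovy_quote(value):
    """Return value as a single-quoted Groovy string literal."""
    # escape backslashes first, then single quotes
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def build_load_script(center_name, friend_names):
    """Hypothetical helper: Groovy lines that load the follow graph, then commit."""
    lines = [
        "center = g.addVertex(null)",
        "center.setProperty('name', %s)" % groovy_quote(center_name),
    ]
    for i, name in enumerate(friend_names):
        var = "f%d" % i  # one Groovy variable per friend vertex
        lines.append("%s = g.addVertex(null)" % var)
        lines.append("%s.setProperty('name', %s)" % (var, groovy_quote(name)))
        lines.append("g.addEdge(null, center, %s, 'follows')" % var)
    # nothing is persisted to Cassandra until the transaction is committed
    lines.append("g.commit()")
    return "\n".join(lines)


print(build_load_script("sergiikhomenko", ["alice", "bob"]))
```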

How do I access it later? Should I be able to access it through the Gremlin shell? If so, how?

Once it's committed to Cassandra, you can access it with Bulbs, the Gremlin shell, some other application, whatever. I'm not sure what you're asking, really, but I like the Gremlin Console for such things, so if you have Cassandra started locally, start up bin/gremlin.sh and do:

g = TitanFactory.build().
    set("storage.backend", "cassandra").
    set("storage.hostname", "127.0.0.1").
    open()

That will get you a connection to Cassandra, and you should be able to query your data.

I am completely lost in all this Titan, Gremlin, Rexster, etc.

My advice to all new users (especially those new to graphs, Cassandra, the JVM, etc.) is to slow down. The fastest way to get discouraged is to try to go from Python to Bulbs to Rexster to Gremlin over Titan to a Cassandra cluster hosted in EC2 with Hadoop, and try to load a billion-edge graph into that.

If you are new, start with the latest stuff: TinkerPop3 (http://tinkerpop.incubator.apache.org/), which Bulbs does not yet support. That's OK, because learning TinkerPop is important to learning the whole stack and all of TinkerPop's implementations (e.g. Titan). Use TinkerGraph (not Titan) with a small subset of your data, and make sure you get the pattern for loading that small subset right before you try to go full scale. Use the Gremlin Console for everything related to this initial goal. That is a recipe for an easy win: with this approach you could likely have a graph going, with some queries over your own data, within a day, and learn a good portion of what you need to do the same with Titan.
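One way to keep the "small subset first" discipline in Python is to cap the input and split it into fixed-size batches, committing once per batch rather than once per element or once at the very end. Committing in batches is a common bulk-loading pattern; the helper names and sizes below are hypothetical, and the loading step itself is left to whatever client you use.

```python
from itertools import islice


def small_subset(items, limit):
    """Hypothetical helper: take at most `limit` items to try the load on first."""
    return list(islice(items, limit))


def batches(items, size):
    """Yield lists of at most `size` items; commit once after loading each batch."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


for chunk in batches(small_subset(range(10), 5), 2):
    print(chunk)  # load these vertices/edges, then commit
```

Once the counts for the subset look right, raise the limit and the batch size rather than changing the loading code.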

Once you have your graph, get it working in Gremlin Server (the TP3 replacement for Rexster). Then think about how you might access it via Python tooling. Or maybe figure out how to convert TinkerGraph to Titan (perhaps starting with BerkeleyDB rather than Cassandra). My point is to increase your involvement with the different pieces of the ecosystem gradually, because otherwise it is overwhelming.
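As a first step toward Python tooling, here is a standard-library sketch that builds a request for Gremlin Server's HTTP endpoint: a POST with a JSON body holding the Gremlin script under "gremlin" and optional parameters under "bindings". It assumes the server is configured with its HTTP channelizer and listens on localhost:8182, and that a traversal source named g is bound; adjust these for your setup. The helper name is hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request


def gremlin_http_request(script, bindings=None, url="http://localhost:8182"):
    """Hypothetical helper: a POST request carrying a Gremlin script as JSON."""
    body = {"gremlin": script}
    if bindings:
        # parameters referenced by name inside the script
        body["bindings"] = bindings
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = gremlin_http_request("g.V().has('name', n).count()", {"n": "sergiikhomenko"})
# with a running server: json.load(urllib.request.urlopen(req))
```

Passing values through bindings instead of pasting them into the script string avoids quoting problems with user-supplied names.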