How to list the nodes associated to a given vertex through intermediate nodes in TinkerPop3?

401 views Asked by At

I have a graph in TinkerPop3 in which the nodes are connected through edges with a given weight.

Given a node A in the graph I'm trying to traverse it to obtain all nodes which can be reached from A with less than N steps, and to sort them by a score. This score is calculated by mapping each edge in the path to that edge's weight divided by the step number and the total step, and finally summing the list.

Here is an example:

gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> a = g.addV('name','a').next()
==>v[0]
gremlin> b = g.addV('name','b').next()
==>v[2]
gremlin> c = g.addV('name','c').next()
==>v[4]
gremlin> d = g.addV('name','d').next()
==>v[6]
gremlin> e = g.addV('name','e').next()
==>v[8]
gremlin> a.addEdge('conn', b, 'weight', 1)
==>e[10][0-conn->2]
gremlin> a.addEdge('conn', c, 'weight', 3)
==>e[11][0-conn->4]
gremlin> b.addEdge('conn', d, 'weight', 3)
==>e[12][2-conn->6]
gremlin> d.addEdge('conn', e, 'weight', 9)
==>e[14][6-conn->8]

The expected scores for nodes related to "a" would be:

score_n = sum_over_n(weight_n-1n / (depth_n * total_N_from_a_to_n))
score_b = w_ab / (1 * 1) = 1 / 1 = 1
score_c = w_ac / (1 * 1) = 3 / 1 = 3
score_d = (w_ab / (1 * 2)) + (w_bd / (2 * 2)) = 1/2 + 3/4 = 5/4
score_e = (w_ab / (1 * 3)) + (w_bd / (2 * 3)) + (w_de / (3 * 3)) = 1/3 + 3/6 + 9/9 = 11/6

And thus, the result should be the corresponding nodes ordered decreasingly:

c e d b

I'm new to TinkerPop and Gremlin and I've been trying a combination of the repeat and sack steps without luck. The plan until now has been to accumulate the step in a sack for each traversal, to emit the score and node, and finally sort by score, but I'm worried about potential performance issues associated to the process.

1

There are 1 answers

6
Daniel Kuppitz On BEST ANSWER

Not the easiest traversal to start with TinkerPop. First thing I've changed is the data type for weights.

graph = TinkerGraph.open()
g = graph.traversal()
a = g.addV('name','a').next()
b = g.addV('name','b').next()
c = g.addV('name','c').next()
d = g.addV('name','d').next()
e = g.addV('name','e').next()
a.addEdge('conn', b, 'weight', 1f)
a.addEdge('conn', c, 'weight', 3f)
b.addEdge('conn', d, 'weight', 3f)
d.addEdge('conn', e, 'weight', 9f)

This was necessary to get rid of rounding errors. It could also be done as part of the traversal, but I didn't want to add even more steps as it already became complex enough:

gremlin> g.withSack(0).V().has("name","a").
           repeat(outE("conn").as("e").values("weight").as("w").
                  sack(assign).by(loops()).sack(sum).by(constant(1)).sack().as("l").
                  select("e").inV().as("v").select("v","w","l").as("x").select("v")).emit().
           select(all, "x").as("x").
           count(local).as("c").select("x").
           map(unfold().sack(assign).by(select("w")).
                        sack(div).by(select("l")).
                        sack(div).by(select("c")).sack().fold()).
           project("score","v").by(sum(local)).by(select("x").tail(local, 1).select("v")).
           order().by(select("score"), decr).select("v").values("name")
==>c
==>e
==>d
==>b

UPDATE

What follows is a working traversal for TP 3.0.1:

gremlin> Gremlin.version()
==>3.0.1-incubating
gremlin> g.withSack(0D).V().has("name","a").
           repeat(outE("conn").as("e").values("weight").as("w").
                  select(all,"w").count(local).as("l").
                  select(last,"e").inV().as("v").select(last,"v","w","l").as("x").select("v")).emit().
           select(all,"x").as("x").
           count(local).as("c").select(last,"x").
           map(unfold().as("m").select("w").sack(sum).
               select(last,"m").select("l").sack(div).
               select(last,"m").select("c").sack(div).sack().fold()).as("score").
           select(last, "score","x").by(sum(local)).by(tail(local, 1).select("v")).
           order().by(select("score"), decr).select("x").values("name")
==>c
==>e
==>d
==>b

The only good thing about this traversal is, that the weight properties don't have to have another data type - int works fine.