Gremlin Big Pi operation (product of a sequence of terms) followed by id-based summation

229 views Asked by At

I have a dataset that looks something like this:

V1 = name:"some name1"
V2 = name:"some name2"
V3 = name:"some name3"
V4 = name:"some name4"

E1 = weight:0.2, StartVertex:V1, EndVertex:V2
E2 = weight:0.3, StartVertex:V1, EndVertex:V3
E3 = weight:0.4, StartVertex:V1, EndVertex:V4
E4 = weight:0.5, StartVertex:V2, EndVertex:V1
E5 = weight:0.6, StartVertex:V2, EndVertex:V3
...

I have a gremlin query that finds some paths between these vertices.

There is two things I would like to do there.

1: I would like to be able to find the product of all the weights in a path (path_edge1.weight * path_edge2.weight * ...)

2: I would like to be able to sum the resulting products for each path based on the end vertex.

Pseudo code for what I want to achieve:

g.V().has('name',REGEX,\".+some_query.+\").inE.outV.inE.FindingAPathSomehow.path{path_score = 1 : foreach edge e: path_score = path_score * e.weight}{it.lastV().id}.sumWhereIdIsEqual(it[1])

Hopefully this is somewhat understandable.

I would like to be able to do everything in a pure gremlin/groovy script since I am using RexPro.

I have looked far and wide for an answer, but have not been able to find a way to do this yet.


Further explanation if the above is unclear:

When querying I am looking for vertices with a substring equal to the "some_query". This will give me a set of start vertices.

With these vertices I am looking for a specific path in my graph that will give me several paths that might look like this:

V = Vertex
E = Edge

Path1 = V3 - E2 - V1
Path2 = V4 - E5 - V7 - E1 - V1

Each of these edges has a weight property. With this I want to get what is called "Big Pi" or "Capital Pi" which is the product of a sequence. Think summation (Σ) but with multiplication in stead of addition.

The result of Path1 would be the weight of E2, or 0.3 in the example above. While Path2 would have the weight of E5.weight * E1.weight which in the example above would be 0.6 * 0.2 = 0.12.

In this case we start at vertices V3 and V4, and both end at V1. In this case I would like to sum the weights of Path1 and Path2 because both end vertices are V1. This would give the total score of V1 as 0.3 + 0.12 = 0.42. If there had been a Path3 with end Vertex V2 and score 0.34, then the resulting list would have to elements in it; {[V1, 0.42], [V2,0.34] }.

1

There are 1 answers

3
stephen mallette On BEST ANSWER

You can do something like this:

gremlin> g = TinkerGraphFactory.createTinkerGraph()
==>tinkergraph[vertices:6 edges:6]
gremlin> g.v(1).as('s').outE.inV.loop('s'){it.loops<3}{true}.path
==>[v[1], e[7][1-knows->2], v[2]]
==>[v[1], e[8][1-knows->4], v[4]]
==>[v[1], e[9][1-created->3], v[3]]
==>[v[1], e[8][1-knows->4], v[4], e[10][4-created->5], v[5]]
==>[v[1], e[8][1-knows->4], v[4], e[11][4-created->3], v[3]]

The above uses the toy graph to get some paths that produces multiple results with the same endpoint. Since you to multiply the edge weights for each path and then sum them for each vertex ending a path, it would seem that a good return value for all this would be Map keyed on the end vertex with a value being the list of lists of weights for each path. To do that, I used a groupBy:

gremlin> m=g.v(1).as('s').outE.inV.loop('s'){it.loops<3}{true}.path.groupBy{it[it.size()-1]}{it.findAll{it instanceof Edge}.collect{it.getProperty("weight")}}.cap.next()
==>v[3]=[[0.4], [1.0, 0.4]]
==>v[2]=[[0.5]]
==>v[5]=[[1.0, 1.0]]
==>v[4]=[[1.0]]

The first closure to the groupBy provides the key (i.e. the last vertex in the path). The second closure filters the Edge objects and pulls off the weight to store in the list of paths for each key. From here you can operate with the m or Map to finish the calculation. At this point we're basically just doing straight Groovy. The following shows the calculation of the product of the weights:

gremlin> m.collect{k,v->[(k):v.collect{p->p.inject{product,weight->product*weight}}]}           
==>{v[3]=[0.4, 0.4000000059604645]}
==>{v[2]=[0.5]}
==>{v[5]=[1.0]}
==>{v[4]=[1.0]}

Once you have that much, calculating the sum per end vertex is just done with the groovy sum function:

gremlin> m.collect{k,v->[(k):v.collect{p->p.inject{product,weight->product*weight}}.sum()]}
==>{v[3]=0.800000011920929}
==>{v[2]=0.5}
==>{v[5]=1.0}
==>{v[4]=1.0}

Note that I'm breaking this up into multiple Gremlin statements for ease of explanation and readability, but if you like the single line style you could go that way too. The best way to get it back to single line would be to add a third closure to the groupBy which will act as a reduce step to calculate the weight product/sum.