I have a dataset that looks something like this:
V1 = name:"some name1"
V2 = name:"some name2"
V3 = name:"some name3"
V4 = name:"some name4"
E1 = weight:0.2, StartVertex:V1, EndVertex:V2
E2 = weight:0.3, StartVertex:V1, EndVertex:V3
E3 = weight:0.4, StartVertex:V1, EndVertex:V4
E4 = weight:0.5, StartVertex:V2, EndVertex:V1
E5 = weight:0.6, StartVertex:V2, EndVertex:V3
...
I have a Gremlin query that finds some paths between these vertices.
There are two things I would like to do with it.
1: I would like to be able to find the product of all the weights in a path (path_edge1.weight * path_edge2.weight * ...)
2: I would like to be able to sum the resulting products for each path based on the end vertex.
Pseudo code for what I want to achieve:
g.V().has('name',REGEX,".+some_query.+").inE.outV.inE.FindingAPathSomehow.path{path_score = 1 : foreach edge e: path_score = path_score * e.weight}{it.lastV().id}.sumWhereIdIsEqual(it[1])
Hopefully this is somewhat understandable.
I would like to be able to do everything in a pure gremlin/groovy script since I am using RexPro.
I have looked far and wide for an answer, but have not been able to find a way to do this yet.
Further explanation if the above is unclear:
When querying, I am looking for vertices whose name contains the substring "some_query". This will give me a set of start vertices.
From these vertices I follow a specific path pattern in my graph, which yields several paths that might look like this:
V = Vertex
E = Edge
Path1 = V3 - E2 - V1
Path2 = V4 - E5 - V7 - E1 - V1
Each of these edges has a weight property. With this I want to get what is called "Big Pi" or "Capital Pi", which is the product of a sequence. Think summation (Σ), but with multiplication instead of addition.
The result of Path1 would be the weight of E2, or 0.3 in the example above, while Path2 would have the weight E5.weight * E1.weight, which in the example above would be 0.6 * 0.2 = 0.12.
In this case we start at vertices V3 and V4, and both paths end at V1. I would therefore like to sum the weights of Path1 and Path2, because both end vertices are V1. This would give the total score of V1 as 0.3 + 0.12 = 0.42. If there had been a Path3 with end vertex V2 and score 0.34, then the resulting list would have two elements in it: {[V1, 0.42], [V2, 0.34]}.
You can do something like this:
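A minimal sketch of the path query and grouping step, assuming a Gremlin 2.x (TinkerPop 2) console or RexPro session and the TinkerPop toy graph rather than your own data. The one-hop traversal and the variable name `m` are illustrative choices; any traversal ending in `path` could be substituted.

```groovy
// Assumption: Gremlin 2.x / Blueprints 2.x is on the classpath and the Gremlin
// steps are loaded (as they are in the Gremlin console and in RexPro).
import com.tinkerpop.blueprints.Edge
import com.tinkerpop.blueprints.impls.tg.TinkerGraphFactory

g = TinkerGraphFactory.createTinkerGraph()   // the TinkerPop toy graph

// m ends up keyed on the end vertex of each path; each value is a list
// holding one list of edge weights per path that ends at that vertex.
m = [:]
g.V.outE.inV.path.groupBy(m){ it.last() }{
    // a path contains both vertices and edges; keep only the edge weights
    it.findAll{ el -> el instanceof Edge }.collect{ e -> e.getProperty('weight') }
}.iterate()
```

Each path in this sketch has a single edge, so every weight list has one element; a longer traversal would simply produce longer weight lists under the same keys.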
The idea is to use the toy graph to get some paths that produce multiple results with the same endpoint. Since you want to multiply the edge weights for each path and then sum them for each vertex ending a path, it would seem that a good return value for all this would be a Map keyed on the end vertex, with the value being the list of lists of weights for each path. To do that, I used a groupBy.
The first closure to the groupBy provides the key (i.e. the last vertex in the path). The second closure filters the Edge objects and pulls off the weight to store in the list of paths for each key. From here you can operate on the resulting Map to finish the calculation. At this point we're basically just doing straight Groovy: first calculate the product of the weights for each path. Once you have that much, calculating the sum per end vertex is just done with the Groovy sum function.
Note that I'm breaking this up into multiple Gremlin statements for ease of explanation and readability, but if you like the single-line style you could go that way too. The best way to get it back to a single line would be to add a third closure to the groupBy, which will act as a reduce step to calculate the weight product/sum.
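The two finishing steps are plain Groovy, so they can be sketched without a graph at all. The map literal below is a hypothetical stand-in for the groupBy result, filled with the weights from the question's example; treating Path3 as a single edge of weight 0.34 is an assumption, as are the variable names.

```groovy
// Hypothetical stand-in for the groupBy result:
// end vertex -> list of per-path weight lists.
def m = [
    V1: [[0.3], [0.6, 0.2]],   // Path1 and Path2 both end at V1
    V2: [[0.34]]               // assumed single-edge Path3 ending at V2
]

// Step 1: product of the edge weights of each path.
def products = m.collectEntries { endVertex, paths ->
    [(endVertex): paths.collect { weights -> weights.inject(1) { acc, w -> acc * w } }]
}

// Step 2: sum of the path products per end vertex.
def scores = products.collectEntries { endVertex, pathProducts ->
    [(endVertex): pathProducts.sum()]
}

println scores
```

With these inputs the score for V1 comes out as 0.42 and for V2 as 0.34, matching the hand calculation in the question.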