My setup:
I'm using OrientDB with a large graph of People vertices. I'm using the gremlin java driver to access this database since I would like to eventually switch to a different graph database down the line.
My database:
Each person has certain preference vertices (connected via a labeled edge describing that relation to that preference). All preferences are then connected to core concept vertex.
Problem I'm trying to solve:
I'm trying to find a way (kudos if its as simple as a Gremlin query) to start at a Person vertex and traverse down to all people with identical preferences via a core concept.
Here is a made up example of a matching case. Person B will be returned in a list of perfect matches of people when starting at Person A. I forgot to draw the directions to those edges on this picture :/ take a look at the non matching case to see the directions.
Here is an example of a non matching case. Person B will not be returned in a list of perfect matches of People. Why? Because all outgoing edges on Person B do not resolve to identically matching edges on Person A; in this case, Person A refuses to eat apples, but Person B doesn't list a similar preference to anything they refuse to eat.
Another non matching case from the above example: If Person A refuses to eat apples and Person B refuses to eat bananas -- they will not match.
If Person B likes Fries the most and likes Cheeseburgers the least, that would be a non-matching case as well.
My initial idea (that I'm not sure how to implement) with a query
- I would start at person A
- Find all outgoing edges to preference vertices and store some sort of "marker" or map to that preference vertex with the edge label.
- Traverse out of those vertices down all SimilarTo labeled edges. Copy those markers from preference vertex into the concept vertex.
- Reverse the line: concept vertex -> preference vertex (copy makers from concept to preference vertex)
- ... then somehow match ALL edges to those markers...
- exclude person a from the results
Any ideas?
Let's start with the creation of your sample graph:
The query, you're looking for, is the following (I will explain each step later):
Now, when we add the "refuses to eat Apples" relation:
...Person B is no longer a match:
Let's go through the query step by step / line by line:
This should be pretty clear: start at Person A.
Traverse the out edges and set a marker, so that we can reference the edges later. Then traverse to what I called
category
vertices.For every
category
vertex add a1
to an internal collection. You could also store the vertex itself, but this would be a waste of memory, since we won't need any information from the vertices.Traverse the other direction along the
similar
edges to thefood
and then along those edges that have the same label as the marked edge from the beginning. In the end ignore the person where the traversal started (Person A).Count the number of traversers that made it to each person vertex.
Count the number of
Category
vertices (the1
s).Unfold the person counters, so the keys will be the person vertices and the values will be the number of traversers that made it to this vertex.
Ultimately the number of crossed
category
vertices must match the number of traversers on aperson
vertex. If that's the case, we have a match.Note, that it's necessary to have a
similar
edge incident to theApples
vertex.