Which data modeling is better for this hypergraph performance-wise using Gremlin and DSE Graph?


I have this scenario where each (source) Entity has Properties that have a target pointing to another Entity. Those property mappings are grouped together. What I want to do is query those Entities that have specific properties with corresponding targets but are under the same group.

The hypergraph would look like this (rectangles are the hyperedges):

[image: hypergraph version]

The JSON would look like this:

{
    "id": 1, "label": "Entity",
    "propertyGroups": [
    {
        "propertyGroupUuid": "GroupUuid1",
        "property": {"id": 1, "label": "Property", "name": "aName1"},
        "target": {"id": 2, "label": "Entity"}
    },
    {
        "propertyGroupUuid": "GroupUuid2",
        "property": {"id": 2, "label": "Property", "name": "aName2"},
        "target": {"id": 3, "label": "Entity"}
    },
    {
        "propertyGroupUuid": "GroupUuid2",
        "property": {"id": 3, "label": "Property", "name": "aName3"},
        "target": {"id": 4, "label": "Entity"}
    }]
}

The flattest version of this in the graph database could look like this:

[image: flattest version]

The most expanded version could look like this:

[image: most expanded version]

So if I want to:

  • get all Entities that have Property 2 and Property 3 under the same PropertyGroupUuid, "targeting" Entity 3 and Entity 4 respectively, I should get back Entity 1
  • get all Entities that have Property 1 and Property 2 under the same PropertyGroupUuid, "targeting" Entity 2 and Entity 3 respectively, I should NOT get back Entity 1 (those two properties belong to different groups)
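To pin down the intended semantics independently of either graph model, here is a minimal in-memory sketch in plain Python. It is not Gremlin and says nothing about DSE Graph performance; it only shows what "same group" matching means for the JSON document above. The function name `find_entities` and the use of ids (rather than names) to identify properties and targets are assumptions made for illustration.

```python
from collections import defaultdict

# The document from the question (labels omitted for brevity).
entities = [
    {
        "id": 1,
        "label": "Entity",
        "propertyGroups": [
            {"propertyGroupUuid": "GroupUuid1",
             "property": {"id": 1, "name": "aName1"}, "target": {"id": 2}},
            {"propertyGroupUuid": "GroupUuid2",
             "property": {"id": 2, "name": "aName2"}, "target": {"id": 3}},
            {"propertyGroupUuid": "GroupUuid2",
             "property": {"id": 3, "name": "aName3"}, "target": {"id": 4}},
        ],
    }
]


def find_entities(entities, wanted):
    """Return ids of entities that have a single property group containing
    every (property id, target id) pair in `wanted`. Hypothetical helper."""
    wanted = set(wanted)
    matches = []
    for entity in entities:
        # Collect the (property, target) pairs of each group separately.
        pairs_by_group = defaultdict(set)
        for mapping in entity["propertyGroups"]:
            pair = (mapping["property"]["id"], mapping["target"]["id"])
            pairs_by_group[mapping["propertyGroupUuid"]].add(pair)
        # The entity matches only if one group alone covers all wanted pairs.
        if any(wanted <= pairs for pairs in pairs_by_group.values()):
            matches.append(entity["id"])
    return matches


print(find_entities(entities, [(2, 3), (3, 4)]))  # [1]
print(find_entities(entities, [(1, 2), (2, 3)]))  # []
```

The second call returns nothing because Property 1 sits in GroupUuid1 while Property 2 sits in GroupUuid2, so no single group covers both pairs.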

How can I do that with Gremlin against the two versions of the graph, and which one is more flexible/performant given the right indexes, such as the ones provided by DSE Graph? Are there better alternatives that I haven't thought of? If the answer is detailed and well explained I will give a bounty of at least 50 :)

Thank you!

There is 1 answer

Daniel Kuppitz

I don't understand your first model with decoupled property nodes, but here's the traversal for model 2:

g.V().has("Property", "name", "Property 2").in("hasProperty"). /* start at any of the property 2  */
  filter(out("hasTarget").has("name", "Entity 3")).            /*   with target entity 3          */
  in("hasSubGroup").filter(                                    /* traverse to the property group  */
    out("hasSubGroup").and(                                    /* traverse to all sub-groups      */
      out("hasProperty").has("name", "Property 3"),            /* filter those that are linked to */
      out("hasTarget").has("name", "Entity 4")                 /*   property 3 w/ target entity 4  */
    )
  ).in("hasGroup")                                             /* traverse to all entities that match the above criteria */

Not knowing anything about the data in your graph, it's hard to predict the performance of this traversal. But in general, the performance should be okay if a) property names are indexed and b) the branching factor (the number of edges followed at each step) is low.