Which data modeling is better for this hypergraph performance-wise using Gremlin and DSE Graph?


Asked by Michail Michailidis on 08 January 2017 at 10:17 · 278 views

I have a scenario where each (source) Entity has Properties, and each Property has a target pointing to another Entity. Those property mappings are grouped together. I want to query for the Entities that have specific properties with corresponding targets, where those properties fall under the same group.

The hypergraph would look like this (rectangles are the hyperedges):

The JSON would look like this:

{ 
    id: 1, label: "Entity", 
    propertyGroups: [
    { 
        propertyGroupUuid: GroupUuid1, 
        property: {id: 1, label: "Property", name: "aName1"},
        target: {id: 2, label: "Entity"}
    },
    { 
        propertyGroupUuid: GroupUuid2, 
        property: {id: 2, label: "Property", name: "aName2"},
        target: {id: 3, label: "Entity"}
    },
    { 
        propertyGroupUuid: GroupUuid2, 
        property: {id: 3, label: "Property", name: "aName3"},
        target: {id: 4, label: "Entity"}
    }]
}

The flattest version of this in the graph database could look like this:

While the most expanded version of it could look like this:

So, for example:

If I ask for all Entities that have Property 2 and Property 3 under the same propertyGroupUuid, "targeting" Entity 3 and Entity 4 respectively, I should get back Entity 1.
If I ask for all Entities that have Property 1 and Property 2 under the same propertyGroupUuid, "targeting" Entity 2 and Entity 3 respectively, I should NOT get back Entity 1, because those two properties belong to different groups.
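To make the expected semantics of the two queries concrete, here is a minimal plain-Python sketch (no graph database involved). It flattens the JSON for Entity 1 into `(propertyGroupUuid, property id, target id)` tuples and checks whether a single group contains every requested pair. The names `ENTITIES` and `entities_matching` are made up for this illustration and are not part of Gremlin or DSE Graph.

```python
from collections import defaultdict

# Entity 1 from the JSON above, flattened to
# (propertyGroupUuid, property id, target entity id) tuples.
ENTITIES = {
    1: [
        ("GroupUuid1", 1, 2),
        ("GroupUuid2", 2, 3),
        ("GroupUuid2", 3, 4),
    ],
}


def entities_matching(entities, wanted):
    """Return ids of entities where ONE property group contains every
    (property id, target id) pair listed in `wanted`."""
    wanted = set(wanted)
    matches = []
    for entity_id, mappings in entities.items():
        # Collect the (property, target) pairs of each group separately.
        groups = defaultdict(set)
        for group_uuid, prop_id, target_id in mappings:
            groups[group_uuid].add((prop_id, target_id))
        # The entity matches only if some single group covers all wanted pairs.
        if any(wanted <= pairs for pairs in groups.values()):
            matches.append(entity_id)
    return matches


print(entities_matching(ENTITIES, [(2, 3), (3, 4)]))  # [1]
print(entities_matching(ENTITIES, [(1, 2), (2, 3)]))  # []
```

Whatever graph model is chosen, a correct traversal should reproduce these two results: the first pair set lives entirely in GroupUuid2, while the second is split across GroupUuid1 and GroupUuid2.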

How can I do that with Gremlin against the two versions of the graph, and which one is more flexible and performant when the correct indices are in place, such as the ones provided by DSE Graph? Are there better alternatives that I haven't thought of? If the answer is detailed and well explained I will give a bounty of at least 50 :)

Thank you!


There is 1 answer

**Daniel Kuppitz** · Answer 1 · 2017-01-09T20:21:26+00:00

I don't understand your first model with decoupled property nodes, but here's the traversal for model 2:

g.V().has("Property", "name", "Property 2").in("hasProperty"). /* start at any of the property 2  */
  filter(out("hasTarget").has("name", "Entity 3")).            /*   with target entity 3          */
  in("hasSubGroup").filter(                                    /* traverse to the property group  */
    out("hasSubGroup").and(                                    /* traverse to all sub-groups      */
      out("hasProperty").has("name", "Property 3"),            /* filter those that are linked to */
      out("hasTarget").has("name", "Entity 4")                 /*   property 3 w/ target entity 4  */
    )
  ).in("hasGroup")                                             /* traverse to all entities that match the above criteria */

Not knowing anything about the data in your graph, it's hard to predict the performance of this traversal. In general, though, it should perform acceptably if a) property names are indexed and b) the branching factor is low.
