Does Neo4j have a MINUS operator?


I am trying to compare two graphs using Cypher.

I'm fairly new to Cypher, so I'm not sure whether this is the right approach, but in SQL I would use a NOT IN or MINUS query. I have also tried reading about the graph algorithms plugin for Neo4j. I suspect that one or more of its algorithms may be helpful, but I have no idea where to start with them.

I note that Cypher has UNION and UNION ALL operators but no MINUS operator, so how would I do this in Cypher?

Firstly, here is a diagram of my graph:

[Image: diagram of my sample data]

Basically there are People and Parts. People make a Part. For example "Bob" makes "Bob's part".

There is a dependency between the parts. For example, the "Final Product" is made from "Bob's Part", "Charles' Part" and "Arthur's Part".

Finally, there is a dependency between people. Specifically since Peter, who makes the final product, needs parts from Bob, Arthur and Charles, there should be dependencies from Peter on Bob, Arthur and Charles.

However, the relationship between Charles and Peter (shown in Red) is missing from the sample data. This is the relationship that I am trying to identify.

The algorithm I am using is:

  1. Query 1: using the "MADE_FROM" relationship to identify which parts are used to make another part. This is the graph with the green vertices generated from the Green relationships.

  2. Query 2: identify which parts are used to make another part by following the people relationships and who makes what. This is (or should be) the graph consisting of the green vertices, but is generated by following the MAKES and DEPENDS_ON relationships.

Sadly, due to a mess up the red DEPENDS_ON relationship is missing, so the results of the two queries above do not match.

Here is my sample data. Note that the record Peter,Charles,depends is missing:

id1,id2,relationship
Bob,Bobs Part,makes
Arthur,Arthurs Part,makes
Charles,Charles Part,makes
Peter,Final Product,makes
Peter,Bob,depends
Peter,Arthur,depends
Final Product,Arthurs Part,consists
Final Product,Bobs Part,consists
Final Product,Charles Part,consists

Here is the code that I have so far. It loads the graph from the above file and shows the two queries that I would like to combine with a MINUS operator.

match(p:Person) detach delete p;
match(p:Part) detach delete p;

// Load the parts, people and who makes what relationship (Black relationship).
load csv with headers from 'file:///gmc/relationships.csv' as rec
with rec
where rec.relationship = "makes"
  create (person:Person {name: rec.id1})
  create (part:Part {partName: rec.id2})
  create (person) - [:MAKES] -> (part)
;

// Load the part relationships (green relationships)
load csv with headers from 'file:///gmc/relationships.csv' as rec
with rec
where rec.relationship = "consists"
  match (part:Part {partName: rec.id1})
  match (madeFrom:Part {partName: rec.id2})
  create (part) - [:MADE_FROM] -> (madeFrom)
;

// Load the people dependencies (blue relationships).
load csv with headers from 'file:///gmc/relationships.csv' as rec
with rec
where rec.relationship = "depends"
  match (person:Person {name: rec.id1})
  match (dependsOn:Person {name: rec.id2})
  create (person) - [:DEPENDS_ON] -> (dependsOn)
;

And finally the queries that I am working with to produce the "Report" that I need:

neo4j> // Query1: Produce a list of parts and the parts that they are made from.
neo4j> // i.e. Final Product is made from Arthur's, Bob's and Charles' parts.
neo4j> match(part:Part)-[:MADE_FROM] -> (madeFrom:Part)
       return part, madeFrom
       order by part.partName, madeFrom.partName;
+--------------------------------------------------------------------------+
| part                                | madeFrom                           |
+--------------------------------------------------------------------------+
| (:Part {partName: "Final Product"}) | (:Part {partName: "Arthurs Part"}) |
| (:Part {partName: "Final Product"}) | (:Part {partName: "Bobs Part"})    |
| (:Part {partName: "Final Product"}) | (:Part {partName: "Charles Part"}) |
+--------------------------------------------------------------------------+

3 rows available after 1 ms, consumed after another 0 ms
neo4j> // Query 2: Produce a list of parts and the parts that they are made from
neo4j> // using the Dependencies that the people have on one another.
neo4j> match (part:Part) <- [:MAKES] - (person:Person)-[:DEPENDS_ON] -> (dependsOn:Person)-[:MAKES] -> (madeFrom:Part)
       return part, madeFrom
       order by part.partName, madeFrom.partName;
+--------------------------------------------------------------------------+
| part                                | madeFrom                           |
+--------------------------------------------------------------------------+
| (:Part {partName: "Final Product"}) | (:Part {partName: "Arthurs Part"}) |
| (:Part {partName: "Final Product"}) | (:Part {partName: "Bobs Part"})    |
+--------------------------------------------------------------------------+

2 rows available after 1 ms, consumed after another 0 ms
neo4j> // I need:  Query1 MINUS Query2   - which should produce
+--------------------------------------------------------------------------+
| part                                | madeFrom                           |
+--------------------------------------------------------------------------+
| (:Part {partName: "Final Product"}) | (:Part {partName: "Charles Part"}) |
+--------------------------------------------------------------------------+
neo4j> 

The final answer set is what I am looking for. This is showing me that the "red relationship" between Peter and Charles is missing because:

  • the part that Peter makes (Final Product) depends upon the part that Charles makes (Charles' Part). However,
  • there is no dependency from Peter to Charles on the "DEPENDS_ON" path.

So how can I do this using Cypher? Or am I barking up the wrong tree entirely with this approach?

1 Answer

Answer by Marj:

There's a bit of a mental shift in switching from SQL to Cypher. Try to think less about the 'tables' and more about the 'relationships'. It takes some practice, but when it clicks, everything starts to become more obvious.

This query will give you what you want to know:

match (m:Person)-[:MAKES]->(x:Part)-[:MADE_FROM]->(y:Part)<-[:MAKES]-(n:Person)
where not (m)-[:DEPENDS_ON]-(n)  
return m,x,y,n

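Basically, it looks for a person m who makes a part x which is made from part y which, in turn, is made by person n, where m and n are not connected by a DEPENDS_ON relationship in either direction. In the sample data each part has a single maker and no part is made from itself, so m and n are different people and x and y are different parts. Cypher only guarantees that the same relationship is not matched twice within one pattern, so if one person could make both x and y, add "and m <> n" to the where clause to exclude that case.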
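
If the report should have the same shape as the "Query1 MINUS Query2" result in the question (just part and madeFrom), the same idea can be sketched as an anti-join: start from Query 1 and discard every pair that Query 2's path would also reach. This is a minimal sketch that assumes the labels, relationship types and partName property from the question's load script; it has only been reasoned through against the sample data, not run against a large graph.

```cypher
// Query 1: every (part, madeFrom) pair linked by MADE_FROM ...
MATCH (part:Part)-[:MADE_FROM]->(madeFrom:Part)
// ... minus the pairs that Query 2's people path also connects.
// The pattern predicate is true when at least one such path exists.
WHERE NOT (part)<-[:MAKES]-(:Person)-[:DEPENDS_ON]->(:Person)-[:MAKES]->(madeFrom)
RETURN part, madeFrom
ORDER BY part.partName, madeFrom.partName;
```

On the sample data this should leave only the "Final Product" / "Charles Part" row, because Peter has DEPENDS_ON relationships to Bob and Arthur but not to Charles. Unlike the query above, the predicate here follows DEPENDS_ON in one direction only, which matches Query 2 as written in the question.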

This type of reflexive 'join' is a nightmare in SQL, but relatively easy in Cypher.
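
For cases where the two result sets cannot easily be merged into a single pattern, a more literal translation of MINUS is to collect one result set into a list and filter the other against it. The sketch below is one possible way to do that in plain Cypher, using the question's two queries unchanged; the variable name viaPeople is made up for illustration. It holds the whole second result set in memory, so it is best suited to small results.

```cypher
// Query 2: collect every (part, madeFrom) pair reachable via people.
MATCH (part:Part)<-[:MAKES]-(:Person)-[:DEPENDS_ON]->(:Person)-[:MAKES]->(madeFrom:Part)
WITH collect([part, madeFrom]) AS viaPeople   // hypothetical name

// Query 1: keep only the pairs that are absent from the collected list.
// Nodes compare by identity, so list equality matches the same two nodes.
MATCH (part:Part)-[:MADE_FROM]->(madeFrom:Part)
WHERE NOT [part, madeFrom] IN viaPeople
RETURN part, madeFrom
ORDER BY part.partName, madeFrom.partName;
```

Because collect() returns an empty list when nothing matches, the second MATCH still runs if Query 2 finds no rows, and in that case every row of Query 1 is returned.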