Cypher - How is negation handled?

908 views Asked by At

I have been trying to understand how Cypher corresponds to graph database theory. In particular I had in mind "Query Languages for Graph Databases by Peter T.Wood" (http://users.dcc.uchile.cl/~pbarcelo/wood.pdf). I imagine that it corresponds to conjunctive regular path queries with additional operations such as aggregation but I cannot find information about this anywhere.

Two question:

  1. Is there anywhere I can find information about the theory behind Cypher? In particular with a conjunctive query spin.
  2. How does Cypher handle negation? e.g. is it 'safe negation'?

Background: I started using Neo4j with Cypher for a paper that I'm working on. I did this because these both seemed well established and well supported. However, I want to abstract away from my implementation of query answering to a more general graph query formalism but I do not know how Cypher corresponds to this.

1

There are 1 answers

0
Michael Anslow On BEST ANSWER

I got an answer to this question on LinkedIn which can be found here: https://www.linkedin.com/groups/Does-anybody-know-similarities-differences-2623939.S.5939804856381382658

The answer below refers to this query:

MATCH (n:Asteroid) WHERE NOT n.name = 'Ceres' RETURN n LIMIT 25

The answer is as follows:

There are two answers to your question regarding negation:

(i) The negation of a property value

This is the case covered in the comment above, which is essentially correct; I will re-use the provided example. Internally, all nodes having the label 'Asteroid' are retrieved using the label index. This is followed by a “selection” operator (from relational algebra), which is used to select only those tuples containing either no 'name' property or where the 'name' property is not 'Ceres'.This is then followed by an operator called “top”, limiting the results returned to 25. As you can see, this does not differ from how SQL is executed on relational databases.

(ii) The negation of a pattern predicate

This is where the full power of graph pattern matching is utilised.

Say we have the following toy query, in which I want to find all events I have not attended:

MATCH (me:Person {name: “me”}), (e:Event) WHERE NOT ( (me) - [:ATTENDED] -> (e) ) RETURN e

Our query execution plan is a tree of operators, each of which has up to 2 children.

At root of the tree, we have an 'Anti Semi Apply' operator (equivalent to an anti-semijoin operator in relational theory). The left-hand child is a Cartesian Product of two sets of tuples: (1) a set of nodes corresponding to (me:Person {name: "me"}) retrieved using the label and property index, and (2) a set of nodes corresponding to (e:Event), retrieved using the label index. We note that the Cartesian Product will consist of all combinations of "me" nodes with every (e:Event) in the database.

The Cartesian Product emits a single row at a time, and this row, R, is piped upwards to the Anti Semi Apply operator. The Anti Semi Apply then feeds R as an argument to its right-hand branch, so that R appears as the ultimate descendant on the right-hand side. R is passed upwards to an Expand operator, which returns all (me)-[:ATTENDED]-(some_e) rows (note that the "me" here matches the one in R). Immediately, each such row, S, has a Selection operation applied to it, in order to match "some_e" to the "e" in R. Thus, any row S is in effect an "Event" ATTEND by "me". S is then piped to Anti Semi Apply. If no row S was found for R, R is returned as a result (as these will be all “Events” not ATTENDED by "me")

Thus, having negation in a query is not treated as a simple filter - it intrinsically affects the entire query plan.

More details on Cypher’s execution plans can be found at http://neo4j.com/docs/snapshot/execution-plans.html.