py2neo - Match and Merge two nodes coming from two different csv, and create relationship


I have a relational database whose tables I converted to CSV files. I imported two of them and created the nodes by specifying which columns to pick, as in the following code:

import csv
from py2neo import authenticate, Graph, Node

authenticate("localhost:7474", "neo4j", "my_password")
graph_db = Graph()
graph_db.delete_all()

# Read both CSV files (Python 2: the csv module expects binary mode)
with open('File1.csv', "rb") as abc_file, open('File2.csv', "rb") as efg_file:
    data1 = csv.reader(abc_file, delimiter=';')
    data2 = csv.reader(efg_file, delimiter=';')
    next(data1)  # skip the header row
    next(data2)

    # Create a node for every row of the "Contact Email" column of File1.csv.
    # Iterate over the csv reader, not the raw file, so row[0] is a column.
    for row in data1:
        nodes1 = Node("Contact_Email", email=row[0])
        contact_graph = graph_db.create(nodes1)

    # Create nodes for every row of the "Building_Name" and "Person_Created"
    # columns of File2.csv
    for row in data2:
        nodes2 = Node("Building_Name", name=row[0])
        nodes3 = Node("Person_Created", name=row[1])
        building_graph = graph_db.create(nodes2, nodes3)

Let's say there are 60 emails under the "Contact_Email" column of "File1.csv", which is the primary key. It is used as a foreign key in "File2.csv" under the "Person_Created" column. There are 14 buildings specified under "Building Name", with corresponding emails in the "Person_Created" column. My questions are:

1) How can I match the 14 emails in the File2.csv "Person_Created" column with the emails in the File1.csv "Contact Email" column, to avoid duplicates?

2) How can I create a relationship between the "Building Name" values (in File2.csv) and the matching emails (in File1.csv) without any duplication, something like "Building1234 is DESIGNED_BY [email protected]"?

How can I do this in py2neo, with or without Cypher?


There are 2 answers

Nigel Small (BEST ANSWER)

Py2neo provides a number of uniqueness functions for just this. Have a look at the py2neo documentation for merge_one and friends. The node values returned from these can then be stored and used for building unique relationships and paths.
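A minimal sketch of that approach, assuming py2neo 2.0, where `Graph.merge_one(label, property_key, property_value)` returns the matching node or creates it, and `Graph.create_unique` adds a relationship only if it is missing. The helper name `link_buildings` is hypothetical, and the `Relationship` class is passed in as an argument only so the function can be read and tried out without a running server.

```python
def link_buildings(graph, relationship_cls, rows):
    """Merge one building node and one email node per row, then link them.

    graph            -- assumed to be a py2neo 2.0 Graph
    relationship_cls -- assumed to be py2neo.Relationship
    rows             -- rows of File2.csv: row[0] is the building name,
                        row[1] is the Person_Created email
    """
    linked = []
    for row in rows:
        building_name, email = row[0], row[1]
        # merge_one returns the existing node or creates it if absent, so
        # the emails in File2.csv reuse the Contact_Email nodes of File1.csv.
        person = graph.merge_one("Contact_Email", "email", email)
        building = graph.merge_one("Building_Name", "name", building_name)
        # create_unique only adds the relationship if it does not exist yet.
        graph.create_unique(relationship_cls(building, "DESIGNED_BY", person))
        linked.append((building_name, email))
    return linked
```

With a live server this might be called as `link_buildings(Graph(), Relationship, data2)` after `from py2neo import Graph, Relationship`. Each row still costs several server round trips, so it suits small files such as the 14 buildings here.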

For higher performance, though, you'll probably want to look at Cypher transactions or batches. Without these, each action requires a separate call to the server, which is slow at scale.
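As a rough illustration of the transactional route, the sketch below queues one MERGE statement per distinct row and commits them together. It assumes the py2neo 2.0 transaction API (`graph.cypher.begin()`, `tx.append(statement, parameters)`, `tx.commit()`) and the `{name}` parameter syntax of Neo4j 2.x and 3.x (newer servers use `$name` instead). The names `LINK_STATEMENT`, `build_parameters` and `link_in_one_transaction` are hypothetical.

```python
# One statement merges both nodes and the relationship between them.
LINK_STATEMENT = (
    "MERGE (p:Contact_Email {email: {email}}) "
    "MERGE (b:Building_Name {name: {name}}) "
    "MERGE (b)-[:DESIGNED_BY]->(p)"
)


def build_parameters(rows):
    """Return one parameter dict per distinct (building name, email) pair."""
    seen = set()
    params = []
    for row in rows:
        key = (row[0], row[1])
        if key not in seen:
            seen.add(key)
            params.append({"name": row[0], "email": row[1]})
    return params


def link_in_one_transaction(graph, rows):
    """Queue all MERGE statements and send them in a single commit.

    graph is assumed to be a py2neo 2.0 Graph.
    """
    tx = graph.cypher.begin()
    for parameters in build_parameters(rows):
        tx.append(LINK_STATEMENT, parameters)
    tx.commit()
```

Because every clause is a MERGE, running the import twice should not create duplicate nodes or relationships.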

David Sequeira

Create an index or a uniqueness constraint for the Contact Email nodes.

It is probably a good idea to give the node an attribute with a clear name, such as email.

While iterating through the Person_Created column, use the email foreign key value to create a Contact Email node with the attribute email.

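With the index or constraint in place, the node is created conditionally, that is, only if no node with that email exists yet. Use a merge rather than a plain create for this step, because a plain create of a duplicate fails under a uniqueness constraint.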

Also create the relationship between the Person Created value and the Contact Email node within this iteration.
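The first step above could look roughly like this. It is a sketch that assumes the py2neo 2.0 schema API (`graph.schema.get_uniqueness_constraints(label)` and `graph.schema.create_uniqueness_constraint(label, property_key)`); the helper name `ensure_unique_email` is hypothetical.

```python
def ensure_unique_email(graph, label="Contact_Email", key="email"):
    """Create a uniqueness constraint on label.key unless it already exists.

    graph is assumed to be a py2neo 2.0 Graph. Returns True if the
    constraint was created by this call, False if it was already there.
    """
    existing = graph.schema.get_uniqueness_constraints(label)
    if key in existing:
        return False
    graph.schema.create_uniqueness_constraint(label, key)
    return True
```

It is best called once before the import loop, since the server refuses to create the constraint when duplicate emails are already stored.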