I have a relational database and I converted its tables to CSV files. I imported two of them and created the nodes by specifying which columns to pick, as in the following code:
import csv
from py2neo import neo4j, authenticate, Graph, Node, cypher, rel, Relationship

authenticate("localhost:7474", "neo4j", "my_password")
graph_db = Graph()
graph_db.delete_all()

# Import all rows and columns of the CSV files
with open('File1.csv', "rb") as abc_file, open('File2.csv', "rb") as efg_file:
    data1 = csv.reader(abc_file, delimiter=';')
    data2 = csv.reader(efg_file, delimiter=';')
    # Skip the header row of each file
    next(data1)
    next(data2)

    # Create a node for every row of the "Contact Email" column of abc_file
    # (iterate over the csv reader, not the raw file object)
    for row in data1:
        nodes1 = Node("Contact_Email", email=row[0])
        contact_graph = graph_db.create(nodes1)

    # Create nodes for every row of the "Building_Name" and "Person_Created"
    # columns of efg_file
    for row in data2:
        nodes2 = Node("Building_Name", name=row[0])
        nodes3 = Node("Person_Created", name=row[1])
        building_graph = graph_db.create(nodes2, nodes3)
Let's say there are 60 emails under the "Contact_Email" column of "File1.csv", which is the primary key. It is used as a foreign key in "File2.csv" under the "Person_Created" column. There are 14 buildings listed under "Building Name", with the corresponding emails in the "Person_Created" column. My questions are:
1) How can I match the 14 emails in the "Person_Created" column of File2.csv with the emails in the "Contact Email" column of File1.csv, to avoid duplicates?
2) How can I create a relationship between the "Building Names" (in File2.csv) and "Person_Created" (in File1.csv) without any duplication, something like "Building1234 is DESIGNED_BY [email protected]"?
How can I do this in py2neo, with or without Cypher?
Py2neo provides a number of uniqueness functions for just this. Have a look at the py2neo documentation for merge_one and friends. The node values returned from these can then be stored and used to build unique relationships and paths.
Note that for higher performance, though, you'll probably want to look at Cypher transactions or batches. Without these, each action requires a separate call to the server and, at scale, this is slow.
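As a rough sketch of the Cypher route (not merge_one itself), the snippet below builds one parameterised MERGE statement per data row of File2.csv. MERGE matches an existing node or relationship and only creates it when it is missing, so repeated emails do not produce duplicates. The function name building_statements and the sample rows are hypothetical, the column order (building name first, creator email second) follows the question's code, and the $name parameter syntax assumes Neo4j 3.0 or later (older servers use {name} instead).

```python
import csv

# One statement per row: match-or-create the contact, the building,
# and the DESIGNED_BY relationship between them.
MERGE_BUILDING = (
    "MERGE (p:Contact_Email {email: $email}) "
    "MERGE (b:Building_Name {name: $building}) "
    "MERGE (b)-[:DESIGNED_BY]->(p)"
)


def building_statements(lines, delimiter=";"):
    """Return (statement, parameters) pairs for each data row.

    `lines` is any iterable of CSV lines (an open file works too).
    Hypothetical helper, written for illustration only.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    next(reader, None)  # skip the header row, if there is one
    return [
        (MERGE_BUILDING, {"building": row[0], "email": row[1]})
        for row in reader
        if row  # ignore blank lines
    ]


if __name__ == "__main__":
    # Hypothetical sample data standing in for File2.csv
    sample = [
        "Building Name;Person_Created",
        "Building1234;alice@example.com",
        "Building5678;alice@example.com",
    ]
    for statement, parameters in building_statements(sample):
        print(statement, parameters)
```

Each (statement, parameters) pair can then be sent inside a single Cypher transaction so the import does not cost one server call per row. In py2neo 2.0 that would be roughly tx = graph_db.cypher.begin(), then tx.append(statement, parameters) for each pair, then tx.commit(), though the exact calls depend on the py2neo version installed.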