neo4j load csv is very slow

1.5k views Asked by Sonu Kapoor At 22 January 2016 at 18:08

I am trying to load a CSV file into my Neo4j database hosted on GrapheneDB. It worked fine on the first file with 5000 rows. It took about 16 seconds to complete this file.

I am now importing the second file with the same schema and same amount of rows. The data is just different. The Cypher query has been running over 30 minutes and its still not complete. I am not sure what its doing and why its so slow. Here is my cypher:

USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM 'http://example.com/some.csv' AS line
Match (c:Customer {customerID: line.CustomerID}) 
MERGE (c)<-[r:DEPT_OF]-(dept:Dept { name: line.Category})
ON CREATE 
SET dept.name = line.Category, dept.deptID=line.DeptID, dept.createdDTS=1453478149463
MERGE (dept)<-[r1:IN_DEPT]-(pt:ProductType {name: dept.name}) 
ON CREATE 
SET pt.name = dept.name, pt.packQty = line.PackQty, pt.createdDTS = 1453478149463, 
pt.productTypeID = line.ProductTypeID
MERGE (pt)<-[r2:OF_TYPE]-(st:Style {name: line.Style})
ON CREATE 
SET st.name = line.Style, st.styleID = line.StyleID, st.styleNum = line.StyleNo, st.price = line.Price
MERGE (st)<-[r3:OF_STYLE]-(p:Product {productNum: line.UPC})
ON CREATE 
SET p.floorMin = line.MinFloor, p.floorMax = line.FloorMax, p.color = line.Color, p.createdDTS = 1453478149463,
p.size = line.Size, p.productID = line.ProductID;

For rows from my csv:

UPC,Category,Style,StyleNo,Color,Size,MinFloor,MaxFloor,Price,ProductID,CustomerID,ProductType,PackQty,DeptID,StyleID,ProductTypeID,ProductID
33383605005,FRESH VEGETABLES,GREEN ONIONS 24/10 OZ,NA,NA,NA,0,0,1.79,,5f795a69-47cb-49c8-a334-0cf5d67be423,FRESH VEGETABLES,1,538a02c6-b6b7-4d0d-8dca-5ff3a513d59e,3e08dabb-415a-4826-86e8-44efb9813892,cc6a0f3c-1c05-44a0-b603-37cbbb60954e,3324b2b1-954a-4547-a82d-553be66d7b54
52867010005,FRESH VEGETABLES,GREEN ONIONS 24/10 OZ,NA,NA,NA,0,0,1.79,,5f795a69-47cb-49c8-a334-0cf5d67be423,FRESH VEGETABLES,1,edfa998f-3749-4d1f-bd96-3f4a5db0de67,fb11a8e5-de49-44da-924a-9ebc5f7f01d2,d47fd5d8-dbf0-4110-b701-543e6ed0ae40,28a9d206-96c6-4fe4-b528-84e446ba3c16

Update 1:

I have added the following indexes based off the response from Nicole.

CREATE INDEX ON :Customer(customerID)
CREATE INDEX ON :Dept(name)
CREATE INDEX ON :ProductType(name)
CREATE INDEX ON :Style(name)
CREATE INDEX ON :Product(productNum)

This helped a lot, but still takes about 20 seconds for 5K rows. Is that normal?

Any help is appreciated.

Update 2:

Based off the response from @michael, I researched a little further and found the following article very useful:

http://graphaware.com/neo4j/2014/07/31/cypher-merge-explained.html

Update 3:

I have updated my cypher to the following to avoid duplicates. I hope this looks right?

USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM '...' AS line
Match (c:Customer {customerID: line.CustomerID}) 
MERGE (c)<-[r:DEPT_OF]-(dept:Dept { name: line.Category })
ON CREATE 
SET dept.name = dept.name, dept.deptID=line.categoryID, dept.createdDTS=1453742532269, dept.modifiedDTS = 1453742532269
MERGE (c)<-[r22:DEPT_OF]-(dept)
MERGE (dept)<-[r1:IN_DEPT]-(pt:ProductType {name: dept.name}) 
ON CREATE 
SET pt.name = dept.name, pt.packQty = line.PackQty, pt.createdDTS = 1453742532269, pt.productTypeID = line.ProductTypeID, pt.modifiedDTS = 1453742532269
MERGE (c)<-[r2:DEPT_OF]-(dept)
MERGE (dept)<-[r3:IN_DEPT]-(pt) 
MERGE (pt)<-[r4:OF_TYPE]-(st:Style {name: line.Style})
ON CREATE 
SET st.name = line.Style, st.styleID = line.StyleID, st.styleNum = line.StyleNo, st.price = line.Price, st.modifiedDTS = 1453742532269, st.createdDTS = 1453742532269
MERGE (c)<-[r5:DEPT_OF]-(dept)
MERGE (dept)<-[r6:IN_DEPT]-(pt) 
MERGE (pt)<-[r7:OF_TYPE]-(st)
MERGE (st)<-[r8:OF_STYLE]-(p:Product {productNum: line.UPC})
ON CREATE 
SET p.floorMin = line.MinFloor, p.floorMax = line.FloorMax, p.color = line.Color, p.createdDTS = 1453742532269,p.modifiedDTS = 1453742532269, p.size = line.Size, p.productID = line.ProductID;

Original Q&A

TechQA.

neo4j load csv is very slow

There are 0 answers

Related Questions in NEO4J

Related Questions in CYPHER

Related Questions in LOAD-CSV

Related Questions in GRAPHENEDB

Popular Questions

Popular Tags

Trending Questions