I am just getting started on Neo4J, and I am trying to load some data into Neo4j 3.1 using LOAD CSV with the following script:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///Fake59.csv" AS line
MERGE (person:Person {firstName: line.GivenName, middleInitial: line.MiddleInitial, lastName: line.Surname, title: line.Title,
gender: line.Gender, birthday: line.Birthday, bloodType: line.BloodType, weight: line.Pounds, height: line.FeetInches})
MERGE (contact:Contact {phoneNumber: line.TelephoneNumber, email: line.EmailAddress})
MERGE (person)-[:CONTACTED_AT]->(contact)
MERGE (color:Color {name: line.Color})
MERGE (person)-[:FAVORITE_COLOR]->(Color)
MERGE (address:Address {streetAddress: line.StreetAddress, city: line.City, zipCode: line.ZipCode})
MERGE (person)-[:LIVES_AT]->(address)
MERGE (state:State {abbr: line.State, name: line.StateFull})
MERGE (city)-[:STATE_OF]->(stage)
MERGE (country:Country {name: line.CountryFull, abbr: line.Country, code: line.TelephoneCountryCode})
MERGE (state)-[:IN_COUNTRY]->(country)
MERGE (credentials:Credentials {userName: line.Username, password: line.Password, GUID: line.GUID})
MERGE (person)-[:LOGS_in]->(credentials)
MERGE (browser:Browser {agent: line.BrowserUserAgent})
MERGE (person)-[:BROWSES_WITH]->(browser)
MERGE (creditCard:CreditCard {number: line.CCNumber, cvv2: line.CVV2, expireDate: line.CCExpires})
MERGE (person)-[:USES_CC]->(creditCard)
MERGE (creditCompany:CreditCompany {name: line.CCType})
MERGE (creditCard)-[:MANAGED_BY]->(creditCompany)
MERGE (occupation:Occupation {name: line.Occupation})
MERGE (person)-[:WORKS_AS]->(occupation)
MERGE (company:Company {name: line.Company})
MERGE (person)-[:WORKDS_FOR]->(company)
MERGE (company)-[:EMPLOYES]->(occupation)
MERGE (vehicle:Vehicle {name: line.Vehicle})
MERGE (person)-[:DRIVES]->(vehicle)
The input file has about 50k rows. It runs for a few hours the process does not finish, but after that time if I query the database I see that only the node type (Person) got created. If I run a smaller file with 3 entries only all the additional nodes and relationships are created.
I have already changed the amount of memory allocated to Neo4j and to the JVM, and still no success. I understand that MERGE takes longer than CREATE to be executed but I am trying to avoid duplication of nodes with the insert.
Any ideas or suggestions on what I should change or how I can improve this ?
Thank you,
--MD.
Try splitting your query into multiple smaller ones. Works better and is easier to manage. Also when using
MERGE
you should typically want to do it on a single property like an email for person or something unique and then useON CREATE SET
. Should fasten the query. Looks like this:In your case with the person where there is no single unique property you can use a combination of many, but know that every property you add in the
MERGE
slows down the query.