I tried to load csv data into embedded neo4j database(v2.1.7、on windows). The csv file has 1,000,000 rows(1 Million). And data model is simple too. just as following:
csv data
"num1","num2","datatime"
"13931345724","18409958023","2014-12-31 12:00:00"
"13931345724","13710622859","2014-12-31 12:00:00"
"13931345724","18919875049","2014-12-31 12:00:00"
"13931345724","13460873081","2014-12-31 12:00:00"
...
loading cypher sql
USING PERIODIC COMMIT 5000
LOAD CSV FROM 'file:C:/tmpFiles/calls100w.csv' AS line FIELDTERMINATOR ','
WITH line
MERGE (n0:Phone:_Phone {phoneNumber : line[0]})
MERGE (n1:Phone:_Phone {phoneNumber : line[1]})
MERGE (n0)-[:CALL{callAt: line[2]}]->(n1)
And this took a long time and got this exception:
java.lang.OutOfMemoryError: GC overhead limit exceeded
I tried to add neo4j-wrapper.conf file in the database location folder. but it seemed to have no effect.
neo4j-wrapper.conf
wrapper.java.additional.1=-XX:-UseConcMarkSweepGC
wrapper.java.additional.1=-Xloggc:c:/neo4jdb/log/neo4j-gc.log
wrapper.java.initmemory=4096
wrapper.java.maxmemory=4096
And here was what showed in messages.log file
2015-06-09 09:55:00.513+0000 INFO [org.neo4j]: System memory information:
Total Physical memory: 7.70 GB
Free Physical memory: 1.99 GB
Committed virtual memory: 361.05 MB
Total swap space: 14.48 GB
Free swap space: 5.76 GB
2015-06-09 09:55:00.519+0000 INFO [org.neo4j]: JVM memory information:
Free memory: 135.42 MB
Total memory: 148.94 MB
Max memory: 1.71 GB
Garbage Collector: PS Scavenge: [PS Eden Space, PS Survivor Space]
Garbage Collector: PS MarkSweep: [PS Eden Space, PS Survivor Space, PS Old Gen, PS Perm Gen]
Memory Pool: Code Cache (Non-heap memory): committed=2.44 MB, used=833.00 kB, max=48.00 MB, threshold=0.00 B
Memory Pool: PS Eden Space (Heap memory): committed=61.63 MB, used=7.28 MB, max=647.06 MB, threshold=?
Memory Pool: PS Survivor Space (Heap memory): committed=5.13 MB, used=5.12 MB, max=5.13 MB, threshold=?
Memory Pool: PS Old Gen (Heap memory): committed=82.19 MB, used=1.12 MB, max=1.28 GB, threshold=0.00 B
Memory Pool: PS Perm Gen (Non-heap memory): committed=20.75 MB, used=12.78 MB, max=82.00 MB, threshold=0.00 B
On windows there is no conf/ folder in database location folder, so I made one and put neo4j-wrapper.conf in it. Have conf file been placed in the right place?
database location folder
C:\NEO4JDB
| index.db
| messages.log
| neo4j.properties
| neostore
| neostore.id
| neostore.labeltokenstore.db
| ....
+---conf
| neo4j-wrapper.conf
+---index
| lucene-store.db
| lucene.log.active
| ...
\---schema
+---...
Split it into 2 Imports:
You run into the problem where cypher creates an Eager pipe to assert correct separation, which causes all your CSV lines to be pulled in eagerly, rendering periodic commit effectless, see:
I presume you have an index/constraint on
:Phone(phoneNumber)
. If you split your query up into two parts it will work: