Incredibly long load time for LOAD CSV in Neo4j Desktop

I'm using straightforward Cypher to load data from a CSV file and create nodes.

Code is as follows:


    :auto LOAD CSV WITH HEADERS FROM 'file:///registrants.csv' AS row
    CALL {
        WITH row 
        MERGE (r:Registrant {row_wid: toInteger(row.ROW_WID)})
        ON CREATE SET
            r.row_wid = toInteger(row.ROW_WID), 
            r.w_insert_dt = row.W_INSERT_DT,
            r.w_update_dt = row.W_UPDATE_DT,
            r.email_address = row.EMAIL_ADDRESS,
            r.attendee_contact_wid = toInteger(row.ATTENDEE_CONTACT_WID),
            r.attendee_account_wid = toInteger(row.ATTENDEE_ACCOUNT_WID),
            r.reg_contact_wid = toInteger(row.REG_CONTACT_WID),
            r.reg_account_wid = toInteger(row.REG_ACCOUNT_WID),
            r.event_wid = toInteger(row.EVENT_WID),
            r.tkt1_wid = toInteger(row.TKT1_WID),
            r.tkt2_wid = toInteger(row.TKT2_WID),
            r.tkt3_wid = toInteger(row.TKT3_WID),
            r.tkt4_wid = toInteger(row.TKT4_WID),
            r.tkt5_wid = toInteger(row.TKT5_WID),
            r.tkt6_wid = toInteger(row.TKT6_WID),
            r.current_flg = row.CURRENT_FLG,
            r.delete_flg = row.DELETE_FLG,
            r.created_on_dt = row.CREATED_ON_DT,
            r.updated_on_dt = row.UPDATED_ON_DT,
            r.reg_dt = row.REG_DT,
            r.attend_dt = row.ATTEND_DT,
            r.cancel_dt = row.CANCEL_DT,
            r.alumni = row.ALUMNI,
            r.reg_channel = row.REG_CHANNEL
            
    } IN TRANSACTIONS OF 1000 ROWS

I tried this with 100 rows and it worked seamlessly. With 700K rows, it has been running for over 12 hours.

I have also created an index for this node in the DB.

I'm a newbie so please excuse if I'm doing something wrong.

From my research this looks right.

Not getting any errors.

Insights appreciated

Thank you.

There is 1 answer

William Lyon

Make sure you have a uniqueness constraint on Registrant.row_wid:

    CREATE CONSTRAINT FOR (r:Registrant) REQUIRE r.row_wid IS UNIQUE;
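The reason this matters: without an index on the merge key, every MERGE has to scan all existing Registrant nodes to look for a match, so the load gets slower with every row already inserted. A quick way to confirm what actually exists in the database is sketched below. It assumes a Neo4j version recent enough to support the SHOW commands, and the constraint name `registrant_row_wid` is just an illustrative choice.

```cypher
// List existing constraints and indexes. Check that one of them covers
// the label Registrant and the property row_wid, spelled exactly as in
// the MERGE.
SHOW CONSTRAINTS;
SHOW INDEXES;

// Create the constraint only if it is missing.
// The name registrant_row_wid is hypothetical; any unused name works.
CREATE CONSTRAINT registrant_row_wid IF NOT EXISTS
FOR (r:Registrant) REQUIRE r.row_wid IS UNIQUE;
```

A uniqueness constraint is backed by an index, so a separate index on the same property should not be needed.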

Examine the query plan by prepending EXPLAIN to the query to confirm that the index is being used, and check that there are no "Eager" operators that would prevent batching (given the query, I wouldn't expect any).
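As a rough sketch of that check, the MERGE can be planned on its own, without the surrounding subquery. EXPLAIN only produces the plan and does not run the load. The single SET property here is a stand-in for the full list in the question.

```cypher
// Plan only; nothing is read from the file or written to the database.
EXPLAIN
LOAD CSV WITH HEADERS FROM 'file:///registrants.csv' AS row
MERGE (r:Registrant {row_wid: toInteger(row.ROW_WID)})
ON CREATE SET r.email_address = row.EMAIL_ADDRESS
```

In the resulting plan I would expect to see an index seek operator (something like NodeUniqueIndexSeek) feeding the merge. If it shows NodeByLabelScan instead, the index or constraint is not being picked up, usually because the label or property name does not match. Exact operator names can vary between Neo4j versions.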

Increase the number of rows per transaction. The right value depends on how much memory is allocated for Neo4j transactions, but I typically use around 100k.
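Applied to the query from the question, that change is only in the last line. This is an abbreviated sketch: the property list is shortened for readability, and 100000 is a starting point to tune against the memory available, not a recommended fixed value.

```cypher
:auto LOAD CSV WITH HEADERS FROM 'file:///registrants.csv' AS row
CALL {
    WITH row
    MERGE (r:Registrant {row_wid: toInteger(row.ROW_WID)})
    ON CREATE SET
        r.email_address = row.EMAIL_ADDRESS,
        r.event_wid = toInteger(row.EVENT_WID)
        // remaining properties as in the question
} IN TRANSACTIONS OF 100000 ROWS
```

If the larger batch runs out of memory, reduce the number until it fits.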