Processing large CSVs in Meteor JS with PapaParse


I've seen lots of discussions about PapaParse and large files, but not one that solves my situation. Appreciate any advice you have.

Goals

  1. The user uploads a CSV from the client and then maps the columns in their CSV to system fields
  2. The file is loaded to Amazon S3
  3. A process is kicked off on the server to grab the file from S3, parse it, and process each row.

The whole process works until I get to about 20,000 rows. Then I get:

FATAL ERROR: invalid table size Allocation failed - process out of memory

It seems like the memory crash happens when I grab the file from S3 and then store it locally via fs.writeFileSync(). I think I can stream the file from S3 via s3.getObject(params).createReadStream(), but that doesn't return rows, just chunks.
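This is roughly what I'm hoping is possible, assuming Papa.parse will accept a Node readable stream on the server (untested sketch; processRow stands in for my real row handler):

import Papa from 'papaparse';

const s3Stream = S3.aws.getObject( getS3params ).createReadStream();

Papa.parse( s3Stream, {
  header: true,
  step: Meteor.bindEnvironment( function ( results ) {
    // With header: true each step should deliver one row; depending on the
    // PapaParse version it may be results.data or results.data[0]
    let thisItem = results.data;
    processRow( thisItem ); // hypothetical per-row handler
  }),
  complete: Meteor.bindEnvironment( function () {
    console.log( 'Finished parsing the S3 stream' );
  }),
  error: function ( err ) {
    console.log( 'PapaParse error:', err );
  }
});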

Here's my code as it stands. I would like to skip the fs.writeFileSync() step and read straight from S3, but when I pass the S3 response to PapaParse I get [] back, and BabyParse only works on local file paths, not on the S3 object or its stream.

Can I get rows from the chunks being returned by s3.getObject(params).createReadStream() and parse those?

S3.aws.getObject( getS3params, Meteor.bindEnvironment( function ( error, response ) {
  if ( error ) {
    console.log( 'getObject error:' );
    console.log( error );
  } else {
    console.log( 'Got S3 object' );

    let s3file  = response.Body,
        csvFile = 'path/to/file.csv';

    // Write the CSV to the local filesystem -- this seems really silly.
    // I want to just read from S3.
    fs.writeFileSync( csvFile, s3file );

    // Note: using BabyParse, not PapaParse
    Baby.parseFiles( csvFile, {
      header: true,
      step: function ( results, parser ) {
        let thisItem = results.data[0];
        // process this row
      }
    });

    // Remove the local CSV
    fs.unlinkSync( csvFile );
  }
})); // end S3.getObject
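If passing the stream straight to Papa.parse doesn't work, my understanding is that PapaParse also exposes Papa.NODE_STREAM_INPUT, which returns a duplex stream you can pipe the S3 stream into and then read parsed rows from. Something like this is what I had in mind (again untested, and processRow is my own hypothetical handler):

import Papa from 'papaparse';

const s3Stream    = S3.aws.getObject( getS3params ).createReadStream(),
      parseStream = Papa.parse( Papa.NODE_STREAM_INPUT, { header: true } );

parseStream.on( 'data', Meteor.bindEnvironment( function ( row ) {
  processRow( row ); // each 'data' event should be one parsed row object
}));

parseStream.on( 'end', Meteor.bindEnvironment( function () {
  console.log( 'Finished parsing the S3 stream' );
}));

s3Stream.pipe( parseStream );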

Any ideas? Thanks!
