I've seen lots of discussions about PapaParse and large files, but not one that solves my situation. Appreciate any advice you have.
Goals
- User uploads a CSV from the client and then maps its columns to system fields
- The file is uploaded to Amazon S3
- A process is kicked off on the server to grab the file from S3, parse it, and process each row.
The whole process works until I get to about 20,000 rows. Then I get:
FATAL ERROR: invalid table size Allocation failed - process out of memory
It seems like the memory crash happens when I grab the file from S3 and store it locally via `fs.writeFileSync`. I think I can stream the file from S3 via `s3.getObject(params).createReadStream()` instead, but that doesn't return rows, just chunks.
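For example, just listening to the raw stream gives me Buffer chunks of arbitrary size, and a row can start in one chunk and end in the next (rough illustration, not my real code):

S3.aws.getObject( getS3params ).createReadStream()
  .on( 'data', function ( chunk ) {
    // `chunk` is a Buffer of arbitrary length -- it usually ends mid-row
    console.log( chunk.length, chunk.toString().slice( -40 ) );
  })
  .on( 'end', function () {
    console.log( 'Stream finished' );
  });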
Here's my code as it stands. I would like to skip the `fs.writeFileSync()` step and just read from S3, but when I try that via PapaParse I get `[]`, and BabyParse only accepts local files. Can I get rows out of the chunks returned by `s3.getObject(params).createReadStream()` and parse those?
S3.aws.getObject( getS3params, Meteor.bindEnvironment( function ( error, response ) {
  if ( error ) {
    console.log( 'getObject error:' );
    console.log( error );
  } else {
    console.log( 'Got S3 object' );

    let s3file  = response.Body,
        csvFile = 'path/to/file.csv';

    // Write the CSV to the local server -- this seems really silly.
    // I want to just read from S3.
    fs.writeFileSync( csvFile, s3file );

    // Note: using BabyParse, not PapaParse
    Baby.parseFiles( csvFile, {
      header: true,
      step: function ( results, parser ) {
        let thisItem = results.data[0];
        // process this row
      }
    });

    // Remove the local CSV
    fs.unlinkSync( csvFile );
  }
})); // end S3.getObject
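For what it's worth, here is roughly what I'm hoping is possible. This is only a sketch: it assumes PapaParse can consume a Node readable stream on the server, which is the part I haven't been able to get working (this is where I get `[]`):

// Sketch only: feed the S3 object stream straight into the parser and handle
// one row at a time, never holding the whole file in memory.
// Assumes Papa.parse() accepts a Node readable stream -- not sure it does.
let s3stream = S3.aws.getObject( getS3params ).createReadStream();

Papa.parse( s3stream, {
  header: true,
  step: Meteor.bindEnvironment( function ( results, parser ) {
    let thisItem = results.data[0];
    // process this row
  }),
  complete: Meteor.bindEnvironment( function () {
    console.log( 'Finished parsing S3 file' );
  })
});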
Any ideas? Thanks!