unexpected end of file when zlib.gunzipping very large file from s3 bucket


I'm loading a very large file from an AWS S3 bucket, creating a readStream out of it, gunzipping it, and then passing it through Papa.parse. When I say "very large file", I mean it's 245 MB gzipped and 1.1 GB unzipped. Doing this with smaller files has always worked flawlessly, but with this excessively large file it sometimes succeeds and most often fails.

When it fails, zlib.createGunzip() throws an "unexpected end of file" error. (Apparently that's something it likes to do; I'm finding lots of references to this error everywhere, but nothing that fits my case.)

Clearly it can succeed. Sometimes it does. I don't know what causes it to fail. Random memory shortage? Buffering where gunzip wants to read faster than the file can be loaded? I have no idea.

const file = await readS3File(s3options);

return new Promise((resolve, reject) => {
    // papaParseOptions is assumed to contain the complete/error callbacks
    // that call resolve/reject
    Papa.parse(file.createReadStream().pipe(zlib.createGunzip()), { ...papaParseOptions });
});
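
To at least find out which stage is dying, here's a sketch of the same setup wired through stream.pipeline so errors from either the S3 read stream or the gunzip stream get reported. (This assumes readS3File is my own helper and that its createReadStream() returns a standard Node readable; it runs inside the same async function as above.)

const { pipeline } = require('stream');
const zlib = require('zlib');
const Papa = require('papaparse');

// (inside the same async function as above)
const file = await readS3File(s3options); // my own helper
const gunzip = zlib.createGunzip();

// pipeline() forwards an error from either the S3 read stream or the
// gunzip stream to the callback and destroys both, so the log shows
// which stage actually failed
const csvStream = pipeline(file.createReadStream(), gunzip, (err) => {
    if (err) console.error('stream failed:', err);
});

Papa.parse(csvStream, { ...papaParseOptions });

That would at least tell me whether the S3 stream is ending early or gunzip is genuinely seeing truncated data.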

I'm looking for a way to increase my chances of success. Is there a good way to do that? Some way to pipe this through a buffer that will retry loading the file while appeasing gunzip somehow? Am I looking in the wrong direction?
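
For example, here's a rough sketch of what I mean by buffering and retrying: download the whole gzipped object into memory first (245 MB should fit), retry the download if it fails, and only then gunzip and parse, so a truncated download never reaches gunzip mid-stream. (Again, readS3File is my own helper; I'm assuming each call makes a fresh S3 request and that createReadStream() returns a standard Node readable.) Is something like this a reasonable direction?

const { Readable } = require('stream');
const zlib = require('zlib');
const Papa = require('papaparse');

async function downloadWithRetry(s3options, attempts = 3) {
    for (let attempt = 1; attempt <= attempts; attempt++) {
        try {
            const file = await readS3File(s3options); // my own helper
            const chunks = [];
            for await (const chunk of file.createReadStream()) chunks.push(chunk);
            return Buffer.concat(chunks); // the complete gzipped payload
        } catch (err) {
            if (attempt === attempts) throw err;
            console.warn(`download attempt ${attempt} failed, retrying`, err);
        }
    }
}

const gzipped = await downloadWithRetry(s3options);

// gunzip from the fully buffered payload; a truncated download fails in
// downloadWithRetry instead of halfway through parsing
Papa.parse(Readable.from([gzipped]).pipe(zlib.createGunzip()), { ...papaParseOptions });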


There are 0 answers