Packages for reading Parquet files in Node.js (2024)


I am creating a Lambda in Node.js that parses Parquet (version 2.0) files into JSON arrays. I have tried the following libraries, each of which failed for a different reason:

  • parquetjs, parquets, parquetjs-lite, node-parquet: No longer maintained, and their final versions don't support Parquet version 2.0.
  • @dnsp/parquetjs: Requires esModuleInterop to be enabled in TypeScript projects. It is the first package I have used that requires this.
  • duckdb: The total package size is 284MB, which rules it out for serverless Lambda deployments (Lambda deployment packages must be smaller than 250MB).

It seems that support for Parquet parsing in Node.js has either been discontinued or requires clearing significant hurdles. There must be some Parquet parsing libraries for Node.js that are still supported and work well with TypeScript. Does anyone have suggestions or words of wisdom?

There is 1 answer

Carlo Piovesan

The duckdb-wasm npm module is large because it also bundles tests and several deployment variants. A minimal, stripped-down version should be around 40 MB uncompressed / 7.3 MB after compression.

The duckdb module is possibly also an option.

Both duckdb and duckdb-wasm use the same underlying library; only the API is somewhat different, and the execution models differ (native on one side, Wasm sandbox on the other). Both are in active development.
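
For illustration, here is a minimal sketch of how the native duckdb route might look in TypeScript. It relies on DuckDB's `read_parquet` table function and on a callback-style `all(sql, callback)` method, which the `duckdb` npm package's `Database` class provides. The `SqlRunner` interface and the helpers `parquetQuery`, `readParquetRows`, and `toJson` are hypothetical names introduced here, not part of any library, and the structural compatibility of `SqlRunner` with the real `Database` type is an assumption.

```typescript
// Minimal shape of the callback-style `all` method used below. This interface
// is an illustrative assumption, not a type exported by the duckdb package.
interface SqlRunner {
  all(
    sql: string,
    callback: (err: Error | null, rows: Record<string, unknown>[]) => void
  ): unknown;
}

// Build a query over DuckDB's read_parquet table function, doubling single
// quotes so the path remains a valid SQL string literal.
function parquetQuery(path: string): string {
  return `SELECT * FROM read_parquet('${path.replace(/'/g, "''")}')`;
}

// Hypothetical helper: wrap the callback API in a Promise and return every
// row of the Parquet file as a plain object.
function readParquetRows(
  db: SqlRunner,
  path: string
): Promise<Record<string, unknown>[]> {
  return new Promise((resolve, reject) => {
    db.all(parquetQuery(path), (err, rows) => (err ? reject(err) : resolve(rows)));
  });
}

// 64-bit integer columns may come back as BigInt, which JSON.stringify
// rejects by default, so they are serialized as strings here.
function toJson(rows: Record<string, unknown>[]): string {
  return JSON.stringify(rows, (_key, value) =>
    typeof value === "bigint" ? value.toString() : value
  );
}
```

With the `duckdb` package installed, this would be wired up roughly as `const db = new duckdb.Database(':memory:')` followed by `toJson(await readParquetRows(db, '/tmp/input.parquet'))`, where the file path is a placeholder. Passing the database in as a parameter keeps the helpers independent of which DuckDB build ends up in the deployment package.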