Reading a parquet file in nodejs

I am trying the following code (from the parquetjs-lite samples and Stack Overflow) to read a Parquet file in Node.js:

const readParquetFile = async () => {
  try {
    // create new ParquetReader that reads from test.parquet
    let reader = await parquet.ParquetReader.openFile('test.parquet');
  } catch (e) {
    console.log(e);
    throw e;
  }

  // create a new cursor
  let cursor = reader.getCursor();

  // read all records from the file and print them
  let record = null;
  while (record = await cursor.next()) {
    console.log(record);
  }

  await reader.close();
};

When I run this code, nothing happens: nothing is written to the console. For testing purposes I have only used a small CSV file, which I converted to Parquet using Python.

  1. Is it because I converted from CSV to Parquet using Python? (I couldn't find a JS equivalent for the large files that I ultimately have to handle.)
  2. I want my application to be able to take in any Parquet file and read it. Is there any limitation in parquetjs-lite in this regard?
  3. There are NaN values in my CSV. Could that be a problem?

Any pointers would be helpful.

Thanks

There is 1 answer

Deepak Poojari (best answer)

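One possible failure case is the way the function is called.

An async function returns a promise. Its body runs synchronously up to the first await; the rest runs later, once the awaited operation has completed. Nothing waits for that result unless you await the promise or attach .then()/.catch() to it, so code placed after the call runs first and nothing in the calling code handles a failure. Also check that the function is actually called: the snippet in the question only defines readParquetFile and never invokes it.

To solve this, await the call (or chain .catch() on it). A web server is not required, because Node.js keeps the process alive while file I/O is pending. Synchronous calls avoid the ordering question altogether, where the library offers them.

//app.js (result not awaited)

const readParquetFile = async () => {
    await null // stands in for any awaited I/O call
    console.log("running")
}
readParquetFile()
console.log("exit")

When you run the above code, the output will be

exit
running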
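
One way to make the ordering explicit is to await the call from an async wrapper and handle rejections there. The sketch below is illustrative only: `main`, the `log` array, and the `await null` placeholder (standing in for real file I/O) are assumptions and not part of the original code.

```javascript
// Hypothetical stand-in for the real reader work; it only awaits a resolved value.
const readParquetFile = async (log) => {
    await null;                  // yields to the caller, like any awaited I/O call
    log.push("running");
};

// Awaiting inside an async wrapper makes the order explicit.
const main = async () => {
    const log = [];
    await readParquetFile(log);  // wait until the function has finished
    log.push("exit");
    return log;
};

main()
    .then((log) => console.log(log.join("\n")))
    .catch((e) => {
        console.error(e);        // surface failures instead of losing them
        process.exitCode = 1;
    });
```

Because the call is awaited, "running" is recorded before "exit".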

//syncApp.js

const readParquetFile = () => {
    console.log("running")
    // every call in here must be synchronous
}
readParquetFile()
console.log("exit")

Here the console output will be

running
exit
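
Applied to the code in the question, two things stand out: `reader` is declared with `let` inside the `try` block, so it is not in scope where the cursor is created, and the snippet never calls `readParquetFile`. A minimal sketch of one possible restructuring follows. To keep it runnable without the library installed, it uses `openFakeReader`, a hypothetical stub that only mimics the `getCursor()` / `next()` / `close()` shape used in the question; `readAllRecords` is likewise an illustrative name.

```javascript
// const parquet = require('parquetjs-lite');  // real library, not loaded in this sketch

// Hypothetical stub with the same shape as the reader used in the question.
const openFakeReader = async (rows) => {
    let i = 0;
    return {
        getCursor: () => ({
            // resolves to the next row, or null once the rows are exhausted
            next: async () => (i < rows.length ? rows[i++] : null),
        }),
        close: async () => {},
    };
};

// Reads every record. `reader` is declared outside `try` so `finally` can close it.
const readAllRecords = async (openReader) => {
    let reader;
    const records = [];
    try {
        reader = await openReader();
        const cursor = reader.getCursor();
        let record = null;
        while ((record = await cursor.next())) {
            records.push(record);
        }
    } finally {
        if (reader) await reader.close();
    }
    return records;
};

// The function has to be called, and its promise handled.
readAllRecords(() => openFakeReader([{ id: 1 }, { id: 2 }]))
    .then((records) => console.log(records))
    .catch((e) => console.error(e));
```

With the real library, the same function would presumably be called as `readAllRecords(() => parquet.ParquetReader.openFile('test.parquet'))`, assuming parquetjs-lite is installed and required as `parquet`.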