I managed to load a Parquet file by following the examples and documentation of the Rust implementation of Apache Arrow:
use std::fs::File;
use std::path::Path;
use std::rc::Rc;
use arrow::record_batch::RecordBatchReader;
use parquet::arrow::{ArrowReader, ParquetFileArrowReader};
use parquet::file::reader::SerializedFileReader;

let file = File::open(&Path::new("./path_to/file.parquet")).unwrap();
let file_reader = SerializedFileReader::new(file).unwrap();
let mut arrow_reader = ParquetFileArrowReader::new(Rc::new(file_reader));
println!("Converted arrow schema is: {}", arrow_reader.get_schema().unwrap());
// Read the file in batches of up to 2048 rows.
let mut record_batch_reader = arrow_reader.get_record_reader(2048).unwrap();
I was able to display the names and types of the columns in each batch:
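// next_batch() returns Ok(None) once the file is exhausted, so stop there
// instead of unwrapping the Option and panicking.
while let Some(record_batch) = record_batch_reader.next_batch().unwrap() {
    if record_batch.num_rows() > 0 {
        println!("Schema: {}.", record_batch.schema());
    }
}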
However, I am confused about how to display the contents of the columns. How can I retrieve the contents of the first column and print them?
The latest version of Apache Arrow seems to have a pretty-printing utility. Unfortunately, it is not in the latest published crate (1.0.1).
The manual way to do it is through downcasting.
Then you can simply print it: