I'm looking for advice on how to decode and read a zstd file. I'm feeling a bit lost, since this is my first big project since I started learning Rust.
I am using Rust for this project because it is for an internship, and the data export/compression tool was written in Rust long ago, so I thought I could take some inspiration from it. I am learning Rust from scratch, so I am not very familiar with the structs and functions used for file I/O. I have a code snippet that is not working yet, so I have some questions:
```rust
use std::fs::File;
use std::io::{self, BufReader};
use zstd::stream::read::Decoder;

fn read_lines<P>(filename: P) -> io::Result<io::Lines<io::BufReader<File>>>
where
    P: AsRef<Path>,
{
    if let Ok(file) = File::open(filename) {
        if let Ok(buf_reader) = BufReader::new(file) {
            if let Ok(decoder) = Decoder::new(buf_reader) {
                return Ok(io::BufReader::new(decoder).lines());
            }
        }
    }
}

if let Ok(lines) = read_lines(filename) {
    for line in lines {
        if let Ok(ip) = line {
            println!("{}", ip)
        }
    }
}
```
Since it is a compressed file, should I decompress it as a whole first and then read it line by line? I know the decompressed files are in JSONL format, so each line is a separate JSON document. If the file is too big to read in one go, how should I proceed?
Also, if you use a crate other than zstd that you would recommend, please share it. I would appreciate any help.
You're going about it the right way: wrapping the `Decoder` in a `BufReader` lets you read lines from the compressed file without loading the whole file up front. The outer `BufReader` you use to read the lines reads chunks from the decoder until it reaches a newline, and reading from the decoder decompresses the file in chunks. You just haven't got the structure and return type right. Here's what I would do:
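A minimal sketch of that approach, assuming the `zstd` crate supplies the real decoder. To keep it compilable with only the standard library, the decoder constructor is passed in as a parameter (`make_decoder` is a name chosen for this sketch, not part of any API); with the `zstd` crate you would pass a closure such as `|file| zstd::stream::read::Decoder::new(file)`.

```rust
use std::fs::File;
use std::io::{BufRead, BufReader, Error as IoError, Lines, Read};
use std::path::Path;

/// Opens `filename`, wraps it in a decoder, and returns a lazy iterator over
/// the decoded lines.
///
/// Assumption for this sketch: `make_decoder` stands in for
/// `zstd::stream::read::Decoder::new`, which already wraps the `File` in its
/// own `BufReader`, so we do not add one around the file here.
pub fn read_lines<P, D, F>(filename: P, make_decoder: F) -> Result<Lines<impl BufRead>, IoError>
where
    P: AsRef<Path>,
    D: Read,
    F: FnOnce(File) -> Result<D, IoError>,
{
    // `?` hands any io::Error back to the caller instead of nesting `if let`s.
    let file = File::open(filename)?;
    let decoder = make_decoder(file)?;
    // The outer BufReader pulls bytes from the decoder in chunks and splits
    // them on '\n'; it reads nothing until the iterator is advanced.
    Ok(BufReader::new(decoder).lines())
}
```

For example, `read_lines("data.jsonl.zst", |file| zstd::stream::read::Decoder::new(file))?` (a hypothetical path) gives an iterator of `io::Result<String>`, one item per decompressed line, with data pulled from the decoder only as you iterate.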
To explain a bit more:
- `File::open` and `Decoder::new` both return a `std::io::Error` if a problem is encountered, so we can use `?` to return the error early and avoid nested `if let`s.
- `Decoder::new` takes in a reader type and creates a `Decoder<'_, BufReader<_>>` (i.e. it creates a `BufReader` for the `File` itself), so we don't have to do that part.
- I wrote the return type as `Result<Lines<impl BufRead>, IoError>` in this instance to keep it concise, rather than spelling out the full nested reader type.
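On the calling side, the same `?` approach replaces the nested `if let`s from the question, and it surfaces a read or decode error instead of silently skipping it the way `if let Ok(ip) = line` does. A sketch, assuming each non-empty line is one JSON document; `for_each_record` is a hypothetical helper, and it accepts any `BufRead`, so it works on `BufReader::new(decoder)` as well as on in-memory bytes:

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: calls `handle` once per non-empty line of `reader`
/// and returns how many records were handled. Lines are read one at a time,
/// so memory use is bounded by the longest line, not by the input size.
pub fn for_each_record<R, F>(reader: R, mut handle: F) -> io::Result<usize>
where
    R: BufRead,
    F: FnMut(&str),
{
    let mut count = 0;
    for line in reader.lines() {
        // `?` propagates a read/decode error (including invalid UTF-8)
        // instead of silently skipping the line.
        let line = line?;
        // JSONL: one JSON document per line. Surrounding whitespace (such as
        // a '\r' from CRLF endings) is insignificant, and blank lines are skipped.
        let record = line.trim();
        if record.is_empty() {
            continue;
        }
        handle(record);
        count += 1;
    }
    Ok(count)
}
```

Hooked up to the compressed file, a call might look like `for_each_record(BufReader::new(zstd::stream::read::Decoder::new(File::open(path)?)?), |rec| println!("{}", rec))?`, where the outer `BufReader` provides the `BufRead` interface and `path` is whatever file you are reading.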