Streamlined way of reading a file with Zig

Question

125 views Asked by Hackerman At 23 March 2024 at 21:14

I have a file with a bunch of primes. For this purpose we can say that the contents of this file called "primes" are as follows:

2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,179,181,191,193,197,199,211,223,227,229,233,239,241,251,257,263,269,271,277,281,283,293,307,311,313,317,331,337,347,349,353,359,367,373,379,383,389,397,401,

I want to read the contents of this file (and do some other unrelated stuff later on).

My problem is that the way I am reading the file seems to have a lot of boilerplate and also looks very amateurish. This is what I have:

const std = @import("std");
const print = std.debug.print;

pub fn main() void {
    print("This is the beginning of the program\n", .{});

    const file: std.fs.File = std.fs.cwd().openFile(
        "primes",
        .{}) catch | err | {
        print("En arror occured while opening file: {}\n", .{err});
        std.os.exit(1);
    };
    defer file.close();

    const reader = file.reader();
    var buff: [1024]u8 = undefined;

    while(reader.readUntilDelimiterOrEof(&buff, ',')) | number | {  // THIS IS A PROBLEM LINE
        if(number == null) break;                                   // AND SO IS THIS
        print("...running {s}\n", .{buff});
    } else | err | {
        print("En arror occured while opening file: {}\n", .{err});
    }

    print("Buffer: {s}\n", .{buff});
}

It seems redundant and almost dumb to unpack the value into "number" while the buffer "buff" also holds the same contents, unless "number" is null (which breaks the loop). I want to know if there is a more streamlined way to do this.

I am avoiding a return type of !void for the main function because I am practicing error handling and want to handle the errors myself.

Thank you in advance

There is 1 answer

**ad absurdum** · Accepted Answer · 2024-03-28T00:03:56+00:00

There are some fundamental problems in the posted code, and I don't think that it works as OP expects. There is some "boilerplate" associated with handling errors and potential null values, but that is one of the prices you pay for working in a systems programming language. You can have less such boilerplate in C code, but not in robustly written C code.

Problems in the Posted Code

The OP code is printing the value of buff, which is an array of u8, instead of the value of number, which is what is returned by readUntilDelimiterOrEof. While this function does read into the buffer, it returns a slice formed from the contents of the buffer based on the number of bytes read. You want the slice if you intend to print those contents as a string.

The trouble is, and I suspect that this is where OP ran into problems, readUntilDelimiterOrEof doesn't return a simple slice, but rather an error union with an optional slice. The return value must be unwrapped (twice) in order to make use of its value.

A while loop with payload capture unwraps the error union first, and OP code essentially handles this correctly, although the error message seems misplaced. The payload captured by the while loop is an optional type. OP code correctly checks this against null and breaks from the loop when null has been returned. But to print the value contained in the optional, the optional must be unwrapped. You can do this with .{number orelse unreachable} in place of .{buff} in the printing code, or you can use the shorthand version of this: .{number.?}.

The OP code prints the contents of the buffer buff in the final line of the program. This is bad for two reasons. First, buff is the whole 1024-byte array, not a slice of what was read, so printing it as a string prints every byte in the array rather than just the bytes of the last read. Second, if an error was encountered in the while loop, this line attempts to print the contents of a possibly uninitialized array. The presence of this line, together with the large size of buff, makes me wonder if the OP believes that buff somehow accumulates the results as they are read in the loop. It does not; readUntilDelimiterOrEof simply reads into the buffer starting from the beginning each time.
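
The difference between the returned slice and the buffer can be seen without any file at all. The sketch below is only an illustration: readField is a hypothetical stand-in that mimics the behavior described above (fill the buffer from the start, return the filled part as an optional slice). It is not the standard library's implementation, and the 8-byte buffer and the input string are made up for the example.

```zig
const std = @import("std");

// Hypothetical stand-in for the reader call: copies bytes from `src`
// (starting at `pos.*`) into the front of `buf` until `delim` or the end of
// the input, and returns the filled part of `buf`, or null at end of input.
fn readField(src: []const u8, pos: *usize, buf: []u8, delim: u8) error{StreamTooLong}!?[]u8 {
    if (pos.* >= src.len) return null;
    var n: usize = 0;
    while (pos.* < src.len) {
        const byte = src[pos.*];
        pos.* += 1;
        if (byte == delim) break;
        if (n == buf.len) return error.StreamTooLong;
        buf[n] = byte;
        n += 1;
    }
    return buf[0..n];
}

pub fn main() void {
    var buff: [8]u8 = undefined;
    var pos: usize = 0;
    const input = "101,7,";

    while (readField(input, &pos, &buff, ',')) |maybe_field| {
        const field = maybe_field orelse break;
        // `field` is as long as what was just read; `buff` is always 8 bytes,
        // and each call overwrites it from index 0.
        std.debug.print("slice: \"{s}\" (len {d}), buffer len {d}\n", .{ field, field.len, buff.len });
    } else |err| {
        std.debug.print("Read error: {}\n", .{err});
    }
}
```

The slice length follows each read (3 for "101", then 1 for "7"), while buff.len stays at 8. After the second read the buffer still holds leftover bytes from the first one past index 0, which is why the slice, not the buffer, is the thing to print.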

OP says that they want to "do some other unrelated stuff later on". It isn't clear what this unrelated stuff is, but presumably they want to do some other work with the data read from the file. As it is written, the OP code does not persist any of the data read from the file. In the next section I suggest one possible approach to this problem.

Here is the simplest correction to the posted code:

const std = @import("std");
const print = std.debug.print;

pub fn main() void {
    print("This is the beginning of the program\n", .{});

    const file: std.fs.File = std.fs.cwd().openFile("primes.dat", .{}) catch |err| {
        print("An error occurred while opening file: {}\n", .{err});
        std.os.exit(1);
    };
    defer file.close();

    const reader = file.reader();
    var buff: [1024]u8 = undefined;

    while (reader.readUntilDelimiterOrEof(&buff, ',')) |maybe_number| {
        if (maybe_number == null) break;
        print("...running {s}\n", .{maybe_number.?});
    } else |err| {
        print("Read error: {}\n", .{err});
    }
}

You could write the loop in a slightly more verbose style with another payload capture to make the double unwrapping more explicit:

while (reader.readUntilDelimiterOrEof(&buff, ',')) |maybe_number| {
    if (maybe_number) |number| {
        print("...running {s}\n", .{number});
    } else break;
} else |err| {
    print("Read error: {}\n", .{err});
}
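
If the double unwrapping still feels heavy, one possible arrangement is to move the loop into a helper that returns an error union, so that try unwraps the error and the while loop captures the optional directly. This is a sketch, not the only way to do it: printFields is a hypothetical helper name, and the code assumes the same standard-library version as the listings above (one where file.reader() provides readUntilDelimiterOrEof). main still returns void and still handles every error by hand.

```zig
const std = @import("std");
const print = std.debug.print;

// Hypothetical helper: reads comma-separated fields from `reader`, prints
// each one, and returns how many were read. `try` unwraps the error union,
// so the loop captures the optional and ends when null (end of input) comes back.
fn printFields(reader: anytype) !usize {
    var buff: [1024]u8 = undefined;
    var count: usize = 0;
    while (try reader.readUntilDelimiterOrEof(&buff, ',')) |number| {
        print("...running {s}\n", .{number});
        count += 1;
    }
    return count;
}

pub fn main() void {
    const file = std.fs.cwd().openFile("primes", .{}) catch |err| {
        print("An error occurred while opening file: {}\n", .{err});
        return;
    };
    defer file.close();

    // Errors are still handled explicitly here; main keeps returning void.
    const count = printFields(file.reader()) catch |err| {
        print("Read error: {}\n", .{err});
        return;
    };
    print("Read {d} fields\n", .{count});
}
```

The trade-off is that the error handling moves from the loop to the call site, which keeps the loop body down to the work done on each field.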

Reading the Whole File into a Buffer

It might be better to read the entire file into a buffer, and work with that buffer instead of taking one piece at a time from the file. This is probably a more robust approach, and probably more performant.

The program below reads the contents of a file into a dynamically allocated buffer, and then iterates over those contents with an iterator provided by splitAny. One virtue of using splitAny is that you can provide it with a slice of delimiters; any of these delimiters indicates where the contents of the buffer should be split. In the case of the OP data file, it would be good to split not only on commas, but on spaces and newlines as well. Because splitAny yields an empty slice wherever two delimiters are adjacent (and after a trailing delimiter), the loop skips empty results.

const std = @import("std");

pub fn main() !void {
    // Get an allocator.
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator = gpa.allocator();
    defer {
        _ = gpa.deinit();
    }

    // Open a file.
    const primes_file = try std.fs.cwd().openFile("primes.dat", .{});
    defer primes_file.close();

    // Read the file into a buffer.
    const stat = try primes_file.stat();
    const buffer = try primes_file.readToEndAlloc(allocator, stat.size);
    defer allocator.free(buffer);

    // Iterate over the buffer.
    var numbers = std.mem.splitAny(u8, buffer, " ,\r\n");
    while (numbers.next()) |number| {
        if (!std.mem.eql(u8, number, "")) {
            std.debug.print("{s}\n", .{number});
        }
    }
}
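
To keep the numbers around for later work, the text can be converted to integers as it is iterated. The following is a minimal sketch under a few assumptions: parsePrimes is a hypothetical helper, the values fit in a u32, and a fixed-size array is enough storage (a dynamically sized list would work too). It uses std.mem.tokenizeAny in place of splitAny; tokenizeAny skips empty tokens, so the comparison against "" is not needed.

```zig
const std = @import("std");

// Hypothetical helper: parses delimiter-separated integers from `text` into
// `out` and returns the filled part of `out`. Stops early if `out` is full.
fn parsePrimes(text: []const u8, out: []u32) ![]u32 {
    var count: usize = 0;
    // tokenizeAny never yields empty tokens, unlike splitAny.
    var tokens = std.mem.tokenizeAny(u8, text, " ,\r\n");
    while (tokens.next()) |token| {
        if (count == out.len) break;
        out[count] = try std.fmt.parseInt(u32, token, 10);
        count += 1;
    }
    return out[0..count];
}

pub fn main() void {
    var storage: [16]u32 = undefined;
    // In a real program `text` would be the buffer read from the file.
    const primes = parsePrimes("2,3,5,7,11,\n", &storage) catch |err| {
        std.debug.print("Parse error: {}\n", .{err});
        return;
    };

    var sum: u32 = 0;
    for (primes) |p| sum += p;
    std.debug.print("parsed {d} primes, sum {d}\n", .{ primes.len, sum });
}
```

With the sample input this prints "parsed 5 primes, sum 28". A token that is not a valid number makes parseInt return an error, which main reports instead of silently skipping it.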
