How can I read a binary file bit-by-bit?

114 views Asked by At

I'm working on a project to compress/decompress a text file with the Huffman algorithm. I've successfully managed to compress a given text file into binary -- I've checked with the Visual Studio Hex Editor extension and it works correctly.

However, I don't really know how to go about implementing decompression. What I'd like to do is read the binary file bit-by-bit, traverse the corresponding Huffman tree, and write a character when a leaf is reached. However, I'm not sure how to go about reading a binary file bit-by-bit (I know that it can't be done directly as is: of course I had to do bit manipulation to implement compression in the first place).

I've tried using the fread() function to store the bits into a buffer, but frankly, I think that I'm misunderstanding how it works, and I'm not sure how much memory to give the buffer, nor how to retrieve the bits from the buffer, which is a char array.

Thanks in advance for any help.

1

There are 1 answers

0
chqrlie On BEST ANSWER

Here is a simple read wrapper using an int buffer. It reads bits from the stream one at a time, starting from the least significant bit of the first byte:

// Read the next bit from the stream.
// `buffer` holds the remaining bits from the last byte read
// initial value of `buffer` should be `0`.
// Return the bit value or `EOF` at end of file
int read_next_bit(FILE *stream, int *buffer)
{
    int bits = *buffer;
    if (bits <= 1) {
        // either no bits left or EOF reached already
        if (bits == EOF) {
            return EOF;
        }
        // get the next byte from the stream
        bits = getc(stream);
        if (bits == EOF) {
            // if EOF, make it sticky and return it
            return *buffer = EOF;
        }
        // set the guard bit
        bits |= 0x100;
    }
    // shift the current bit out of the buffer
    *buffer = bits >> 1;
    // and return it.
    return bits & 1;
}

This function tries to minimize the number of memory reads, writes and the number of tests.

Here is an alternative that uses a structure to encapsulate the bitstream with an array to bufferize the file contents.

#include <stdio.h>
#include <stdbool.h>

typedef struct bitstream {
    FILE *stream;
    int pos, len;
    unsigned char buf[BUFSIZ];
} bitstream;

#define LSB_FIRST  1
#define MSB_FIRST  2
#define BIT_ORDER  LSB_FIRST  // define as appropriate

// open a bitstream, return a boolean success indicator
bool bitstream_open(bitstream *bt, const char *filename) {
    bt->len = bt->pos = 0;
    bt->stream = fopen(filename, "rb");
    return bt->stream != NULL;
}

// close a bitstream
void bitstream_close(bitstream *bt) {
    if (bt->stream) {
        bt->len = bt->pos = 0;
        fclose(bt->stream);
        bt->stream = NULL;
    }
}

// read a bit from a bitstream:
// return EOF at end of file
//   otherwise return the next bit
int bitstream_read(bitstream *bt) {
    int pos = bt->pos;
    if (pos >= bt->len) {
        bt->pos = pos = 0;
        bt->len = fread(bt->buf, sizeof bt->buf, bt->stream) * 8;
        if (bt->len == 0)
            return EOF;
    }
    bt->pos++;
#if BIT_ORDER == LSB_FIRST
    return (buf[pos >> 3] >> (pos & 7)) & 1;
#else
    return (buf[pos >> 3] >> (7 - (pos & 7))) & 1;
#endif
}