C Reading binary data into struct


I have a struct

typedef struct {
    uint8_t type;  // 1B -> 1B
    uint16_t hash; // 2B -> 3B
    uint16_t id;   // 2B -> 5B
    uint32_t ip;   // 4B -> 9B
    uint16_t port; // 2B -> 11B
} Data;

and some binary data (which is a stored instance of Data on disk)

const unsigned char blob[11] = { 0x00, 0x00, 0x7b, 0x00, 0xea, 0x00, 0x00, 0x00, 0x59, 0x01, 0x00 };

I want to "read" the blob into my struct, the first byte 0x00 corresponds to type, the second and third byte 0x00, 0x7b correspond to hash, etc.

I can't just do Data *data = (Data *)blob, since the actual size of Data will probably be bigger than 11 Bytes (Faster RAM access or something. Not relevant here.) The point is sizeof(Data) == 16 and the representation in RAM may be different than the compact one on disk.

So how can I "import" my blob into a Data struct without having to use memcpy for every member? In other words, what's the nicest/simplest solution for this in C?


There are 3 answers

0
John Bollinger On

The point is sizeof(Data) == 16 and the representation in RAM may be different than the compact one on disk.

Since you cannot rely on the data layout in the file to match that of the in-memory structure, standard C does not provide an alternative to working member by member.

But reading the data from disk is potentially a different question from reading it from an array. I suppose you imagine reading one or more whole raw records into memory and then copying from there in some way, but if you can rely on the sizes and endianness of the individual fields matching between structure and disk then you could consider this:

Data item;

if (fread(&item.type, sizeof(item.type), 1, input_file) == 0) handle_error();
if (fread(&item.hash, sizeof(item.hash), 1, input_file) == 0) handle_error();
if (fread(&item.id,   sizeof(item.id),   1, input_file) == 0) handle_error();
if (fread(&item.ip,   sizeof(item.ip),   1, input_file) == 0) handle_error();
if (fread(&item.port, sizeof(item.port), 1, input_file) == 0) handle_error();

That lets the stream handle the buffering (which it will, unless you disable that), relieves you of counting bytes, and is pretty clear. Five calls to fread() might be a bit more expensive than five to memcpy(), but you're unlikely to notice the difference next to the cost of opening the file and transferring data from it.

If you do need to populate the structure from an in-memory array containing raw bytes from the file, however, then per-member memcpy() is the most portable way. And quite possibly more efficient than you think.
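
As a minimal sketch of the per-member memcpy() approach, assuming the field sizes and byte order in the raw bytes match the host (the function name data_from_bytes is hypothetical, not from the question):

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t type;
    uint16_t hash;
    uint16_t id;
    uint32_t ip;
    uint16_t port;
} Data;

/* Hypothetical helper: copies the 11 packed bytes into the members one at a
   time, in on-disk order. Assumes the bytes use the host's byte order; no
   swapping is done. */
static void data_from_bytes(Data *out, const unsigned char *src) {
    memcpy(&out->type, src, sizeof out->type); src += sizeof out->type;
    memcpy(&out->hash, src, sizeof out->hash); src += sizeof out->hash;
    memcpy(&out->id,   src, sizeof out->id);   src += sizeof out->id;
    memcpy(&out->ip,   src, sizeof out->ip);   src += sizeof out->ip;
    memcpy(&out->port, src, sizeof out->port);
}
```

Advancing src by sizeof each member keeps the byte counting out of your hands, much as the fread() version does.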

2
usef On

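Assuming the multi-byte fields are stored most significant byte first (big-endian), so that the hash bytes 0x00, 0x7b assemble to 0x007b, a simple solution is to combine the bytes with shifts and bitwise OR (if the file stores the least significant byte first instead, reverse the shift amounts):

void importFromBlob(const unsigned char *blob, Data *data) {
    // Import type (1 byte at offset 0)
    data->type = blob[0];

    // Import hash (2 bytes at offset 1), assembled as 0x007b in your example
    data->hash = (uint16_t)((blob[1] << 8) | blob[2]);

    // Import id (2 bytes at offset 3)
    data->id = (uint16_t)((blob[3] << 8) | blob[4]);

    // Import ip (4 bytes at offset 5); cast each byte before shifting so the
    // shift is done in uint32_t rather than in a promoted signed int
    data->ip = ((uint32_t)blob[5] << 24) | ((uint32_t)blob[6] << 16) | ((uint32_t)blob[7] << 8) | (uint32_t)blob[8];

    // Import port (2 bytes at offset 9)
    data->port = (uint16_t)((blob[9] << 8) | blob[10]);
}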
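
The shift-and-OR decoding can also be factored into small helpers so that each field becomes a single call. The sketch below assumes the blob stores multi-byte fields most significant byte first; the names load_be16 and load_be32 are hypothetical. Casting each byte to an unsigned type before shifting avoids shifting into the sign bit of a promoted int.

```c
#include <stdint.h>

/* Hypothetical helpers: assemble big-endian values from raw bytes. */
static uint16_t load_be16(const unsigned char *p) {
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

static uint32_t load_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Under that byte-order assumption, the blob from the question decodes to hash = 0x007b, id = 0x00ea, ip = 0x00000059 and port = 0x0100, using load_be16(blob + 1), load_be16(blob + 3), load_be32(blob + 5) and load_be16(blob + 9).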
1
chqrlie On

The simplest way to avoid multiple reads or byte copies for this particular structure is to pad the structure explicitly with 3 initial bytes and 2 trailing bytes:

typedef struct {
    uint8_t pad0[3];  // 3 bytes at offset 0, unused
    uint8_t type;     // 1 byte  at offset 3
    uint16_t hash;    // 2 bytes at offset 4
    uint16_t id;      // 2 bytes at offset 6
    uint32_t ip;      // 4 bytes at offset 8
    uint16_t port;    // 2 bytes at offset 12
    uint8_t pad1[2];  // 2 bytes at offset 14, unused. total: 16 bytes
} Data;

You would read the data with a single fread:

    Data mydata;
    if (fread(&mydata.type, 1, 11, fp) == 11) {
        // mydata was read successfully
        // fields can be used directly assuming correct endianness.
    } else {
        // read error
    }

Copying from the memory blob is also a single call to memcpy:

    const unsigned char blob[11] = {
        0x00, 0x00, 0x7b, 0x00, 0xea, 0x00, 0x00, 0x00, 0x59, 0x01, 0x00
    };
    memcpy(&mydata.type, blob, 11);

Reading and writing the binary data requires opening the file in binary mode ("rb", "wb", ...).

Writing the data in a single fwrite is done with

    if (fwrite(&mydata.type, 1, 11, fp) == 11) ...

This trick works for the structure in the question, but might not work in many other cases:

  • if the endianness in the file differs from the CPU's,
  • if the sequence of members larger than one byte is not favorable, i.e. no amount of leading padding makes every member land on its natural alignment without gaps.
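
Whether the padded layout really places the members at the intended offsets depends on the compiler and ABI, so it may be worth checking at compile time. A sketch assuming a C11 compiler (static_assert from <assert.h>); the struct repeats the padded definition above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t pad0[3];
    uint8_t type;
    uint16_t hash;
    uint16_t id;
    uint32_t ip;
    uint16_t port;
    uint8_t pad1[2];
} Data;

/* Compilation fails if the 11 on-disk bytes would not map onto
   offsets 3..13 of the structure without gaps. */
static_assert(offsetof(Data, type) == 3,  "type must start at offset 3");
static_assert(offsetof(Data, hash) == 4,  "hash must follow type directly");
static_assert(offsetof(Data, id)   == 6,  "id must follow hash directly");
static_assert(offsetof(Data, ip)   == 8,  "ip must follow id directly");
static_assert(offsetof(Data, port) == 12, "port must follow ip directly");
static_assert(sizeof(Data) == 16, "unexpected trailing padding");
```

These checks cover the layout only; they say nothing about whether the file's endianness matches the CPU's.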

So in the general case, you may have to use memcpy to copy chunks of the compact byte-oriented representation to the structure used in memory, adjusting for potential endianness differences. memcpy with small fixed sizes is usually expanded inline efficiently, generating very few instructions, as can be verified using the Godbolt Compiler Explorer.
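
One way to sketch the "memcpy, then adjust for endianness" step for a 16-bit member, assuming the file stores fields most significant byte first. The helper names (host_is_little_endian, swap16, load_be16_memcpy) are hypothetical; platform functions such as POSIX ntohs() do a similar job but are left out to keep the sketch self-contained.

```c
#include <stdint.h>
#include <string.h>

/* Detects the host byte order at run time by looking at the first
   byte of the 16-bit value 1. */
static int host_is_little_endian(void) {
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Exchanges the two bytes of a 16-bit value. */
static uint16_t swap16(uint16_t v) {
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Copies 2 raw bytes, then swaps them if the host order differs from
   the assumed big-endian file order. */
static uint16_t load_be16_memcpy(const unsigned char *p) {
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return host_is_little_endian() ? swap16(v) : v;
}
```

A 32-bit member would follow the same pattern with a four-byte swap.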