I am trying to parse a binary file and extract different data structures from it. One of them can be a uint8 or int8 (also uint16, int16, ... up to 64 bits).
To have the most universal method possible, I read the data from the given file pointer and save it in a uint8 array (buffer).
In my test, I assumed that a file content of 40 (in hex) should lead to a resulting integer of 64. That's why my test method asserts these values, to be sure about it. **Unfortunately, the uint8 array's content always results in a decimal int of 52.** I don't know why, and I have tried various other ways to read in a specific amount of bytes and assign them to an integer variable. Is this a matter of endianness or something?
Thanks in advance, if someone can help :)
My read_int method:
```cpp
int read_int(FILE *file, int n, bool is_signed) {
    assert(n > 0);
    uint8_t n_chars[n]; // note: variable-length arrays are a compiler extension in C++
    int result = 0;     // must be initialized before accumulating into it
    for (int i = 0; i < n; i++)
    {
        if (fread(&n_chars[i], sizeof(n_chars[i]), 1, file) != 1) {
            std::cerr << "fread() failed!\n";
            throw ReadOpFailed(); // throw by value rather than via `new`
        }
        result *= 256; // shift the previous bytes up by one byte (256, not 255)
        result += n_chars[i];
    }
    std::cout << "int read: " << result << "\n";
    return result;

    //-------------Some ideas that didn't work out either------------------
    // std::stringstream ss;
    // ss << std::hex << static_cast<int>(static_cast<unsigned char>(n_chars[0])); // convert byte to hexadecimal string
    // int result;
    // ss >> result; // parse the hexadecimal string to integer
    // std::cout << "result" << result << "\n";
}
```
A little test that fails tremendously... The endianness-detection part reports little endian (I don't know whether that is part of the problem).
```cpp
struct TestContext {
    FILE *create_test_file_hex(char *input_hex, const char *rel_file_path = "test.gguf") {
        std::ofstream MyFile(rel_file_path, std::ios::binary);
        // Write to the file
        MyFile << input_hex;
        // Close the file
        MyFile.close();

        FILE *file = fopen(rel_file_path, "rb");
        if (file == nullptr) {
            std::cout << "file couldn't be opened\n";
            ADD_FAILURE();
        }
        std::remove(rel_file_path); // unlink the file while it is still open, so it can be
                                    // used now but is deleted once the last handle is closed
        return file;
    }
};
```
```cpp
TEST(test_tool_functions, test_read_int) {
    int n = 1;
    // little endian if true
    if (*(char *)&n == 1) { std::cout << "Little Endian Detected!!!\n"; }
    else { std::cout << "Big Endian Detected!!!\n"; }

    std::string file_hex_content = "400A0E00080000016";
    uint64_t should;
    std::istringstream("40") >> std::hex >> should;
    ASSERT_EQ(should, 64);

    uint64_t result = read_int(TestContext().create_test_file_hex(file_hex_content.data()), 1, false);
    ASSERT_EQ(result, should);
}
```
The root cause of the problem is that your `file_hex_content` consists of ASCII character bytes (which form a human-readable hexadecimal string representation of a number), not of the bytes that would form a binary integer representation. Therefore it doesn't start with a single byte `0x40` a.k.a. `64`, but with a byte `'4'` (ASCII byte value 52) followed by another byte `'0'` (ASCII value 48). A single byte `64` (`0x40`) corresponds to the ASCII character `'@'` rather than to the two characters `'4'` and `'0'`.

A small serialization example follows. As long as you serialize and deserialize on the same architecture and have no portability concerns, endianness is not a concern either.
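Something along these lines, as a minimal sketch: it copies the 64-bit value `0xabcd1234deadbeef` (the value used in the byte listing below) into a byte buffer with `memcpy`, prints the buffer in memory order, and copies it back.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    // A 64-bit value whose bytes are all distinct, so the byte order is visible.
    uint64_t value = 0xabcd1234deadbeefULL;

    // "Serialize": copy the object representation into a byte buffer.
    unsigned char buffer[sizeof value];
    std::memcpy(buffer, &value, sizeof value);

    // Print the original value, then the buffer in memory order.
    std::printf("%016llx\n", (unsigned long long)value);
    for (unsigned char byte : buffer)
        std::printf("%02x ", (unsigned)byte);
    std::printf("\n");

    // "Deserialize": copy the bytes back into an integer on the same machine.
    // The round trip restores the original value regardless of endianness.
    uint64_t round_trip;
    std::memcpy(&round_trip, buffer, sizeof round_trip);
    assert(round_trip == value);
    return 0;
}
```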
The output from the program above, when executed on my little endian machine, looks like this:
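```
abcd1234deadbeef
ef be ad de 34 12 cd ab
```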
As expected, the serialized string starts from the lowest order byte `0xef` and ends with the highest order byte `0xab`. On a big endian platform, the second line would be ordered from highest to lowest order byte, i.e. `ab cd 12 34 de ad be ef`.
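Back in your test, if you want to keep describing the file content as a human-readable hex string, it has to be decoded into raw bytes before writing. A sketch of such a helper (the names `hex_to_bytes` and `write_binary_file` are illustrative, not part of your code):

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Decode a hex string into raw bytes: "40" becomes the single byte 0x40.
std::vector<uint8_t> hex_to_bytes(const std::string &hex) {
    if (hex.size() % 2 != 0)
        throw std::invalid_argument("hex string needs an even number of digits");
    std::vector<uint8_t> bytes;
    bytes.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2)
        bytes.push_back(static_cast<uint8_t>(std::stoi(hex.substr(i, 2), nullptr, 16)));
    return bytes;
}

// Write the decoded bytes, so the file starts with the byte 0x40, not '4'.
void write_binary_file(const std::string &path, const std::string &hex) {
    std::vector<uint8_t> bytes = hex_to_bytes(hex);
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char *>(bytes.data()), bytes.size());
}
```

Note that the `file_hex_content` in the test above has an odd number of digits, so it would need a padding `0` before it can be decoded this way.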