Basically I need to load a file within an archive into memory, but since the user is able to modify the contents of the archive it is very likely that the file offset will change.
So I need to create a function that searches the archive for a file with the help of a hex pattern, returns the file offset, loads the file into memory and returns the file address.
To load a file into memory and return the address I currently use this:
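DWORD_PTR LoadBinary(const char* filePath)
{
    FILE *file = fopen(filePath, "rb");
    if (file == NULL)
        return 0; // could not open the file
    fseek(file, 0, SEEK_END);
    long fileSize = ftell(file);
    rewind(file);
    // Read straight into the allocation; no temporary buffer is needed.
    LPVOID newmem = VirtualAlloc(NULL, fileSize, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    if (newmem != NULL && fread(newmem, 1, fileSize, file) != (size_t)fileSize)
    {
        VirtualFree(newmem, 0, MEM_RELEASE); // short read: give the memory back
        newmem = NULL;
    }
    fclose(file);
    // DWORD_PTR is pointer-sized, so the address is not truncated on 64-bit builds.
    return (DWORD_PTR)newmem;
}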
The archive is neither encrypted nor compressed, but it is pretty big (about 1 GB) and I'd like to not load the entire file into memory if possible.
I'm aware of the size of the file I'm looking for inside the archive so I don't need the function to find the end of the file with another pattern.
File Pattern: "\x30\x00\x00\x00\xA0\x10\x04\x00"
File Length: 4096 bytes
How can I implement this, and which functions are needed?
Solution
The code is probably slow for large files, but this works for me since the file I'm looking for is at the beginning of the archive.
// Needs <stdio.h> and <string.h>.
const BYTE pattern[4] = { 0xDE, 0xAD, 0xBE, 0xEF };
long offset = -1; // -1 means "not found"
FILE *file = fopen("C:/data.bin", "rb");
if (file != NULL)
{
    BYTE window[4] = { 0 }; // the last four bytes read
    long count = 0;         // bytes read so far
    int input;
    while ((input = fgetc(file)) != EOF)
    {
        // Slide the window left by one byte and append the new byte,
        // so a match is found at any offset, not only at multiples of 4.
        memmove(window, window + 1, 3);
        window[3] = (BYTE)input;
        count++;
        if (count >= 4 && memcmp(window, pattern, 4) == 0)
        {
            offset = count - 4;
            printf("Match @ 0x%08lX\n", (unsigned long)offset);
            break;
        }
    }
    fclose(file);
}
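Once the offset is known, loading the 4096-byte file mentioned in the question is one seek and one read, so the 1 GB archive never has to be held in memory. A minimal sketch, assuming the archive is already open as a `FILE *`; `ReadAt` is an illustrative name, not part of the code above:

```cpp
#include <cstdio>
#include <vector>

// Hypothetical helper: reads `length` bytes starting at `offset`.
// Returns an empty vector if the seek fails or the file is too short.
std::vector<unsigned char> ReadAt(FILE *file, long offset, size_t length)
{
    std::vector<unsigned char> data(length);
    if (fseek(file, offset, SEEK_SET) != 0 ||
        fread(data.data(), 1, length, file) != length)
    {
        data.clear();
    }
    return data;
}
```

With the values from the question, the call would look like `ReadAt(file, offset, 4096)`.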
The principle is stated in this answer: you need a finite state machine (FSM) that takes the file's bytes one at a time as input and compares each input byte with a byte of the pattern chosen by the FSM state, which is an index into the pattern.
Here is the simplest, though naive, solution template:
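A minimal sketch of such a state machine. It assumes the first pattern byte does not occur again later in the pattern (true for `30 00 00 00 A0 10 04 00`); patterns that overlap themselves would need a KMP-style fallback on mismatch. `FindPattern` is an illustrative name:

```cpp
#include <cstdio>

// Returns the file offset of the first match, or -1 if there is none.
// Assumption: pattern[0] does not recur inside the pattern, so after a
// mismatch the only possible new match starts at the current byte.
long FindPattern(FILE *file, const unsigned char *pattern, int length)
{
    int state = 0; // number of pattern bytes matched so far
    int input;
    while ((input = fgetc(file)) != EOF)
    {
        if (input == pattern[state])
            state++;
        else
            state = (input == pattern[0]) ? 1 : 0;
        if (state == length)
            return ftell(file) - length; // ftell() is one past the last matched byte
    }
    return -1;
}
```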
The state variable holds the position in the pattern, and it is the state of the FSM.
While this solution satisfies your needs, it is not optimal, because reading a single byte from a file takes nearly the same time as reading a bigger block of, say, 512 bytes or more. You can improve this yourself, starting by reading the file in larger blocks with fread(). Note that calculating the pattern location (after it is found) becomes a bit more complicated, because ftell() no longer matches the input location.
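The block-reading variant could be sketched as follows. The offset is computed from the file position of the current block plus the index inside it, since `ftell()` now points past the whole block. The 512-byte block size, the name `FindPatternBuffered`, and the assumption that the first pattern byte does not recur later in the pattern are all illustrative choices:

```cpp
#include <cstdio>

// Returns the file offset of the first match, or -1 if there is none.
// Assumption: pattern[0] does not recur inside the pattern.
long FindPatternBuffered(FILE *file, const unsigned char *pattern, int length)
{
    unsigned char block[512];
    long blockStart = ftell(file); // file offset of block[0]
    int state = 0;                 // kept across blocks, so matches may span a boundary
    size_t got;
    while ((got = fread(block, 1, sizeof block, file)) > 0)
    {
        for (size_t i = 0; i < got; i++)
        {
            unsigned char input = block[i];
            if (input == pattern[state])
                state++;
            else
                state = (input == pattern[0]) ? 1 : 0;
            if (state == length)
                return blockStart + (long)i + 1 - length;
        }
        blockStart += (long)got;
    }
    return -1;
}
```

Because `state` survives from one block to the next, a pattern that straddles two blocks is still found.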