Search for file in archive and load it into memory


Basically I need to load a file within an archive into memory, but since the user is able to modify the contents of the archive it is very likely that the file offset will change.

So I need to create a function that searches the archive for a file with the help of a hex pattern, returns the file offset, loads the file into memory and returns the file address.

To load a file into memory and return the address I currently use this:

DWORD_PTR LoadBinary(const char* filePath)
{
    FILE *file = fopen(filePath, "rb");
    if (file == NULL)
        return 0;
    fseek(file, 0, SEEK_END);
    long fileSize = ftell(file);
    fseek(file, 0, SEEK_SET);
    // Read straight into the allocation; no temporary buffer is needed.
    LPVOID newmem = VirtualAlloc(NULL, fileSize, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    if (newmem != NULL && fread(newmem, 1, fileSize, file) != (size_t)fileSize)
    {
        VirtualFree(newmem, 0, MEM_RELEASE);
        newmem = NULL;
    }
    fclose(file);
    // DWORD_PTR is pointer-sized, so the address is not truncated in 64-bit builds.
    return (DWORD_PTR)newmem;
}

The archive is neither encrypted nor compressed, but it is pretty big (about 1 GB) and I'd like to not load the entire file into memory if possible.

I'm aware of the size of the file I'm looking for inside the archive so I don't need the function to find the end of the file with another pattern.

File Pattern: "\x30\x00\x00\x00\xA0\x10\x04\x00"

File Length: 4096 bytes

How can I implement this, and which functions do I need?

Solution

The code is probably slow for large files, but this works for me since the file I'm looking for is at the beginning of the archive.

FILE *file = fopen("C:/data.bin", "rb");
if (file != NULL)
{
    BYTE window[4] = { 0 }; // the four most recently read bytes
    long offset = -1;       // stays -1 if the pattern is not found
    int input;

    while ((input = fgetc(file)) != EOF)
    {
        // Slide the window by one byte and append the new byte.
        window[0] = window[1];
        window[1] = window[2];
        window[2] = window[3];
        window[3] = (BYTE)input;

        if (window[0] == 0xDE && window[1] == 0xAD && window[2] == 0xBE && window[3] == 0xEF)
        {
            offset = ftell(file) - 4;
            printf("Match @ 0x%08lX\n", (unsigned long)offset);
            break;
        }
    }
    fclose(file);
}

Best answer, by PlushBeaver

The principle is stated in this answer: you need a finite state machine (FSM) that takes the file's bytes one by one as input and compares each input byte with the pattern byte selected by the FSM state, which is an index into the pattern.

Here is the simplest, but naive solution template:

FILE *file = fopen(path, "rb");
size_t state = 0;
for (int input_result; (input_result = fgetc(file)) != EOF;) {
    char input = (char)input_result;
    if (input == pattern[state]) {
        ++state;
    } else {
        // Start over, but let the mismatched byte begin a new match.
        state = (input == pattern[0]) ? 1 : 0;
    }
    if (state == pattern_size) {
        // Pattern is found at (ftell(file) - pattern_size).
        break;
    }
}
fclose(file);

The state variable holds the current position in the pattern; it is the state of the FSM.
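
To make the role of the state concrete, the same machine can be written as a function over an in-memory buffer. This is only a sketch: find_pattern is a hypothetical name, not part of any library. On a mismatch it re-tests the current byte against the first pattern byte, which is sufficient when that first byte does not occur again later in the pattern (true for the pattern in the question). Patterns that overlap with themselves need a table-driven matcher such as Knuth-Morris-Pratt.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns the index of the first occurrence of
// `pattern` in `data`, or -1 if there is none.
// Assumption: pattern[0] does not occur again later in the pattern.
long find_pattern(const unsigned char *data, std::size_t data_size,
                  const unsigned char *pattern, std::size_t pattern_size)
{
    assert(pattern_size > 0);
    std::size_t state = 0; // number of pattern bytes matched so far
    for (std::size_t i = 0; i < data_size; ++i) {
        if (data[i] == pattern[state]) {
            ++state;
        } else {
            // Start over; the mismatched byte may itself begin a match.
            state = (data[i] == pattern[0]) ? 1 : 0;
        }
        if (state == pattern_size) {
            // data[i] is the last byte of the match.
            return static_cast<long>(i + 1 - pattern_size);
        }
    }
    return -1;
}
```

Because the state is a single index, nothing else has to be remembered between input bytes, which is what makes it easy to feed the machine from any source.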

While this solution satisfies your needs, it is not optimal, because reading a byte from a file takes nearly the same time as reading a bigger block of, say, 512 bytes or even more. You can improve this yourself in two steps:

  1. In each iteration, read a block rather than a single character. Use fread(). Note that calculating the pattern location (after it is found) becomes a bit more complicated, because ftell() no longer matches the input location.
  2. Add an inner loop to iterate through the block you've just read. Deal with input characters the same way as before; this is where the FSM approach proves itself useful.
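
The two steps above could be sketched as follows. The names find_in_file and read_at and the 64 KiB block size are assumptions made for illustration, and the matcher shares the limitation of the naive template (it assumes the first pattern byte does not recur later in the pattern). The match offset is computed from a running block_start counter instead of ftell(), and read_at then loads a file of known length from that offset without reading the rest of the archive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: scans `file` from the beginning in blocks and returns
// the offset of the first occurrence of `pattern`, or -1 if it is not found.
long find_in_file(std::FILE *file, const unsigned char *pattern,
                  std::size_t pattern_size)
{
    assert(file != nullptr && pattern_size > 0);
    std::rewind(file);
    std::vector<unsigned char> block(64 * 1024); // block size is arbitrary
    std::size_t state = 0; // FSM state, carried across block boundaries
    long block_start = 0;  // file offset of block[0]
    std::size_t got;
    while ((got = std::fread(block.data(), 1, block.size(), file)) > 0) {
        for (std::size_t i = 0; i < got; ++i) {
            if (block[i] == pattern[state]) {
                ++state;
            } else {
                state = (block[i] == pattern[0]) ? 1 : 0;
            }
            if (state == pattern_size) {
                // block[i] is the last byte of the match.
                return block_start + static_cast<long>(i + 1)
                       - static_cast<long>(pattern_size);
            }
        }
        block_start += static_cast<long>(got);
    }
    return -1;
}

// Hypothetical helper: reads exactly `length` bytes starting at `offset`.
// Returns an empty vector if seeking fails or fewer bytes are available.
std::vector<unsigned char> read_at(std::FILE *file, long offset,
                                   std::size_t length)
{
    std::vector<unsigned char> out(length);
    if (std::fseek(file, offset, SEEK_SET) != 0 ||
        std::fread(out.data(), 1, length, file) != length) {
        out.clear();
    }
    return out;
}
```

With the values from the question, the calls would be find_in_file(file, pattern, 8) followed by read_at(file, offset, 4096) when the offset is not -1. Offsets are kept in a long to match fseek(); long is 32 bits wide on Windows, which is enough for an archive of about 1 GB but not for files over 2 GB.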