Half of read buffer is corrupt when using ReadFile

Question

Half of read buffer is corrupt when using ReadFile

403 views Asked by kbaud At 03 December 2020 at 16:51

Half of the buffer used with ReadFile is corrupt. Regardless of the size of the buffer, half of it has the same corrupted character. I have look for anything that could be causing the read to stop early, etc. If I increase the size of the buffer, I see more of the file so it is not failing on a particular part of the file.

Visual Studio 2019. Windows 10.

#define MAXBUFFERSIZE 1024
DWORD bufferSize = MAXBUFFERSIZE;
_int64 fileRemaining;

HANDLE hFile;
DWORD  dwBytesRead = 0;
//OVERLAPPED ol = { 0 };
LARGE_INTEGER dwPosition;

TCHAR* buffer;

hFile = CreateFile(
    inputFilePath,         // file to open
    GENERIC_READ,          // open for reading
    FILE_SHARE_READ,       // share for reading
    NULL,                  // default security
    OPEN_EXISTING,         // existing file only
    FILE_ATTRIBUTE_NORMAL, // normal file    | FILE_FLAG_OVERLAPPED
    NULL);                 // no attr. template

if (hFile == INVALID_HANDLE_VALUE)
{
    DisplayErrorBox((LPWSTR)L"CreateFile");
    return 0;
}

LARGE_INTEGER size;
GetFileSizeEx(hFile, &size);

_int64 fileSize = (__int64)size.QuadPart;
double gigabytes = fileSize * 9.3132e-10;
sendToReportWindow(L"file size: %lld bytes \(%.1f gigabytes\)\n", fileSize, gigabytes);

if(fileSize > MAXBUFFERSIZE)
{
    buffer = new TCHAR[MAXBUFFERSIZE];
}
else
{
    buffer = new TCHAR[fileSize];
}
fileRemaining = fileSize;

sendToReportWindow(L"file remaining: %lld bytes\n", fileRemaining);

while (fileRemaining)                                       // outer loop. while file remaining, read file chunk to buffer
{
    sendToReportWindow(L"fileRemaining:%d\n", fileRemaining);

    if (bufferSize > fileRemaining)                         // as fileremaining gets smaller as file is processed, it eventually is smaller than the buffer
        bufferSize = fileRemaining;

    if (FALSE == ReadFile(hFile, buffer, bufferSize, &dwBytesRead, NULL))
    {
        sendToReportWindow(L"file read failed\n");
        CloseHandle(hFile);
        return 0;
    }

    fileRemaining -= bufferSize;

 // bunch of commented out code (verified that it does not cause the corruption)
}
delete [] buffer;

Debugger html view (512 byte buffer)

Debugger html view (1024 byte buffer). This shows that file is probably not the source of the corruption.

Misc notes: I have been told that memory mapping the file does not provide an advantage since I am sequentially processing the file. Another advantage to this method is that when I detect particular and reoccurring tags in the WARC file I can skip ahead ~500 bytes and resume processing. This improves speed.

Original Q&A

There are 1 answers

**Zeus** · Accepted Answer · 2020-12-05T03:16:28+00:00

The reason is that you use a buffer array of type TCHAR, and the size of TCHAR type is 2 bytes. So the bufferSize set when you call the ReadFile function is actually filled into the buffer array every 2 bytes.

But the actual size of the buffer is sizeof(TCHAR) * fileSize, so half of the buffer array you see is "corrupted"

TechQA.

Half of read buffer is corrupt when using ReadFile

There are 1 answers

Related Questions in C++

Related Questions in WINAPI

Related Questions in READFILE

Related Questions in WARC

Popular Questions

Popular Tags

Trending Questions