Efficient way of file logging

1k views Asked by At

I have to log huge amount of data in a CSV file with each row having 5 elements. I have used a large buffer to store the rows and then flush it in one shot using fwrite(...) when it gets filled and repeat till needed. The following is a snippet of logging function:

void logInFile(int a, int b, int c, int d, int e)
{    
    sprintf(rowInLog,"%d,%d,%d,%d,%d\n",a,b,c,d,e); 
    int bytesInRow = strlen(rowInLog);
    if(bytesInRow + bytesUsedInBuffer <= sizeOfBuffer)
    {
        strcat(buffer, rowInLog);
        bytesUsedInBuffer += bytesInRow;
    }
    else
    {
        printf("flushing file to disk\n");
        fwrite(buffer, bytesUsedInBuffer, 1, fp);
        memset(buffer, 0, sizeOfBuffer);
        bytesUsedInBuffer = 0;
        strcat(buffer, rowInLog);
        bytesUsedInBuffer += bytesInRow;
    }
}

But this is making the execution a lot slow and it is not because of flushing because the message "flushing file to disk" is not printed on screen. Without any call to this logging function the whole program executes in minutes but along with this, it did not complete even in 2 hours. Is there some other fundamental flaw?

2

There are 2 answers

0
Joe Z On BEST ANSWER

Your answer I suspect is right here:

if(bytesInRow + bytesUsedInBuffer <= sizeOfBuffer)
{
    strcat(buffer, rowInLog);  // <--- riiiight here.
    bytesUsedInBuffer += bytesInRow;
}

The strcat() function will scan the entire buffer to find the end ever time you call it. If buffer is large and mostly full, that could be quite slow. The behavior is roughly O(N2) in the size of buffer. Your performance will degrade rapidly as you increase the size of your buffer. That's pretty much the opposite of what you want from your buffer. (Edit: You mentioned in a comment that your buffer is 1GB. I would expect the above code to be very, very slow as that buffer fills.)

However, you already know exactly where the end is, and how many bytes to copy. So do this instead:

if(bytesInRow + bytesUsedInBuffer <= sizeOfBuffer)
{
    memcpy(buffer + bytesUsedInBuffer, rowInLog, bytesInRow + 1);
    bytesUsedInBuffer += bytesInRow;
}

Note that I had memcpy copy one extra byte, so that it puts the NUL terminator on the buffer, just in case you have any other strXXX functions laying around that operate on buffer. If you don't, you can safely remove the + 1 above.

A similar, less egregious error occurs in your else clause. You can replace that also with a memcpy:

    printf("flushing file to disk\n");
    fwrite(buffer, bytesUsedInBuffer, 1, fp);
    memcpy(buffer, rowInLog, bytesInRow + 1);
    bytesUsedInBuffer = bytesInRow;

You can also save a little bit of time by combining these statements:

sprintf(rowInLog,"%d,%d,%d,%d,%d\n",a,b,c,d,e); 
int bytesInRow = strlen(rowInLog);

The sprintf returns the length of the output string, so you could simply say:

int bytesInRow = sprintf(rowInLog,"%d,%d,%d,%d,%d\n",a,b,c,d,e); 

That wasn't the main performance issue in your code, but changing that will improve it further.


EDIT: An even better, alternate approach:

If you want to eliminate the memcpy() entirely, consider this alternate approach:

bytesUsedInBuffer += snprintf( buffer + bytesUsedInBuffer, maximumLineSize, 
                               "%d,%d,%d,%d,%d\n", a,b,c,d,e );

if (bytesUsedInBuffer >= sizeOfBuffer - maximumLineSize )
{
    fwrite(buffer, bytesUsedInBuffer, 1, fp);
    bytesUsedInBuffer = 0;
}

Set maximumLineSize to a reasonable value for your row of 5 integers, such as 60. (10 bytes for each integer including sign plus 5 bytes for commas and newline is 55, so 60 is a nice round number above that.)

2
Dietmar Kühl On

You are computing the length of the entire string every time around! This means the entire and growing string needs to be shuffled through the processor. Doing so is roughly speaking a worst case scenario! You are much better off writing the string to a file once in a while. Additionally, you should keep track of the last write position and append the string right there:

size_t size = sprintf(rowInLog + rowInLogSize, "%d,%d,%d,%d,%d\n", a, b, c, d, e);
rowInLogSize += size;