I have to log huge amount of data in a CSV file with each row having 5 elements. I have used a large buffer to store the rows and then flush it in one shot using fwrite(...)
when it gets filled and repeat till needed. The following is a snippet of logging function:
void logInFile(int a, int b, int c, int d, int e)
{
sprintf(rowInLog,"%d,%d,%d,%d,%d\n",a,b,c,d,e);
int bytesInRow = strlen(rowInLog);
if(bytesInRow + bytesUsedInBuffer <= sizeOfBuffer)
{
strcat(buffer, rowInLog);
bytesUsedInBuffer += bytesInRow;
}
else
{
printf("flushing file to disk\n");
fwrite(buffer, bytesUsedInBuffer, 1, fp);
memset(buffer, 0, sizeOfBuffer);
bytesUsedInBuffer = 0;
strcat(buffer, rowInLog);
bytesUsedInBuffer += bytesInRow;
}
}
But this is making the execution a lot slow and it is not because of flushing because the message "flushing file to disk" is not printed on screen. Without any call to this logging function the whole program executes in minutes but along with this, it did not complete even in 2 hours. Is there some other fundamental flaw?
Your answer I suspect is right here:
The
strcat()
function will scan the entirebuffer
to find the end ever time you call it. Ifbuffer
is large and mostly full, that could be quite slow. The behavior is roughly O(N2) in the size ofbuffer
. Your performance will degrade rapidly as you increase the size of your buffer. That's pretty much the opposite of what you want from your buffer. (Edit: You mentioned in a comment that your buffer is 1GB. I would expect the above code to be very, very slow as that buffer fills.)However, you already know exactly where the end is, and how many bytes to copy. So do this instead:
Note that I had
memcpy
copy one extra byte, so that it puts the NUL terminator on the buffer, just in case you have any other strXXX functions laying around that operate onbuffer
. If you don't, you can safely remove the+ 1
above.A similar, less egregious error occurs in your
else
clause. You can replace that also with amemcpy
:You can also save a little bit of time by combining these statements:
The
sprintf
returns the length of the output string, so you could simply say:That wasn't the main performance issue in your code, but changing that will improve it further.
EDIT: An even better, alternate approach:
If you want to eliminate the
memcpy()
entirely, consider this alternate approach:Set
maximumLineSize
to a reasonable value for your row of 5 integers, such as 60. (10 bytes for each integer including sign plus 5 bytes for commas and newline is 55, so 60 is a nice round number above that.)