fastest way to write integer to file in C


I am doing a homework assignment for a programming course in C. Bonus points are awarded for writing quickly to the file in their upload test system.

I am trying to write a number of lines to a file, each comprising three space-delimited decimal integers followed by '\n'. The problem is that fprintf is too slow (their reference time is roughly 1/3 faster).

I have tried a lot of possibilities (everything is in one for loop). fprintf (too slow):

fprintf(f, "%d %d %d\n", a[i], b[i], c[i]);

Converting to a string and then writing the string - even worse:

sprintf(buffer, "%d", a[i]); // or: _itoa(a[i], buffer, 10);
fputs(buffer, f);
fputc(' ', f);

Is there any quick way to write integers to a plain text file (.txt)? (The last solution takes 220 ms; the reference is 140 ms, to give you a picture of the timing.) I have been trying and googling like mad, but nothing works. If the reference time is that short, there has to be some way!

PS: The numbers are always integers, 4 bytes in size, always in the format:

a0 b0 c0
a1 b1 c1
a2 b2 c2
a3 b3 c3
etc...

More info: when I submit the solution, I send only two files: file.h and file.c. No main, etc., so everything runs under their optimization settings. The speedup should come from the commands/algorithm used (even the problem description states that fprintf is too slow and that we should try something else to speed things up).

Thank you for everything!

Edit: since you want the whole code, here it is:

void save(const str_t * const str, const char *name)
{
  FILE* f;
  int i;

  if(str->cnt == 0)
      return;

  f = fopen(name, "w");
  if(f == NULL)
      return;

  for(i = 0; i < str->cnt; i++)
  {
      fprintf(f, "%d %d %d\n", str->a[i], str->b[i], str->c[i]);
  }
  fclose(f);
}

There are 3 answers

Clifford (accepted answer, 11 votes):

You can reduce the overhead of file I/O by writing to the file in large blocks to reduce the number of individual write operations.

#define CHUNK_SIZE 4096
char file_buffer[CHUNK_SIZE + 64] ;    // 4Kb buffer, plus enough
                                       // for at least one more line
int buffer_count = 0 ;
int i = 0 ;

while( i < cnt )
{
    buffer_count += sprintf( &file_buffer[buffer_count], "%d %d %d\n", a[i], b[i], c[i] ) ;
    i++ ;

    // if the chunk is big enough, write it.
    if( buffer_count >= CHUNK_SIZE )
    {
        fwrite( file_buffer, buffer_count, 1, f ) ;
        buffer_count = 0 ;
    }
}

// Write remainder
if( buffer_count > 0 )
{
    fwrite( file_buffer, buffer_count, 1, f ) ;    
}

There may be some advantage in writing exactly 4096 bytes (or some other power of two) in a single write, but that is largely file-system dependent and the code to do that becomes a little more complicated:

#define CHUNK_SIZE 4096
char file_buffer[CHUNK_SIZE + 64] ;
int buffer_count = 0 ;
int i = 0 ;

while( i < cnt )
{
    buffer_count += sprintf( &file_buffer[buffer_count], "%d %d %d\n", a[i], b[i], c[i] ) ;
    i++ ;

    // if the chunk is big enough, write it.
    if( buffer_count >= CHUNK_SIZE )
    {
        fwrite( file_buffer, CHUNK_SIZE, 1, f ) ;
        buffer_count -= CHUNK_SIZE ;
        memcpy( file_buffer, &file_buffer[CHUNK_SIZE], buffer_count ) ;
    }
}

// Write remainder
if( buffer_count > 0 )
{
    fwrite( file_buffer, 1, buffer_count, f ) ;    
}

You might experiment with different values for CHUNK_SIZE - larger may be optimal, or you may find that it makes little difference. I suggest at least 512 bytes.


Test results:

Using VC++ 2015, on a desktop platform (specifications originally shown in a screenshot), with a Seagate ST1000DM003 1TB 64MB Cache SATA 6.0Gb/s hard drive.

The results for a single test writing 100000 lines are highly variable, as you might expect on a desktop system running multiple processes that share the same hard drive, so I ran each test 100 times and selected the minimum time (as can be seen in the code below the results):

Using default "Debug" build settings (4K blocks):

line_by_line: 0.195000 seconds
block_write1: 0.154000 seconds
block_write2: 0.143000 seconds

Using default "Release" build settings (4K blocks):

line_by_line: 0.067000 seconds
block_write1: 0.037000 seconds
block_write2: 0.036000 seconds

Optimisation had a proportionally similar effect on all three implementations; the fixed-size chunk write was marginally faster than the "ragged" chunk write.

When 32K blocks were used, performance was only slightly higher, and the difference between the fixed and ragged versions was negligible:

Using default "Release" build settings (32K blocks):

block_write1: 0.036000 seconds
block_write2: 0.036000 seconds

Using 512-byte blocks was not measurably different from 4K blocks:

Using default "Release" build settings (512 byte blocks):

block_write1: 0.036000 seconds
block_write2: 0.037000 seconds

All the above were 32-bit (x86) builds. Building 64-bit (x64) code yielded interesting results:

Using default "Release" build settings (4K blocks)- 64-bit code:

line_by_line: 0.049000 seconds
block_write1: 0.038000 seconds
block_write2: 0.032000 seconds

The ragged block write was marginally slower (though perhaps not statistically significantly so); the fixed block write was significantly faster, as was the line-by-line write (but not by enough to make it faster than any block write).

The test code (4K block version):

#include <stdio.h>
#include <string.h>
#include <time.h>


void line_by_line_write( int count )
{
  FILE* f = fopen("line_by_line_write.txt", "w");
  for( int i = 0; i < count; i++)
  {
      fprintf(f, "%d %d %d\n", 1234, 5678, 9012 ) ;
  }
  fclose(f);       
}

#define CHUNK_SIZE (4096)

void block_write1( int count )
{
  FILE* f = fopen("block_write1.txt", "w");
  char file_buffer[CHUNK_SIZE + 64];
  int buffer_count = 0;
  int i = 0;

  while( i < count )
  {
      buffer_count += sprintf( &file_buffer[buffer_count], "%d %d %d\n", 1234, 5678, 9012 );
      i++;

      // if the chunk is big enough, write it.
      if( buffer_count >= CHUNK_SIZE )
      {
          fwrite( file_buffer, buffer_count, 1, f );
          buffer_count = 0 ;
      }
  }

  // Write remainder
  if( buffer_count > 0 )
  {
      fwrite( file_buffer, 1, buffer_count, f );
  }
  fclose(f);       

}

void block_write2( int count )
{
  FILE* f = fopen("block_write2.txt", "w");
  char file_buffer[CHUNK_SIZE + 64];
  int buffer_count = 0;
  int i = 0;

  while( i < count )
  {
      buffer_count += sprintf( &file_buffer[buffer_count], "%d %d %d\n", 1234, 5678, 9012 );
      i++;

      // if the chunk is big enough, write it.
      if( buffer_count >= CHUNK_SIZE )
      {
          fwrite( file_buffer, CHUNK_SIZE, 1, f );
          buffer_count -= CHUNK_SIZE;
          memcpy( file_buffer, &file_buffer[CHUNK_SIZE], buffer_count );
      }
  }

  // Write remainder
  if( buffer_count > 0 )
  {
      fwrite( file_buffer, 1, buffer_count, f );
  }
  fclose(f);       

}

#define LINES 100000

int main( void )
{
    clock_t line_by_line_write_minimum = 9999 ;
    clock_t block_write1_minimum = 9999 ;
    clock_t block_write2_minimum = 9999 ;

    for( int i = 0; i < 100; i++ )
    {
        clock_t start = clock() ;
        line_by_line_write( LINES ) ;
        clock_t t = clock() - start ;
        if( t < line_by_line_write_minimum ) line_by_line_write_minimum = t ;

        start = clock() ;
        block_write1( LINES ) ;
        t = clock() - start ;
        if( t < block_write1_minimum ) block_write1_minimum = t ;

        start = clock() ;
        block_write2( LINES ) ;
        t = clock() - start ;
        if( t < block_write2_minimum ) block_write2_minimum = t ;
    }

    printf( "line_by_line: %f seconds\n", (float)(line_by_line_write_minimum) / CLOCKS_PER_SEC ) ;
    printf( "block_write1: %f seconds\n", (float)(block_write1_minimum) / CLOCKS_PER_SEC ) ;
    printf( "block_write2: %f seconds\n", (float)(block_write2_minimum) / CLOCKS_PER_SEC ) ;
}
Basile Starynkevitch (1 vote):

It might be operating system and implementation specific.

Perhaps you could explicitly set the stream buffering using setvbuf(3) (I would recommend a buffer of at least 32 Kbytes, and probably more).
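As a rough illustration of that suggestion, here is a minimal sketch (the function name, file name, and buffer size are mine, not from the answer; note that setvbuf must be called before any other operation on the stream):

```c
#include <stdio.h>

/* Hypothetical helper: write cnt lines of three integers to `name`,
 * asking stdio to fully buffer the stream in a 32 KiB buffer.
 * Returns 0 on success, -1 on failure. */
int save_buffered(const char *name, int cnt)
{
    static char buf[32 * 1024];   /* must outlive the stream */
    FILE *f = fopen(name, "w");
    if (f == NULL)
        return -1;

    /* _IOFBF requests full buffering; must precede any other I/O on f. */
    setvbuf(f, buf, _IOFBF, sizeof buf);

    for (int i = 0; i < cnt; i++)
        fprintf(f, "%d %d %d\n", i, i + 1, i + 2);

    fclose(f);
    return 0;
}
```

With a large enough buffer, fprintf's output accumulates in memory and the underlying write calls become far less frequent, which is the same effect the accepted answer achieves by hand.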

Don't forget to explicitly ask your compiler to optimize, e.g. use gcc -Wall -O2

You could also hand-code your integer-to-decimal conversion routine (hint: a routine which, given some int x like 1234, fills a small char array with the digits in reverse order (e.g. "4321") is really simple to write, and it will run fast).
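The reverse-digit routine hinted at above could be sketched like this (the name int_to_buf and the details are my own; this sketch goes one step further than the hint and flips the digits back into the right order before returning):

```c
/* Convert x to decimal text at p (no terminator); returns chars written.
 * Avoids printf's format-string parsing entirely. */
int int_to_buf(char *p, int x)
{
    char tmp[12];            /* enough digits for any 32-bit int */
    unsigned u;
    int n = 0, len = 0;

    if (x < 0) {
        p[len++] = '-';
        u = 0u - (unsigned)x;   /* negate via unsigned: safe even for INT_MIN */
    } else {
        u = (unsigned)x;
    }

    do {                     /* digits come out least-significant first */
        tmp[n++] = (char)('0' + u % 10);
        u /= 10;
    } while (u != 0);

    while (n > 0)            /* copy them back out in reverse order */
        p[len++] = tmp[--n];

    return len;
}
```

Returning the length lets the caller append straight into a big output buffer without a strlen pass, which fits the block-writing approaches in the other answers.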

abelenky (4 votes):

With any variation of printf, the function has to scan the format string to find %d, parse it for any extra options (such as %-03d), and act accordingly. That is a lot of processing time. printf is awesome because it is super-flexible, not because it is fast.

If you use an itoa-type function to write each number, you are still going to turn each integer into a string and then copy that string to the file. You will spend all your processing time moving between string buffers and file writes.

I think your fastest approach will be to make one really big in-memory buffer, write everything into it, then do ONE AND ONLY ONE write to dump that entire buffer to the file.

Outline:

char buffer[10000];   /* must be big enough for every line */
int buffer_size = 0;

for(i = 0; i < str->cnt; i++)
{
    /* append "a b c\n" for row i to buffer, updating buffer_size */
}

fwrite(buffer, buffer_size, 1, my_file);  // One fast write.
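A fleshed-out version of that outline might look like the following (names and sizing are illustrative, not from the answer; the buffer is sized for the worst case of roughly 36 bytes per line of three 32-bit integers):

```c
#include <stdio.h>
#include <stdlib.h>

/* Illustrative only: format every line into one heap buffer,
 * then issue a single fwrite for the whole file.
 * Returns 0 on success, -1 on failure. */
int save_single_write(const char *name, const int *a, const int *b,
                      const int *c, int cnt)
{
    /* worst case per line: 3 * 11 chars + 2 spaces + '\n' = 36 bytes */
    size_t cap = (size_t)cnt * 36 + 1;
    char *buffer = malloc(cap);
    size_t used = 0;

    if (buffer == NULL)
        return -1;

    for (int i = 0; i < cnt; i++)
        used += (size_t)sprintf(buffer + used, "%d %d %d\n",
                                a[i], b[i], c[i]);

    FILE *f = fopen(name, "w");
    if (f == NULL) {
        free(buffer);
        return -1;
    }
    fwrite(buffer, 1, used, f);   /* one and only one write */
    fclose(f);
    free(buffer);
    return 0;
}
```

This trades memory for speed: for very large row counts, the chunked approach in the accepted answer keeps memory use bounded while achieving a similar reduction in write calls.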