C fgets versus fgetc for reading a line


I need to read a line of text (terminated by a newline) without making assumptions about the length. So I now face two possibilities:

  • Use fgets and check each time if the last character is a newline and continuously append to a buffer
  • Read each character using fgetc and occasionally realloc the buffer

Intuition tells me the fgetc variant might be slower, but then again I don't see how fgets can do it without examining every character (and my intuition isn't always that good). The lines are quite large, so performance is important.

I would like to know the pros and cons of each approach. Thank you in advance.


There are 5 answers

Jonathan Leffler (best answer, score 3)

I suggest using fgets() coupled with dynamic memory allocation - or you can investigate the interface to getline(), which is in the POSIX 2008 standard and available on more recent Linux machines. That does the memory allocation for you. If you manage the buffer yourself, you need to keep tabs on its length as well as its address - so you might even create yourself a small structure to hold both.

Although fgetc() also works, it is marginally fiddlier - but only marginally so. Underneath the covers, it uses the same mechanisms as fgets(). The internals of fgets() may be able to exploit speedier operations - analogous to strchr() - that are not available when you call fgetc() directly.
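
As a rough sketch of the fgets()-plus-reallocation approach described above (the read_line() name, the 128-byte starting size and the doubling strategy are my own choices, not anything mandated by a standard API):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Read one line of arbitrary length from fp.
     * Returns a malloc()ed string (caller frees it), or NULL on EOF/error. */
    char *read_line(FILE *fp) {
        size_t cap = 128, len = 0;
        char *buf = malloc(cap);
        if (buf == NULL)
            return NULL;

        while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
            len += strlen(buf + len);
            if (len > 0 && buf[len - 1] == '\n')
                return buf;                 /* got a complete line */

            /* No newline yet: the line is longer than the buffer, so grow it. */
            size_t new_cap = cap * 2;
            char *tmp = realloc(buf, new_cap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }

        if (len > 0)
            return buf;                     /* final line had no trailing newline */
        free(buf);                          /* nothing read: EOF or error */
        return NULL;
    }

The buffer-pointer-plus-capacity structure mentioned above falls out naturally if you want to reuse one buffer across calls instead of allocating a fresh one each time.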

Thom Smith (score 1)

If you can set a maximum line length, even a large one, then a single fgets call would do the trick. If not, multiple fgets calls will still be faster than multiple fgetc calls, because fgetc pays its per-call overhead once per character rather than once per chunk.
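
For illustration, a minimal sketch of that single-fgets, fixed-maximum approach, where a missing trailing newline tells you the line did not fit (the 64 KiB buffer size is an arbitrary guess at "large enough"):

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        char buf[64 * 1024];     /* assumed maximum line length */

        while (fgets(buf, sizeof buf, stdin) != NULL) {
            size_t len = strlen(buf);
            if (len > 0 && buf[len - 1] == '\n') {
                /* a complete line is in buf */
            } else {
                /* the line was longer than the buffer,
                 * or the file ended without a newline */
            }
        }
        return 0;
    }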

A better answer, though, is that it's not worth worrying about the performance difference until and unless you have to. If fgetc is fast enough, what does it matter?

Mat (score 3)

Does your environment provide the getline(3) function? If so, I'd say go for that.

The big advantage I see is that it can allocate the buffer itself (if you want), and will realloc() the buffer you pass in if it's too small. (This means whatever you pass in has to be NULL or something obtained from malloc().)

This gets rid of some of the pain of fgets/fgetc, and you can hope that whoever wrote the C library that implements it took care of making it efficient.

Bonus: the man page on Linux has a nice example of how to use it in an efficient manner.
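
For reference, a minimal sketch along the same lines as that man-page example (the feature-test macro may or may not be needed, depending on your platform):

    #define _POSIX_C_SOURCE 200809L   /* expose getline() on glibc */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>            /* ssize_t */

    int main(void) {
        char *line = NULL;    /* getline() allocates and grows this for us */
        size_t cap = 0;       /* current capacity, maintained by getline() */
        ssize_t nread;

        while ((nread = getline(&line, &cap, stdin)) != -1) {
            /* nread counts the trailing '\n', if one was read */
            printf("read %zd bytes\n", nread);
        }

        free(line);           /* one buffer is reused throughout; free it once */
        return 0;
    }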

Jesse Cohen (score 2)

I would allocate a large buffer and then use fgets, checking, realloc()ing and repeating until you have read to the end of the line.

Every call you make (whether fgetc or fgets) has overhead, and every so often the stdio layer has to make a system call to refill its buffer, which takes time. You want to minimize the number of calls, so calling fgets fewer times and iterating in memory is faster.

If you are reading from a file, mmap()ing in the file is another option.
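
If the input really is a regular file, the mmap() approach might look something like this sketch (error handling kept to a minimum, with memchr() used to find line boundaries):

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv) {
        if (argc != 2)
            return 1;

        int fd = open(argv[1], O_RDONLY);
        struct stat st;
        if (fd == -1 || fstat(fd, &st) == -1 || st.st_size == 0)
            return 1;

        char *data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (data == MAP_FAILED)
            return 1;

        const char *p = data, *end = data + st.st_size;
        while (p < end) {
            const char *nl = memchr(p, '\n', (size_t)(end - p));
            size_t len = nl ? (size_t)(nl - p) : (size_t)(end - p);
            printf("line of %zu bytes\n", len);
            p += len + 1;              /* step past the newline */
        }

        munmap(data, (size_t)st.st_size);
        close(fd);
        return 0;
    }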

Jerry Coffin (score 0)

If performance matters much to you, you generally want to call getc instead of fgetc. The standard tries to make it easier to implement getc as a macro to avoid function call overhead.

Past that, the main thing to deal with is probably your strategy in allocating the buffer. Most people use fixed increments (e.g., when/if we run out of space, allocate another 128 bytes). I'd advise instead using a constant factor, so if you run out of space allocate a buffer that's, say, 1 1/2 times the previous size.
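
As a sketch, the geometric-growth step might look like this (the helper name, the 1.5x factor and the 16-byte minimum are all arbitrary choices):

    #include <stdlib.h>

    /* Grow buf to roughly 1.5x its current capacity. Returns the new pointer,
     * or NULL on failure (in which case the caller still owns the old buffer). */
    static char *grow_buffer(char *buf, size_t *cap) {
        size_t new_cap = *cap + *cap / 2;
        if (new_cap < 16)
            new_cap = 16;
        char *tmp = realloc(buf, new_cap);
        if (tmp == NULL)
            return NULL;
        *cap = new_cap;
        return tmp;
    }

Geometric growth keeps the total amount of copying linear in the final line length, whereas fixed 128-byte increments make it quadratic for very long lines.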

Especially when getc is implemented as a macro, the difference between getc and fgets is usually quite minimal, so you're best off concentrating on other issues.