How to properly recognize different line endings in C?


I guess the title speaks for itself.

I am writing a C program on Windows 7 (using g++ and Notepad++) that compares the contents of files.

Content of the file:

simple
file with lines

The file uses Windows-style CRLF line endings.

When I measure the length of the file using this code:

fseek(file, 0, SEEK_END);
size = ftell(file);
fseek(file, 0, SEEK_SET);

I get 23.

When I change the line endings to Unix-style LF (using Notepad++), I get a length of 22.

This causes a problem when comparing two files. That's why I am asking whether there is a way to determine if a given file uses LF, CR, or CRLF.

I know that I can distinguish between CR and LF: LF has ASCII code 10 and CR has ASCII code 13. In C, LF is '\n' and CR is '\r'.

But when I read the file character by character, I always get just LF (ASCII 10), even where the file contains CRLF.

I hope I made it clear. Thanks.


There are 2 answers

mmmmmm On BEST ANSWER

That is the difference between reading files in text and binary mode.

In text mode (fopen(file, "r"), then getc etc.) every line ending is read as a single character, so on Windows a CRLF pair arrives as one '\n'. If you read in binary mode, e.g. fopen(file, "rb"), you get the actual bytes and you will see CRLF and LF as different. fseek works with the actual number of bytes, which is why it sees the difference in line endings.
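To make the difference concrete, here is a minimal sketch that counts how many characters getc delivers for a given fopen mode. The helper name count_chars is hypothetical and only serves as an illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: count the characters getc() returns when the file
   is opened with the given mode ("r" = text, "rb" = binary).
   Returns -1 if the file cannot be opened. */
long count_chars(const char *path, const char *mode)
{
    assert(path != NULL && mode != NULL);

    FILE *f = fopen(path, mode);
    if (f == NULL)
        return -1;

    long n = 0;
    while (getc(f) != EOF)
        n++;

    fclose(f);
    return n;
}
```

For the CRLF file from the question, count_chars(path, "rb") should give 23 on any platform. On Windows, count_chars(path, "r") should give 22, because the CRLF pair is delivered as a single '\n'. On Unix-like systems both modes return the same count.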

To tell which convention a file uses, read it in binary mode and see if there are CRLF pairs, or read it both ways and see if the sizes differ. In practice it is enough to check whether there is a CR, as I don't think any current major OS uses a lone CR as a line ending.
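One way to carry out such a check is to open the file in binary mode and count each kind of line terminator. The following is only a sketch. The eol_kind type and the detect_eol function are hypothetical names, and treating a file with more than one kind of terminator as "mixed" is an assumption about what the caller wants.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical result type; the names are illustrative only. */
typedef enum { EOL_NONE, EOL_LF, EOL_CRLF, EOL_CR, EOL_MIXED, EOL_ERROR } eol_kind;

/* Classify the line endings of a file by scanning its raw bytes. */
eol_kind detect_eol(const char *path)
{
    assert(path != NULL);

    FILE *f = fopen(path, "rb");      /* binary mode: no CRLF translation */
    if (f == NULL)
        return EOL_ERROR;

    long lf = 0, crlf = 0, cr = 0;
    int c, prev = 0;
    while ((c = getc(f)) != EOF) {
        if (c == '\n') {
            if (prev == '\r')
                crlf++;               /* CR immediately followed by LF */
            else
                lf++;                 /* bare LF */
        } else if (prev == '\r') {
            cr++;                     /* CR not followed by LF */
        }
        prev = c;
    }
    if (prev == '\r')
        cr++;                         /* CR as the very last byte */
    fclose(f);

    int kinds = (lf > 0) + (crlf > 0) + (cr > 0);
    if (kinds == 0)
        return EOL_NONE;              /* no line terminator at all */
    if (kinds > 1)
        return EOL_MIXED;
    return lf ? EOL_LF : (crlf ? EOL_CRLF : EOL_CR);
}
```

With the file from the question, detect_eol should return EOL_CRLF for the Windows version and EOL_LF after converting it in Notepad++. Two files could then be compared after normalizing their line endings rather than byte for byte.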

j_random_hacker On

In addition to Mark's answer, if you need to do this for a filehandle that has already been opened (such as stdin or stdout), you can use _setmode():

#include <stdio.h>   /* fileno, stdin */
#include <fcntl.h>   /* _O_BINARY */
#include <io.h>      /* _setmode */

...

_setmode(fileno(stdin), _O_BINARY);

This works provided no input or output has already occurred to that filehandle. Incidentally, _setmode() only exists on Windows and DOS; on Unix-like operating systems (including versions of Mac OS since OS X), files are effectively always opened in binary mode, and fopen(file, "...b") there is accepted but has no effect. On these platforms, a line ending is encoded by the single character \n.
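Since _setmode() is Windows-specific, code that must also build elsewhere can hide the call behind a preprocessor check. The sketch below assumes the compiler defines _WIN32 on Windows (MSVC and MinGW do). The wrapper name set_binary_mode is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#ifdef _WIN32
#include <fcntl.h>   /* _O_BINARY */
#include <io.h>      /* _setmode */
#endif

/* Hypothetical helper: switch an already-open stream to binary mode.
   Returns 0 on success, -1 on failure. Call it before any I/O on the stream. */
int set_binary_mode(FILE *stream)
{
    assert(stream != NULL);
#ifdef _WIN32
    /* _setmode returns the previous mode, or -1 on error. */
    return _setmode(_fileno(stream), _O_BINARY) == -1 ? -1 : 0;
#else
    (void)stream;    /* no text/binary distinction on Unix-like systems */
    return 0;
#endif
}
```

A program that reads from stdin would call set_binary_mode(stdin) once at the start of main, before the first read.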