In C, whenever I use fgetc(file) to read all the characters until the end of the file, it works. But when I use the similar fscanf(file, "%c") function, it prints strange characters. Code:

#include <stdio.h>
#include <stdlib.h>

int main() {
    char c;
    FILE * file = fopen("D\\filename.txt", "r");
    while (c != EOF) {
        fscanf(file, "%c", &c);
        printf("%c", c);
    }
    return 0;
}

But when I use fgetc instead of fscanf, it works and prints each character that is present in the file.

Can anybody answer why it works like this?

Basile Starynkevitch (best answer)

Notice that

c=fscanf(file,"%c");

is undefined behavior, because the %c conversion has no matching pointer argument (and fscanf returns the number of items it converted, not the character it read). You should be afraid of undefined behavior even when a program apparently seems to "work". Every good C compiler (e.g. GCC, invoked as gcc -Wall -Wextra -g) should warn you about it if you enable all warnings. When coding in C you should also learn how to use a debugger (e.g. gdb).

You should read the documentation of fscanf(3). Inside your loop, you probably want to write

char c= '\0';
if (fscanf(file, "%c", &c) <= 0) break;

You had better get into the habit of initializing every variable; a good optimizing compiler will remove the initialization if it is useless, and will often warn you about uninitialized variables otherwise.

Notice that using fgetc(3) is probably preferable in your case. You then need to declare c as an int, not a char (EOF is an int value that must stay distinguishable from every valid character), and code:

do {
  int c=fgetc(file);
  if (c==EOF) break;
} while (!feof(file));

Notice that in the above loop feof(file) would never be true when it is tested (because fgetc would have returned EOF first, triggering the break), so you had better replace while(!feof(file)) with while(true) (which needs <stdbool.h>) or while(1).

Using fgetc is simpler to read for other developers working on the same code (or even for yourself in a couple of months), and it is very probably faster. Most implementations of fscanf are based somehow on fgetc or something closely related.

Also, get into the good habit of testing your input. The input file might not be what you expect.

On most recent systems, the encoding today is UTF-8. Be aware that some (human language) characters are encoded in several bytes (e.g. the French accented letter é, the Russian yery letter Ы, the Euro sign €, the mathematical "for all" sign ∀, and letters or glyphs in other languages). You should probably consider using some UTF-8 library (e.g. libunistring) if you care about that (and you should care about UTF-8 in serious software!).

Nota Bene: If you are young and learning programming, you had better (IMNSHO) learn Scheme with SICP, using e.g. Racket, before learning C or Java. C is really not for beginners IMHO.

PS: the character type (often a byte) is char, in lowercase.