If we have two .c files and a .h file: main.c sub.c sub.h, where
main.c
#include "sub.h"
...
sub.c
#include "sub.h"
...
we can compile the program with, either i)
gcc -o a.out main.c sub.c
or ii)
gcc -c main.c
gcc -c sub.c
gcc -o a.out main.o sub.o
Given this case, does the preprocessor output one or two translation unit(s)?
I am confused because main.c includes sub.h, which suggests the preprocessor would output one compilation unit. On the other hand, two object files, main.o and sub.o, are created before the executable, which makes me think "two source files, thus two translation units."
Which part am I misunderstanding, or where am I making a mistake?
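Here's what the C standard has to say about that:
"[...] A source file together with all the headers and source files included via the preprocessing directive #include is known as a preprocessing translation unit. After preprocessing, a preprocessing translation unit is called a translation unit. [...]"
(Source: C99 draft standard, 5.1.1.1 §1)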
So in both of your cases you have two translation units. One of them comes from the compiler preprocessing main.c and everything that is included through #include directives, that is, sub.h and probably <stdio.h> and other headers. The second comes from the compiler doing the same thing with sub.c.

The difference between your first and your second example is that in the latter you are explicitly storing the "different translated translation units" as object files.
Notice that there is no rule associating one object file with any number of translation units. The GNU linker is one example of a linker that is capable of joining two .o files together.

The standard, as far as I know, does not specify the extension of source files. In practice, though, you are free to #include a .c file into another, or to place your entire program in a .h file. With gcc you can use the option -x c to force a .h file to be treated as the starting point of a translation unit.

The distinction the standard makes between "headers" and "source files" is there because a header need not be a source file. Similarly, the contents of <...> in an #include directive need not be a valid file name. How exactly the compiler uses the named headers <...> and "..." is implementation-defined.