Purpose of work: to learn how to work with files using the functions of the standard C library
Task: There is a file with a C program. It is necessary to delete all comments from it and write the code without comments to a new file.
Explanations and implementation peculiarities:
- The program file may be very large. Therefore, the whole file must NOT be read into an array beforehand.
- Comments can be single-line or multi-line. A single-line comment can also span several lines if it is continued onto the next line with a backslash character ("\").
- There are no nested comments in the C language.
- Comment markers inside string constants do not count as comments.
- The file is not necessarily a correct C program. For example, a comment may be left unclosed.
- A few extra spaces and/or line feeds may appear in place of a deleted comment, and some existing non-significant delimiter characters may be missing.
- Data must not be deleted from constants enclosed in quotation marks (double or single).
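The rules above can be read as a small state machine. What follows is only a minimal sketch of that idea, not the program from this post: the state names and the function `strip_comments` are hypothetical. It reads one character at a time, so the file is never held in memory, and it copies everything between quotation marks unchanged. One known limitation of the sketch: a backslash-newline placed between the two characters of `//`, `/*` or `*/` is not recognized.

```c
#include <stdio.h>

/* Hypothetical state names; they are not taken from the original program. */
enum state {
    CODE,           /* ordinary program text */
    SLASH,          /* saw '/', which may start a comment */
    LINE_COMMENT,   /* inside a // comment */
    LINE_ESCAPE,    /* saw '\' inside a // comment */
    BLOCK_COMMENT,  /* inside a block comment */
    BLOCK_STAR,     /* saw '*' inside a block comment */
    LITERAL,        /* inside "..." or '...' */
    LITERAL_ESCAPE  /* saw '\' inside a literal */
};

/* Copy `in` to `out` without comments, one character at a time. */
void strip_comments(FILE *in, FILE *out)
{
    enum state st = CODE;
    int quote = 0;  /* delimiter that opened the current literal */
    int c;

    while ((c = getc(in)) != EOF) {
        switch (st) {
        case CODE:
            if (c == '/') {
                st = SLASH;          /* hold the slash back for now */
            } else {
                if (c == '"' || c == '\'') {
                    quote = c;
                    st = LITERAL;
                }
                putc(c, out);
            }
            break;
        case SLASH:
            if (c == '/') {
                st = LINE_COMMENT;
            } else if (c == '*') {
                st = BLOCK_COMMENT;
            } else {
                putc('/', out);      /* the slash was ordinary code */
                ungetc(c, in);       /* examine c again in the CODE state */
                st = CODE;
            }
            break;
        case LINE_COMMENT:
            if (c == '\\') {
                st = LINE_ESCAPE;
            } else if (c == '\n') {
                putc('\n', out);     /* keep the line break */
                st = CODE;
            }
            break;
        case LINE_ESCAPE:
            /* A backslash directly before the newline continues the comment. */
            st = (c == '\\') ? LINE_ESCAPE : LINE_COMMENT;
            break;
        case BLOCK_COMMENT:
            if (c == '*')
                st = BLOCK_STAR;
            break;
        case BLOCK_STAR:
            if (c == '/') {
                putc(' ', out);      /* a space stands in for the comment */
                st = CODE;
            } else if (c != '*') {
                st = BLOCK_COMMENT;
            }
            break;
        case LITERAL:
            putc(c, out);
            if (c == '\\')
                st = LITERAL_ESCAPE;
            else if (c == quote)
                st = CODE;
            break;
        case LITERAL_ESCAPE:
            putc(c, out);            /* escaped character, copied as-is */
            st = LITERAL;
            break;
        }
    }
    if (st == SLASH)
        putc('/', out);              /* input ended right after a lone slash */
}
```

To use it, open test.c for reading and test.wc for writing and pass both streams to `strip_comments`. An unclosed comment simply runs to the end of the input, so nothing after it is written.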
Input and output data: The source file is always named test.c. The output file must be named test.wc.
My code:
#include <stdio.h>
#define TRUE 1
#define FALSE 0
typedef int BOOL;
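/* Read one character, dropping every backslash-newline pair (a line splice)
   so that callers see the spliced lines as one logical line. */
int mygetc (FILE *in) {
    for (;;) {
        int c = getc(in);
        if (c == '\\') {
            c = getc(in);
            if (c == '\n')
                continue;           /* splice: skip both characters */
            if (c != EOF)
                ungetc(c, in);      /* not a splice: put the character back */
            c = '\\';
        }
        return c;
    }
}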
/* Skip the rest of a // comment. Returns the terminating '\n', or EOF. */
int skip_line_comment (FILE *in) {
    int c;
    while ((c = mygetc(in)) != '\n' && c != EOF)
        continue;
    return c;
}
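/* Skip the rest of a block comment. Returns ' ' to stand in for the
   comment, or EOF if the comment is never closed. */
int skip_block_comment (FILE *in) {
    int c;
    for (;;) {
        /* find the next '*' */
        while ((c = mygetc(in)) != '*') {
            if (c == EOF)
                return c;
        }
        /* skip a run of '*' and look at the character after it */
        while ((c = mygetc(in)) == '*')
            continue;
        if (c == EOF)
            return c;
        if (c == '/')
            return ' ';
    }
}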
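void removeComments (FILE *in, FILE *out) {
    int c = mygetc(in);
    while (c != EOF) {
        if (c == '"' || c == '\'') {
            /* Copy a string or character constant unchanged. Plain getc is
               used here so that nothing between the quotes is dropped. */
            int separator = c;
            fputc(c, out);
            while ((c = getc(in)) != EOF) {
                fputc(c, out);
                if (c == separator)
                    break;
                if (c == '\\') {
                    /* escaped character: copy it without inspecting it */
                    if ((c = getc(in)) == EOF)
                        break;
                    fputc(c, out);
                }
            }
            if (c == EOF) break;
            c = mygetc(in);
        } else if (c == '/') {
            c = mygetc(in);
            if (c == '/') c = skip_line_comment(in);
            else if (c == '*') c = skip_block_comment(in);
            else {
                /* Not a comment: emit the slash and process c again from the
                   top of the loop, because it may open a string constant. */
                fputc('/', out);
                continue;
            }
            if (c == EOF) break;
            fputc(c, out);          /* '\n' or ' ' left in place of the comment */
            c = mygetc(in);
        } else {
            fputc(c, out);
            c = mygetc(in);
        }
    }
}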
int main (void) {
    const char inName[] = "test.c";
    const char outName[] = "test.wc";
    FILE *in;
    FILE *out;
    in = fopen(inName, "r");
    if (in == NULL) {
        perror(inName);
        return 1;
    }
    out = fopen(outName, "w");
    if (out == NULL) {
        perror(outName);
        fclose(in);
        return 1;
    }
    removeComments(in, out);
    fclose(in);
    fclose(out);
    return 0;
}
.zip with tests: Google Drive
- Test 1 - correct
- Test 2 - correct
- Test 3 - missing "/" in the first line
- Test 4 - expected "* * *" on one line but got each * on a separate line
- Test 5 - correct
- Test 6 - problems with "", got some extra lines like """"""""\"""\" which had to be removed
- Test 7 - correct
- Test 8 - stripped out the extra lines
- Tests 9 and 10 - problem with the combination of / and \
- Test 11 - correct
- Test 12 - correct
- Test 13 - correct
I decided to rewrite the program completely. It now uses a separate algorithm for each case and passes all 13 tests.