Examples of non-context-free languages in the C language?


What are examples of non-context-free languages in the C language? How do the following non-CFLs arise in C?

a) L1 = {wcw | w in {a,b}*}

b) L2 = {a^n b^m c^n d^m| n,m >=1}


There are 2 answers

Answer by Alexey Frunze:

These things aren't context-free in C:

foo * bar; // foo multiplied by bar or declaration of bar pointing to foo?
foo(*bar); // foo called with *bar as param or declaration of bar pointing to foo?
foo bar[2] // is bar an array of foo or a pointer to foo?
foo (bar baz) // is foo a function or a pointer to a function?
Answer by rici:

The question is clumsily worded, so I'm reading between the lines here. Still, it's a common homework/study question.

The various ambiguities [1] in the C grammar as normally presented do not render the language non-context-free. (Indeed, they don't even render the grammars non-context-free; an ambiguous CFG is still a CFG.) The general rule "if it looks like a declaration, it's a declaration regardless of other possible parses" can probably be codified in a very complex context-free grammar (although it's not 100% obvious that that is true, since context-free languages are not closed under intersection or difference), but it's easier to parse with a simpler CFG and then disambiguate according to the declaration rule.

Now, the important point about C (and most programming languages) is that the syntax of the language is quite a bit more complex than the BNF used for explanatory purposes. For example, a C program is not well-formed if a variable is used without being declared. That's a syntax error, but it's not detected by the CFG parser. The grammatical productions needed to define these cases are quite complicated, due to the complicated syntax of the language, but they're going to boil down to requiring that identifiers appear twice in a valid program. Hence L1 = {wcw | w in {a,b}+} (here w is the identifier, and c is way too complicated to spell out). In practice, checking this requirement is normally done with a symbol table, and the formal language rules, while precise, are not written in a logical formalism. Since L1 is not a context-free language, the formalism could not be context-free, but a context-sensitive grammar can generate L1, so it's not totally impossible. (See, for example, Algol 68.)
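To make L1 concrete, here is a minimal sketch of a recognizer for {wcw | w in {a,b}+}. The function name is_wcw is made up for illustration. The point is that the second copy of w has to be compared with the first in the same order, which is what a symbol table lookup does and what a single stack cannot do (a stack would hand back w reversed).

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper: returns 1 if s has the form w 'c' w,
// where w is a non-empty string over {a, b}; returns 0 otherwise.
int is_wcw(const char *s) {
    size_t len = strlen(s);
    // w c w always has odd length, and at least 3 characters when w is non-empty.
    if (len < 3 || len % 2 == 0) return 0;
    size_t half = len / 2;              // length of w; also the index of 'c'
    if (s[half] != 'c') return 0;
    for (size_t i = 0; i < half; i++) {
        // Each symbol of the first copy must be a or b ...
        if (s[i] != 'a' && s[i] != 'b') return 0;
        // ... and must match the symbol at the same position in the second copy.
        if (s[i] != s[half + 1 + i]) return 0;
    }
    return 1;
}
```

With this sketch, is_wcw("abcab") accepts, while is_wcw("abcba") rejects because the second half is the reverse of the first rather than a copy.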

The symbol table is also used to decide whether a particular identifier is to be reduced to a typedef-name [2]. This is required to resolve a number of ambiguities in the grammar. (It also further restricts the set of strings in the language, because there are some cases where an identifier must be resolved as a typedef-name in order for the program to be valid.)
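A small sketch of how the typedef-name decision changes the parse of the same tokens. The function names as_expression and as_declaration are invented for this example; the only thing that differs between the two bodies is what foo was declared to be.

```c
// Here foo and bar are ordinary int variables,
// so the tokens `foo * bar` form a multiplication.
int as_expression(void) {
    int foo = 6, bar = 7;
    return foo * bar;
}

// Here foo is a typedef-name in the enclosing block,
// so `foo * bar;` declares bar as a pointer to foo.
int as_declaration(void) {
    typedef int foo;
    foo value = 42;
    foo * bar;        // declaration, not a multiplication
    bar = &value;
    return *bar;
}
```

Both functions compile, and a parser can only choose between the two readings of `foo * bar` by looking up foo in its table of earlier declarations.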

For another type of context-sensitivity, function calls need to match function declarations in the number of arguments; this sort of requirement is modelled by L2 = {a^n b^m c^n d^m | n, m >= 1}, where a and c represent the definition and use of some function, and b and d represent the definition and use of a different function. (Again, in a highly simplified form.)
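As a rough illustration of L2, the sketch below checks a^n b^m c^n d^m with four independent counters. The names run and is_anbmcndm are hypothetical. The two equalities it checks (a against c, b against d) cross each other, which is the shape of "the first function's parameters match its call, and the second function's parameters match its call".

```c
#include <stddef.h>

// Hypothetical helper: consume a run of `ch` starting at *pos
// and return how many characters were consumed.
static size_t run(const char *s, size_t *pos, char ch) {
    size_t n = 0;
    while (s[*pos] == ch) {
        (*pos)++;
        n++;
    }
    return n;
}

// Returns 1 if s is a^n b^m c^n d^m with n, m >= 1; returns 0 otherwise.
int is_anbmcndm(const char *s) {
    size_t pos = 0;
    size_t a = run(s, &pos, 'a');   // "parameters" of the first function
    size_t b = run(s, &pos, 'b');   // "parameters" of the second function
    size_t c = run(s, &pos, 'c');   // "arguments" in a call to the first
    size_t d = run(s, &pos, 'd');   // "arguments" in a call to the second
    // The whole input must be consumed and the counts must agree pairwise.
    return s[pos] == '\0' && a >= 1 && b >= 1 && a == c && b == d;
}
```

For example, is_anbmcndm("aabbbccddd") accepts (n = 2, m = 3), and is_anbmcndm("abcdd") rejects because the d count does not match the b count.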

This second requirement is possibly less clearly a syntactic requirement. Other languages (Python, for example) allow function calls with any number of arguments, and treat an argument/parameter count mismatch as a semantic error detected only at runtime. In the case of C, however, a mismatch is clearly a syntax error.
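A minimal C illustration of the count requirement, with made-up names add2 and call_ok. When a prototype is in scope, the standard's constraints on function calls require the number of arguments to agree with the number of parameters, so a conforming compiler must diagnose the commented-out call at translation time.

```c
// Definition with a prototype: exactly two parameters.
int add2(int x, int y) {
    return x + y;
}

// Two arguments for two parameters: accepted.
int call_ok(void) {
    return add2(1, 2);
}

// One argument for two parameters: uncommenting this function makes
// the translation unit invalid, and the compiler reports the mismatch.
// int call_bad(void) { return add2(1); }
```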

In short, the set of grammatically valid strings which constitute the C language is a proper subset of the set of strings recognized by the CFG presented in the C language definition; the set of valid parses is a proper subset of the set of derivations generated by the CFG, and the language itself is (a) unambiguous, and (b) not context-free.

Note 1: Most of these are not really ambiguities, because they depend upon how a given identifier is resolved (typedef name, function identifier, declared variable,...).

Note 2: It is not the case that an identifier must be resolved as a typedef-name if it happens to be one; that only happens in places where the reduction is possible. It is not a syntax error to use the same identifier for both a type and a variable, even in the same scope. (It's not a good idea, but it's valid.) The following example, adapted from an example in section 6.7.8 of the standard, shows the use of t as both a field name and a typedef:

typedef signed int t;
struct tag {
    unsigned t:4;  // field named 't' of type unsigned int
    const t:5;     // unnamed field of type 't' (signed int)
};