I am trying to convert a CFG to a CNF, but I am unsure what to identify as 'variables'. Here is the problem:
S -> aA | ABa
A -> AA | a
B -> AbA | bb
I have added a new start variable to make it
S' -> S
S -> aA | ABa
A -> AA | a
B -> AbA | bb
Then, after unit production removal, it is:
S' -> aA | ABa
S -> aA | ABa
A -> AA | a
B -> AbA | bb
I know the next step would be to change any production that have more than 2 variables, but is ABa three variables? Or is it two variables and a terminal?
If it is two variables and a terminal, to finally simplify it, am I able to create something like this:
S' -> aA | Xa
S -> aA | Xa
A -> AA | a
B -> Yb | bb
X -> AB
Y -> AA
Thanks!
The grammars in the Chomsky normal form has the following production formats:
It is made of symbols (
a,A...). These symbols are in two sets: terminal symbols (asaandb, lower case letters, both part of the alphabet) and non terminal symbols (asAandB, upper case letters). The grammar also has productions (likeS → a), and a starting symbol (a non-terminal)S.You start from this grammar (there is no need of new starting rule):
Separate the concatenations into their own productions:
Move all terminal symbols into their own non-terminal symbols (
aintoC(but keep theainA, because its already valid in the CNF) andbintoD):Split the the right hand sides where there are more than two non-terminal symbols:
Then you are done.