Removing nullable productions


I am creating a lexical analyzer for a subset of Pascal, a programming project recommended at the back of the red Dragon Book (the compilers textbook). I am converting the grammar to LL form so that I can build the parse table. I am ignoring the dangling-else ambiguity, which I believe is the only ambiguity present. I have just removed all of the epsilon productions, and before I move on to left factoring I would like some feedback on whether I removed them correctly. I have been staring at this for quite a while, so I am not sure whether I have missed something that will cause problems later, and I have no one to check it against. I know this is a long post, but I don't know where else to have this checked by someone who knows the material. I really appreciate your time!

Original Grammar:

<statement> →
    if <expression> then <statement>
<factor> → id
<program> →
     <program> id (<identifier_list>);
     <declarations>
     <subprogram_declarations>
     <compound_statement>
     .
<identifier_list> →
     id
     | <identifier_list>, id
<declarations> →
     <declarations> var id : <type>;
     |E
<type> → 
     <standard_type>
     |array[num..num] of <standard_type>
<standard_type> →
     integer
     | real
<subprogram_declarations> →
     <subprogram_declarations> <subprogram_declaration>;
     |E
<subprogram_declaration> →
     <subprogram_head> <declarations> 
     <subprogram_declarations> 
     <compound_statement>
<subprogram_head> → 
     function id <arguments> : <standard_type>;
<arguments> → 
    (<parameter_list>)
    | E
<parameter_list> →
     id : <type> 
     | <parameter_list>; id : <type>
<compound_statement> →
     begin
     <optional_statements>
     end
<optional_statements> →
     <statement_list>
     | E
<statement_list> →
     <statement>
     |<statement_list>; <statement>
<statement> →
     <variable> assignop <expression>
     | <procedure_statement>
     | <compound_statement>
     | if <expression> then <statement> else <statement>
     | while <expression> do <statement>
     | if <expression> then <statement>
<variable> → 
     id
     | id [<expression> ]
<expression_list> →
     <expression>
     |<expression_list> , <expression>
<expression> →
     <simple_expression>
     | <simple_expression> relop <simple_expression>
<simple_expression> →
     <term>
     | <sign> <term>
     | <simple_expression> addop <term>
<term> →
     <factor>
     | <term> mulop <factor>
<factor> →
     id
     | id (<expression_list>)
     | num
     | (<expression>)
     | not <factor>
     | id [<expression>]
<sign> →
     + | -

Grammar after epsilon production elimination:

<statement> →
    if <expression> then <statement>
<factor> → id
<program> →
     <program> id (<identifier_list>);
     <declarations>
     <subprogram_declarations>
     .
     |<program> id (<identifier_list>);
      <subprogram_declarations>
      .
     |<program> id (<identifier_list>);
      <declarations>
      .
     |<program> id (<identifier_list>);
      .
     |<program> id (<identifier_list>);
      <declarations>
      <subprogram_declarations>
      begin
      <statement_list>
      end
      .
     |<program> id (<identifier_list>);
      <subprogram_declarations>
      begin
      <statement_list>
      end
      .
     |<program> id (<identifier_list>);
      <declarations>
      begin
      <statement_list>
      end
      .
     |<program> id (<identifier_list>);
      begin
      <statement_list>
      end
      .
<identifier_list> →
     id
     | <identifier_list>, id
<declarations> →
     <declarations> var id : <type>;
<type> → 
     <standard_type>
     |array[num..num] of <standard_type>
<standard_type> →
     integer
     | real
<subprogram_declarations> →
     <subprogram_declarations>  <subprogram_declaration>;
<subprogram_declaration> →
     <subprogram_head> <declarations> 
     <subprogram_declarations>
     begin
     <statement_list>
     end
     |<subprogram_head>
      <subprogram_declarations>
      begin
      <statement_list>
      end
     |<subprogram_head> <declarations> 
      begin
      <statement_list>
      end
     |<subprogram_head>
      begin
      <statement_list>
      end
     |<subprogram_head> <declarations> 
      <subprogram_declarations>
     |<subprogram_head>
      <subprogram_declarations>
     |<subprogram_head> <declarations> 
     |<subprogram_head>
<subprogram_head> → 
      function id (<parameter_list>) : <standard_type>;
      |function id : <standard_type>;
<parameter_list> →
     id : <type> 
     |<parameter_list>; id : <type>
<statement_list> →
     <statement>
     |<statement_list>; <statement>
<statement> →
     <variable> assignop <expression>
     | <procedure_statement>
     | <compound_statement>
     | if <expression> then <statement> else <statement>
     | while <expression> do <statement>
     | if <expression> then <statement>
<variable> → 
     id
     |id [<expression>]
<expression_list> →
     <expression>
     |<expression_list> , <expression>
<expression> →
     <simple_expression>
     | <simple_expression> relop <simple_expression>
<simple_expression> →
     <term>
     | <sign> <term>
     | <simple_expression> addop <term>
<term> →
     <factor>
     | <term> mulop <factor>
<factor> →
     id
     | id (<expression_list>)
     | num
     | (<expression>)
     | not <factor>
     | id [<expression>]
<sign> →
     + | -

The second grammar is significantly different, of course. I removed each epsilon production by finding every reference to its nonterminal and creating one alternative with the nonterminal and one without it. When removing the epsilon left a nonterminal with only one alternative, I deleted that nonterminal altogether and substituted its remaining alternative wherever it had been used. For example, I eliminated <arguments> completely this way, by changing from:

 <subprogram_head> → 
      function id <arguments> : <standard_type>;
 <arguments> → 
      (<parameter_list>)
      | E

to

<subprogram_head> → 
     function id (<parameter_list>) : <standard_type>;
     |function id : <standard_type>;
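
For comparison, the usual textbook procedure (compute the nullable nonterminals, then expand every right-hand side with each nullable symbol kept or dropped) can be sketched in Python. This is only a minimal sketch under assumptions of mine, not the method used above: the dictionary representation of the grammar and the names `nullable_set` and `remove_epsilon` are hypothetical, and an empty tuple stands for E. Unlike the hand transformation above, it keeps `arguments` as a nonterminal and does not inline it.

```python
from itertools import product


def nullable_set(grammar):
    """Return the nonterminals that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # A head is nullable if some body contains only nullable symbols;
            # the empty body () satisfies this trivially.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


def remove_epsilon(grammar):
    """Return an equivalent grammar (minus the empty string) with no empty bodies."""
    nullable = nullable_set(grammar)
    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            # Each nullable symbol may be kept or dropped; other symbols are kept.
            options = [((sym,), ()) if sym in nullable else ((sym,),) for sym in body]
            for choice in product(*options):
                candidate = tuple(sym for part in choice for sym in part)
                # Skip empty bodies, trivial A -> A rules, and duplicates.
                if candidate and candidate != (head,) and candidate not in new_bodies:
                    new_bodies.append(candidate)
        result[head] = new_bodies
    return result


# Two rules taken from the original grammar; () plays the role of E.
grammar = {
    "subprogram_head": [("function", "id", "arguments", ":", "standard_type", ";")],
    "arguments": [("(", "parameter_list", ")"), ()],
    "declarations": [("declarations", "var", "id", ":", "type", ";"), ()],
}

result = remove_epsilon(grammar)
for head, bodies in result.items():
    for body in bodies:
        print(head, "->", " ".join(body))
```

In this sketch a left-recursive rule such as `declarations` ends up with two alternatives: the recursive one and the non-recursive `var id : type ;` obtained by dropping the nullable occurrence, so the rule still has a base case.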