Parse and calculate boolean expression in C

1.5k views Asked by At

I have a file with boolean expressions in the following format:

x0 x3+x4
x1+x2

these correspond to:

x0 AND x3 OR x4
x1 OR x2

Let's assume that we know the number of the expressions (N=2) and the number of x's (M=5). I want to have an array A of size N and an array Y of size M. The array A should contain in:

A[0] = y[0] && y[3] || y[4]
A[1] = y[1] || y[2]

and so on.

So far, I have implemented a python script that generates a .c file with the given array initialized with the expressions. But that means I need to recompile my program for each different instance. I want to avoid this. Any ideas are welcome.

Is there a "dirty" or quick trick that can do this? I would like to use purely C, if that's possible. Thanks in advance.

1

There are 1 answers

1
Ira Baxter On BEST ANSWER

Here's the essentials of a solution in C. It assumes one-letter variable names, and one-letter operators. (For fun, I included a "not" operator and parentheses for subexpressions.) It processes strings of the form:

   a&b|(~c&~a)

(where a..z correspond to OP's X and Y entities) and computes the boolean value of the expression.

This is implemented as a classic recursive descent parser; see Is there an alternative for flex/bison that is usable on 8-bit embedded systems? for more details on how to do this in general.

I coded it to point out just how straightforward this is for expressions. This is untested; OP gets to do his share of the work.

 #define syntax_error -1 // result of parser if expression is malformed

 bool variable_values[26]; // one for each letter a-z; initialized before parsing starts

 char* expression; // pointer to the expression string being processed

 int scan; // used to scan across the expression
 #define reject_character()  scan-- // used to back up scan when lexical error encountered
 #define skip_blanks() { while (expression[scan]==' ') scan++; }

 int boolean_primitive()
 // returns result of a subexpression consisting of just variable names or constants
 { int subexpression_result;
   skip_blanks();
   switch (expression[scan++])
   { case 'a','b', ... 'z': // variable name
       return variable_values[expression[scan-1]-"a"]; // look up value of variable
     case '0': // constant for "false"
       return 0;
     case '1': // constant for "true"
       return 1;
     default:
       return syntax_error;
   }
 }

 int boolean_term()
 // returns result of expression involving NOT or (...)
 { int subexpression_result;
   skip_blanks();
   switch (expression[scan++])
   { case '~': // not operator
       subexpression_result=boolean_primitive();
       if (subexpression_result==syntax_error)
          return syntax_error;
       return !subexpression_result;            
     case '(': // nested expression
       subexpression_result=boolean_or_sequence();
       if (subexpression_result==syntax_error)
          return syntax_error;
       skip_blanks();
       if (expression[scan++]==')')
          return subexpression_result;
       else return syntax_error;
     default:
       reject_character();
       return boolean_primitive();
   }
 }

 int boolean_and_sequence()
 // returns result of expression of form   s1 & s2 & ...
 { int subexpression_result=boolean_term();
   if (subexpression_result==syntax_error)
      return syntax_error;
   skip_blanks();
   while (expression[scan++]=='&') // "and" operator?
   { int subexpression2_result=boolean_term();
     if (subexpression2_result==syntax_error)
        return syntax_error;
     subexpression_result&=subexpression2_result;
     skip_blanks();
   }        
   reject_character; // undo overscan for '&'
   return subexpression_result;
}

 int boolean_or_sequence()
 // returns result of expression of form of s1 | s2 | ...
 { int subexpression_result=boolean_and_sequence();
   if (subexpression_result==syntax_error)
      return syntax_error;
   skip_blanks();
   while (expression[scan++]=='|') // "or" operator?
   { int subexpression2_result=boolean_primitive();
     if (subexpression2_result==syntax_error)
        return syntax_error;
     subexpression_result|=subexpression2_result;
     skip_blanks();
   }        
   reject_character; // undo overscan for '|'
   return subexpression_result;
}

 int calculate_boolean_expression(char* expression_to_evaluate)
 // returns int==0 for boolean false;
 // int==1 for boolean true
 // int==-1 for malformed expression
 {  int subexpression_result;
    scan=1;
    expression=expression_to_evaluate;
    subexpression_result=boolean_or_sequence();
    if (subexpression_result==syntax_error)
       return syntax_error;
    skip_blanks();
    if (expression[scan]==0) // expression ends without excess junk in string?
       return subexpression_result;
    else return syntax_error;
 }

After initializing the variable values, you invoke this as:

 int the_answer=calculate_boolean_expression(&some_expression_string);

Obviously you want to check the answer to see if the parser found a syntax error.

You can avoid the global variables by passing them all as arguments, if you insist.

Returning/checking for syntax_error is awkward in C and clutters the implementation because of the repeated need to check for it as a result from an invoked sub-parser. Implementing this a throwable exception would be nicer but C doesn't allow that. You can fake it with a longjmp.

You'll have to expand it to handle your (multi-character) lexical complications, whatever they are. For quick and dirty, sticking to single-character operators/operands is pretty clean. The secret for handling multi-character lexemes is to test for each lexeme where you expect to encounter it, and simply backup the "scan" pointer to the beginning of a failed lexeme. I used the macro "reject" for this.