Top Down Parsing - Java

3.4k views Asked by At

I have been assigned a project to implement a top-down backtracking parser for any grammar which contains only one nonterminal on the RHS of its rewrite rules (e.g S -> aaSb | aaSa | aSa)

So far, I have three methods, including main, that are used to handle checking the validity of an input string.

My goal is to, using a char[][] array for the grammar, check each character in the input string against the grammar, and return true if the string is contained within the grammar.

public class TDBP {
    public static void main(String[] args) {
        char[][] g = new char[][] 
            { {'a', 'a', 'S', 'b'}, 
              {'a', 'a', 'S', 'a'}, 
              {'a', 'S', 'a'}, 
              {'\0'} };

        SP(g);
    }
    public static void SP(char[][] g) {
        Scanner s = new Scanner(System.in);
        boolean again = true; int pn = 0;
        String test;

        while(again) {
            System.out.print("Next string? ");
            test = s.nextLine();

            if(S(pn, test, g))
                System.out.println("String is in the language");
            else
                System.out.println("String is not in the language");

            if(s.nextLine() == "\n") again = false;
        }

        s.close();
    }
    public static boolean S(int pn, String test, char[][] g) {
        char[] c = test.toCharArray();
        boolean exists = false;

        for(int i = pn; i < g.length; i++) {
            for(int j = 0; j < g[i].length; j++) {
                if(c[j] == 'S')
                    S(++pn, test, g);
                if(c[j] == g[i][j])
                    exists = true;
            }
        }

        return exists;
    }
}

In my algorithm, pn is an integer to keep track of which production in the grammar I am currently looking at, and to make sure that I don't scan the same grammar twice (e.g a pn of 1 in above grammar would correspond to aaSa). Also, I have \0 represent the empty string.

Am I parsing the string correctly?

Thank you!

2

There are 2 answers

0
APT On BEST ANSWER

I am bit rusty on my CS classes: but the following code worked for me:

public static boolean fullCheck(char[] toTest, char[][] g) {
    int val = checkOnAll(0, toTest, g);

    if (val == toTest.length) {
        return true;
    }
    return false;
}

public static int checkOnAll(int charPos, char[] toTest, char[][] g) {
    for(int i = 0; i < g.length; i++) {
        int val = checkOne(charPos, toTest, g[i], g);
        if (val != -1) {
            return val;
        }
    }
    return -1;
}

//also needs length checks
public static int checkOne(int charPos, char[] toTest, char[] rule, char[][] rules) {
    for(int j = 0; j < rule.length; j++) {
        if (charPos >= toTest.length) {
            return -1;
        }
        if(rule[j] == 'S') {
            int value = checkOnAll(charPos, toTest, rules);
            if (value == -1) {
                return -1;
            } else {
                charPos = value - 1;
            }
        } else if (toTest[charPos] != rule[j]) {
            return -1;
        }
        charPos++;
    }
    return charPos;
}

Instead of the "S" function, use the fullCheck one (give the input as a char array to the new method). I also changed the "g" array to be:

        char[][] g = new char[][]
            { {'a', 'a', 'S', 'b'},
              {'a', 'a', 'S', 'a'},
              {'a', 'S', 'a'},
              {} };

(the "\0" gave me trouble with the length checks, the above change was the simplest solution).

I found a few issues in your code, and although I am not totally sure that my own code is bug-free, I thought I'll share anyways: 1. when S returns a "false" inside your recursion, but the value is ignored. 2. the "pn" should be restricted to knowing which the test char we are on not the rule char. 3. even if the value returned is true, you must make sure you checked the entire test string and not just part of it. I did not see you do that. 4. if you have one very long rule but a short input, you might get an array out of bounds exception since you never look at the test string length.

I tried my own code with various inputs, and I have a feeling like I might have missed something, but I could not find it. if you find a problem, please let me know :)

good luck.

0
shintoZ On

This can be solved in a more easier way using the recursive descent top down parsing.

Suppose your grammer is

S -> aaSb | aaSa | aSa | # where # represents the empty string.

After the left factoring, it will be something like

S  -> aS' | #
S' -> aSS" | Sa
S" -> b | a

Then define each rule in separate method and call recursively as given below.

For the first rule: S -> aS' | #

function S(char[] input, int &i) {
     if(input[i]=='a') {
       i++;
       return S1(input, i);
      } 
      return true;   //For empty string
}

For the second rule: S' -> aSS" | Sa

function s1(char[] input, int &i) {
    if(input[i]=='a') {
       i++;
       S(input, i);
       return S2(input, i);
    } else {
       S(input, i);
       if(input[i]=='a') {
          return true;
       } else {
          return false;
       }
    }
}

Like this define the third function also.(Note that i must be pass by reference)

You can refer any of the top down parsing tutorials for more details. U can refer this one also.

Hope this will help.