Can't parse COBOL source code with Antlr4


I'm learning how to use ANTLR4 to parse COBOL source code. Currently, I'm following the steps exactly as demonstrated by Enam Biswas in his YouTube video.

Basically, I've downloaded antlr-4.7.1-complete.jar and placed it in C:\Javalib. Yes, I've also added the path to my Windows environment variables and created the antlr.bat and grun.bat files.

For the grammar files, I'm using Cobol85.g4 and Cobol85Preprocessor.g4, which were taken from Ulrich Wolffgang's GitHub. I'm also using the HellowWorld.cbl sample source file to see how the parsing works.

After running the antlr.bat, I executed the command below:

C:\Users\ffa\Desktop\COBOL>grun Cobol85Preprocessor startRule HellowWorld.cbl

As a result, I received the error message shown below:

Warning: TestRig moved to org.antlr.v4.gui.TestRig; calling automatically
Can't load Cobol85.g4 as lexer or parser

As I'm not sure why I can't get it parsed as shown in the video, I also tried the commands below:

C:\Users\ffa\Desktop\COBOL>grun Cobol85 startRule HellowWorld.cbl

and

C:\Users\ffa\Desktop\COBOL>grun Cobol85* startRule HellowWorld.cbl

In the end, I still got the same error message. So I searched Google and found a suggestion to download antlr-runtime-4.7.1.jar. I downloaded the file and placed it in the same directory, C:\Javalib.

When I executed the commands above again, I received a different message:

Error: Could not find or load main class org.antlr.v4.runtime.misc.TestRig

Could anyone please help me parse the COBOL source code with ANTLR4? It would also be good if someone could explain the difference between Cobol85.g4 and Cobol85Preprocessor.g4.


There are 2 answers

u.wol On BEST ANSWER

Disclaimer: I am the author of these COBOL ANTLR4 grammar files.

The parser generated from the grammar Cobol85.g4 has to be given COBOL source code that has already been run through a COBOL preprocessor. Cobol85Preprocessor.g4 is at the core of this preprocessor and enables parsing of statements such as COPY REPLACE, EXEC SQL etc.

Cobol85Preprocessor.g4 is meant to be augmented with quite extensive additional logic, which is not included in the grammar files and enables normalization of line formats, line breaks, comment lines, comment entries, EXEC SQL, EXEC CICS and so on.
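As a rough illustration of one such normalization step (line formats and comment lines), here is a minimal Java sketch. It is not taken from ProLeap or the grammar files: the class name FixedFormatNormalizer and its normalize method are hypothetical, and it assumes the classic fixed reference format (columns 1-6 sequence area, column 7 indicator area, columns 8-72 program text, columns 73-80 ignored). Continuation lines, debugging lines and EXEC blocks are deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of ProLeap or ANTLR.
class FixedFormatNormalizer {

    // Assumed layout: cols 1-6 sequence area, col 7 indicator, cols 8-72 program text.
    static List<String> normalize(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (line.length() < 7) {
                // Too short to hold an indicator or program text: emit an empty line.
                out.add("");
                continue;
            }
            char indicator = line.charAt(6);
            if (indicator == '*' || indicator == '/') {
                // Comment line: blank it out so line numbers still match the input.
                out.add("");
                continue;
            }
            // Keep columns 8-72 only, then drop trailing padding.
            int end = Math.min(line.length(), 72);
            out.add(line.substring(7, end).replaceAll("\\s+$", ""));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
                "000100 IDENTIFICATION DIVISION.",
                "000200* a comment line",
                "000300 PROGRAM-ID. HELLOWORLD.");
        // Print each normalized line between brackets to make blanks visible.
        for (String line : normalize(source)) {
            System.out.println("[" + line + "]");
        }
    }
}
```

The output of such a step (plus the other normalizations listed above) is what would then be handed to the parser generated from Cobol85.g4. Blanking comment lines instead of deleting them is a design choice here so that parser error messages keep pointing at the original line numbers.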

The ProLeap COBOL parser written by me implements all of this in Java based on the files Cobol.g4 and Cobol85Preprocessor.g4.

Bart Kiers On

From your console, go into a new directory and do the following:

1. Download the ANTLR jar:

wget http://www.antlr.org/download/antlr-4.7.1-complete.jar

(or just download it manually if wget is not available on your console)

2. Download the COBOL grammar:

wget https://raw.githubusercontent.com/antlr/grammars-v4/master/cobol85/Cobol85.g4

3. Download a COBOL source file:

wget https://raw.githubusercontent.com/uwol/cobol85parser/master/src/test/resources/io/proleap/cobol/ast/HelloWorld.cbl

4. Generate all .java lexer and parser classes from the COBOL grammar:

java -jar antlr-4.7.1-complete.jar Cobol85.g4

5. Compile all .java source files:

javac -cp antlr-4.7.1-complete.jar *.java

6. Feed the COBOL source file to the generated lexer/parser

... and instruct the parser to start with the startRule rule:

java -cp .;antlr-4.7.1-complete.jar org.antlr.v4.gui.TestRig Cobol85 startRule -gui < HelloWorld.cbl

(*nix users, do java -cp .:antlr-4.7.1-complete.jar org.antlr.v4.gui.TestRig Cobol85 startRule -gui < HelloWorld.cbl)

If the < does not work on Windows, just do this:

java -cp .;antlr-4.7.1-complete.jar org.antlr.v4.gui.TestRig Cobol85 startRule -gui

The prompt will now be silent. It is waiting for you to type in some source to be parsed. When you're done typing in some COBOL code, terminate with CTRL+Z followed by Enter (*nix users do CTRL+D).

That's it.

Now there are some errors printed to your console, meaning the COBOL parser cannot properly parse the source file. Whether that is because the source first needs to go through the preprocessor, or because the input is invalid, I don't know.
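
The Windows and *nix variants of step 6 differ only in the classpath separator (; versus :). If you would rather not remember which one applies, the command can be assembled from Java itself. The following is only a sketch: the class TestRigCommand and its methods are hypothetical names, and the jar, grammar and rule names are the ones used in the steps above.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of ANTLR.
class TestRigCommand {

    // Builds the same command line as step 6 for the current platform.
    static List<String> build(String antlrJar, String grammarName, String startRule) {
        // File.pathSeparator is ";" on Windows and ":" on Linux/macOS.
        String classpath = "." + File.pathSeparator + antlrJar;
        return Arrays.asList("java", "-cp", classpath,
                "org.antlr.v4.gui.TestRig", grammarName, startRule, "-gui");
    }

    // Launches the command with the given file on standard input,
    // which mirrors the "< HelloWorld.cbl" redirection.
    static int run(List<String> command, String inputFile)
            throws IOException, InterruptedException {
        return new ProcessBuilder(command)
                .inheritIO()                        // show TestRig output in this console
                .redirectInput(new File(inputFile)) // then override stdin with the file
                .start()
                .waitFor();
    }

    public static void main(String[] args) {
        List<String> cmd = build("antlr-4.7.1-complete.jar", "Cobol85", "startRule");
        // Only prints the command; call run(cmd, "HelloWorld.cbl") to execute it.
        System.out.println(String.join(" ", cmd));
    }
}
```

Run it from the directory that contains the jar and the compiled lexer/parser classes, since the classpath starts with the current directory.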