I'm using the following Python code to print the tokens in a PL/SQL source file:
from antlr4 import *
from antlr4.tree.Tree import TerminalNodeImpl
from antlr4.tree.Trees import Trees
from PlSqlLexer import PlSqlLexer
from PlSqlParser import PlSqlParser
import sys
import json

def main():
    with open(sys.argv[1], 'r') as file:
        filesrc = file.read()
    lexer = PlSqlLexer(InputStream(filesrc))
    parser = PlSqlParser(CommonTokenStream(lexer))
    tree = parser.sql_script()
    traverse(tree, parser.ruleNames)

def traverse(tree, rule_names, indent = 0):
    if tree.getText() == "<EOF>":
        return
    elif isinstance(tree, TerminalNodeImpl):
        print("{0}TOKEN='{1}'".format(" " * indent, tree.getText()))  # <<< prints the token
        #print (tree)
    else:
        print("{0}{1}".format(" " * indent, rule_names[tree.getRuleIndex()]))
        for child in tree.children:
            traverse(child, rule_names, indent + 1)

if __name__ == '__main__':
    main()
When run against a PL/SQL source file, it produces output like this:
TOKEN='CREATE'
TOKEN='OR'
TOKEN='REPLACE'
TOKEN='PACKAGE'
TOKEN='BODY'
TOKEN='pa_temp'
TOKEN='AS'
TOKEN='PROCEDURE'
TOKEN='pr_new_item'
TOKEN='('
TOKEN='p_item'
TOKEN='IN'
TOKEN='items'
.
.
.
But I would also like to print what the token type is (procedure start, variable, table, etc.).
I have tried print(json.dumps(tree)) and print(json.dumps(parser)) to see if there is anything useful in there, but this just fails with errors like:
TypeError: Object of type PlSqlLexer is not JSON serializable
I have a small utility method that converts the parse tree ANTLR produces into a dict, which can then be converted to JSON. Let's say your grammar looks like:

then the input

2 * (555 - -50) / 42

will be parsed as:

You can use the following Python code to convert the parse tree to a dict:

which will print:
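As a minimal sketch of that idea (not the utility referred to above), a converter that also records each token's type could look like the following. The names `to_dict` and `dump` and the dict keys are illustrative assumptions. The code relies only on attributes the ANTLR Python runtime provides: `getSymbol()` on terminal nodes, `getRuleIndex()` and `children` on rule contexts, and the `ruleNames` and `symbolicNames` lists on the generated parser.

```python
import json

EOF_TYPE = -1  # same value as antlr4.Token.EOF


def to_dict(node, rule_names, symbolic_names):
    """Convert an ANTLR parse tree node into nested dicts and lists."""
    if hasattr(node, "getSymbol"):
        # Terminal node: it wraps the Token produced by the lexer.
        token = node.getSymbol()
        if token.type == EOF_TYPE:
            # Guard: symbolic_names[-1] would silently pick the last entry.
            type_name = "EOF"
        else:
            type_name = symbolic_names[token.type]
        return {"token": token.text, "type": type_name, "line": token.line}

    # Rule node: record the rule name and recurse into the children.
    # `children` is None for a rule that matched nothing.
    children = node.children or []
    return {
        "rule": rule_names[node.getRuleIndex()],
        "children": [to_dict(c, rule_names, symbolic_names) for c in children],
    }


def dump(tree, parser):
    """Serialize the whole tree as indented JSON."""
    data = to_dict(tree, parser.ruleNames, parser.symbolicNames)
    return json.dumps(data, indent=2)
```

With the parser from the question, this would be called as print(dump(tree, parser)) after tree = parser.sql_script(). Note that the "type" field is the lexer's token type (keyword, identifier, and so on). Whether an identifier is a procedure name, a variable, or a table is not a property of the token itself. That role comes from the "rule" of the node that encloses it, which is why the rule names are kept in the output.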