I'm trying to parse C++ using Python. I generated the parser with ANTLR for Python, and now I want to visit the tree and gather some information.

  • Is there any way to dump the ANTLR tree as an AST in JSON format?
  • I am trying to trace function calls. I was expecting something like CallExpr, but I couldn't find anything like it in the generated parser files.

This is the grammar file I'm using: https://github.com/antlr/grammars-v4/blob/master/cpp/CPP14.g4

I used the following command to generate the C++ parser: java -jar antlr-4.8-complete.jar -Dlanguage=Python3 ./CPP14.g4 -visitor

And this is the very basic code I have:

import sys
import os
from antlr4 import *
from CPP14Lexer import *
from CPP14Parser import *
from CPP14Visitor import *



class TREEVisitor(CPP14Visitor):
    def __init__(self):
        pass


    def visitExpressionstatement(self, ctx):
        print(ctx.getText())
        return self.visitChildren(ctx)



if __name__ == '__main__':
    dtype = ""
    input_stream = FileStream(sys.argv[1])
    cpplex = CPP14Lexer(input_stream)
    commtokstream = CommonTokenStream(cpplex)
    cpparser = CPP14Parser(commtokstream)
    tree = cpparser.translationunit()
    # The error count is only meaningful after the parse has run.
    print("parse errors: {}".format(cpparser._syntaxErrors))

    tv = TREEVisitor()
    tv.visit(tree)

And this is the input file I'm trying to parse:

#include <iostream>

using namespace std;


int foo(int i, int i2)
{
    return i * i2;
}

int main(int argc, char *argv[])
{
    cout << "test" << endl;
    foo(1, 3);
    return 0;
}

Thanks

  • Can you add a link to the grammar you're using? Can you also update your question and post the code you're using to parse the CPP source file? (as well as the contents of this source file) Commented Apr 16, 2020 at 19:45
  • In the grammar, I don't see a rule called CallExpr: why are you expecting it? What is the input you're parsing? Commented Apr 16, 2020 at 20:10
  • The question is how can I visit the function calls like foo(1, 2, 3); I thought maybe that is defined under another name in that grammar file! Commented Apr 16, 2020 at 20:20
  • remove def __init__(self): pass Commented Apr 16, 2020 at 20:35
  • @eyllanesc that might be good advice, but it has nothing to do with the problem at hand. And when giving advice, it would be nice to explain why, or provide a link that explains this. Commented Apr 16, 2020 at 20:42

1 Answer

Function calls are recognised by the postfixexpression rule:

postfixexpression
   : primaryexpression
   | postfixexpression '[' expression ']'
   | postfixexpression '[' bracedinitlist ']'
   | postfixexpression '(' expressionlist? ')'   // <---- this alternative!
   | simpletypespecifier '(' expressionlist? ')'
   | typenamespecifier '(' expressionlist? ')'
   | simpletypespecifier bracedinitlist
   | typenamespecifier bracedinitlist
   | postfixexpression '.' Template? idexpression
   | postfixexpression '->' Template? idexpression
   | postfixexpression '.' pseudodestructorname
   | postfixexpression '->' pseudodestructorname
   | postfixexpression '++'
   | postfixexpression '--'
   | Dynamic_cast '<' thetypeid '>' '(' expression ')'
   | Static_cast '<' thetypeid '>' '(' expression ')'
   | Reinterpret_cast '<' thetypeid '>' '(' expression ')'
   | Const_cast '<' thetypeid '>' '(' expression ')'
   | typeidofthetypeid '(' expression ')'
   | typeidofthetypeid '(' thetypeid ')'
   ;

So if you add this to your visitor:

def visitPostfixexpression(self, ctx:CPP14Parser.PostfixexpressionContext):
    print(ctx.getText())
    return self.visitChildren(ctx)

the function call will get printed. Note that this will print a lot more than function calls, since the rule matches much more than that. To narrow it down, you could label the alternatives:

postfixexpression
   : primaryexpression                                     #otherPostfixexpression
   | postfixexpression '[' expression ']'                  #otherPostfixexpression
   | postfixexpression '[' bracedinitlist ']'              #otherPostfixexpression
   | postfixexpression '(' expressionlist? ')'             #functionCallPostfixexpression
   | simpletypespecifier '(' expressionlist? ')'           #otherPostfixexpression
   | typenamespecifier '(' expressionlist? ')'             #otherPostfixexpression
   | simpletypespecifier bracedinitlist                    #otherPostfixexpression
   | typenamespecifier bracedinitlist                      #otherPostfixexpression
   | postfixexpression '.' Template? idexpression          #otherPostfixexpression
   | postfixexpression '->' Template? idexpression         #otherPostfixexpression
   | postfixexpression '.' pseudodestructorname            #otherPostfixexpression
   | postfixexpression '->' pseudodestructorname           #otherPostfixexpression
   | postfixexpression '++'                                #otherPostfixexpression
   | postfixexpression '--'                                #otherPostfixexpression
   | Dynamic_cast '<' thetypeid '>' '(' expression ')'     #otherPostfixexpression
   | Static_cast '<' thetypeid '>' '(' expression ')'      #otherPostfixexpression
   | Reinterpret_cast '<' thetypeid '>' '(' expression ')' #otherPostfixexpression
   | Const_cast '<' thetypeid '>' '(' expression ')'       #otherPostfixexpression
   | typeidofthetypeid '(' expression ')'                  #otherPostfixexpression
   | typeidofthetypeid '(' thetypeid ')'                   #otherPostfixexpression
   ;

and you can then do:

def visitFunctionCallPostfixexpression(self, ctx:CPP14Parser.FunctionCallPostfixexpressionContext):
    print(ctx.getText())
    return self.visitChildren(ctx)

and then, after regenerating the parser from the modified grammar, only foo(1,3) gets printed (note that you might want to label more alternatives inside the postfixexpression rule as functionCallPostfixexpression).
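
If regenerating the parser with labels is not convenient, another option is to keep the original grammar and check the shape of each postfixexpression node inside visitPostfixexpression. The sketch below relies only on getChildCount(), getChild() and getText(), which ANTLR parse-tree nodes provide. StubNode and StubPostfix are hypothetical stand-ins for the generated context classes so that the example runs without the generated parser, and is_function_call and call_parts are illustrative names, not part of ANTLR.

```python
class StubNode:
    """Hypothetical stand-in for an ANTLR parse-tree node (illustration only)."""

    def __init__(self, text="", children=None):
        self._text = text
        self.children = children or []

    def getChildCount(self):
        return len(self.children)

    def getChild(self, i):
        return self.children[i]

    def getText(self):
        if self.children:
            return "".join(child.getText() for child in self.children)
        return self._text


class StubPostfix(StubNode):
    """Stand-in for CPP14Parser.PostfixexpressionContext."""


def is_function_call(ctx):
    """True if ctx has the shape: postfixexpression '(' expressionlist? ')'."""
    if ctx.getChildCount() not in (3, 4):
        return False
    callee, paren = ctx.getChild(0), ctx.getChild(1)
    # The callee must itself be a postfixexpression; this rules out
    # casts such as int(3), whose first child is a type specifier.
    return type(callee) is type(ctx) and paren.getText() == "("


def call_parts(ctx):
    """Return (callee text, argument text) for a call-shaped context."""
    callee = ctx.getChild(0).getText()
    # With arguments there are four children: callee, '(', expressionlist, ')'.
    args = ctx.getChild(2).getText() if ctx.getChildCount() == 4 else ""
    return callee, args


# foo(1, 3) shaped the way the unlabeled grammar would build it.
call = StubPostfix(children=[
    StubPostfix(children=[StubNode("foo")]),
    StubNode("("),
    StubNode(children=[StubNode("1"), StubNode(","), StubNode("3")]),
    StubNode(")"),
])
# int(3) matches the simpletypespecifier alternative, not a call.
cast = StubPostfix(children=[
    StubNode("int"), StubNode("("), StubNode("3"), StubNode(")"),
])

print(is_function_call(call), call_parts(call))
print(is_function_call(cast))
```

In a real visitor, the same check would go at the top of visitPostfixexpression, printing only when is_function_call(ctx) is true. This assumes the unlabeled grammar, where every alternative produces the same context class.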

Is there any way to dump the ANTLR tree as an AST in JSON format?

No.

But you could easily create something yourself: the object returned by each parser rule method, like translationunit, contains the entire subtree matched by that rule. A quick and dirty example:

import antlr4
from antlr4.tree.Tree import TerminalNodeImpl
import json

from CPP14Lexer import CPP14Lexer
from CPP14Parser import CPP14Parser


def to_dict(root):
    obj = {}
    _fill(obj, root)
    return obj


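def _fill(obj, node):
    # Leaf: record the token type and its text.
    if isinstance(node, TerminalNodeImpl):
        obj["type"] = node.symbol.type
        obj["text"] = node.getText()
        return

    # Rule node: derive the rule name from the context class name,
    # e.g. PostfixexpressionContext -> postfixexpression.
    class_name = type(node).__name__.replace('Context', '')
    rule_name = '{}{}'.format(class_name[0].lower(), class_name[1:])
    arr = []
    obj[rule_name] = arr

    # children is None for a rule that matched no tokens.
    for child_node in node.children or []:
        child_obj = {}
        arr.append(child_obj)
        _fill(child_obj, child_node)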


if __name__ == '__main__':
    source = """
        #include <iostream>

        using namespace std;

        int foo(int i, int i2)
        {
            return i * i2;
        }

        int main(int argc, char *argv[])
        {
            cout << "test" << endl;
            foo(1, 3);
            return 0;
        }
        """
    lexer = CPP14Lexer(antlr4.InputStream(source))
    parser = CPP14Parser(antlr4.CommonTokenStream(lexer))
    tree = parser.translationunit()
    tree_dict = to_dict(tree)
    json_str = json.dumps(tree_dict, indent=2)
    print(json_str)
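
Because the grammar encodes operator precedence as a chain of rules, the dictionary built this way wraps even a simple call in many rule nodes that each have a single child. One possible way to make the JSON easier to read is to collapse those chains after the fact. The sketch below assumes the dictionary layout produced by to_dict above (a rule node is a one-key dict mapping the rule name to a list of children, a token is a dict with "type" and "text"). The name collapse and the token type numbers in the sample are made up for illustration.

```python
import json


def collapse(obj):
    """Return a copy of a to_dict()-style tree without single-child rule chains."""
    if "type" in obj and "text" in obj:
        # Token leaf: copy as-is.
        return dict(obj)
    # Rule node: exactly one key, the rule name.
    (rule_name, children), = obj.items()
    if len(children) == 1:
        # A rule that only wraps one child adds no structure; skip it.
        return collapse(children[0])
    return {rule_name: [collapse(child) for child in children]}


# Hand-written fragment in the to_dict() layout; token types are placeholders.
sample = {
    "expressionstatement": [
        {"expression": [
            {"assignmentexpression": [
                {"postfixexpression": [
                    {"postfixexpression": [
                        {"primaryexpression": [{"type": 1, "text": "foo"}]},
                    ]},
                    {"type": 2, "text": "("},
                    {"expressionlist": [{"type": 3, "text": "1"}]},
                    {"type": 4, "text": ")"},
                ]},
            ]},
        ]},
        {"type": 5, "text": ";"},
    ]
}

print(json.dumps(collapse(sample), indent=2))
```

The trade-off is that a collapsed tree no longer records which intermediate rules a token passed through, so it suits inspection better than analyses that depend on exact rule names.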

12 Comments

I found this: stackoverflow.com/questions/49116223/… I wonder if that can be used for C++?
Sure, you'll just have to port the Java code to Python.
@Alex I added a quick example snippet of how you could go about it.
Is there any documentation explaining all those terms in the output tree? When the function foo is called, it seems to be under "unaryexpression": [...], and then its arguments are again under "unaryexpression". Is that normal? I'm not sure how I should interpret the output! It is only a function call, but it sits under "expressionstatement", "assignmentexpression", "conditionalexpression", "logicalorexpression" and many others... That looks weird to me!
Yes, that's normal. The parser starts with translationunit and descends rule by rule; if you follow that path, you eventually arrive at your function call.
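
To see that path for a particular node, one option is to walk upwards through the parentCtx links that the ANTLR Python runtime sets on each node. This is a minimal sketch: rule_name and rule_path are illustrative helper names, and the three stub classes are hypothetical stand-ins for generated context classes (a real path for foo(1, 3) contains many more rules in between).

```python
def rule_name(node):
    """Derive a rule name from a context class name, e.g. FooContext -> foo."""
    class_name = type(node).__name__.replace("Context", "")
    return class_name[0].lower() + class_name[1:]


def rule_path(node):
    """List the rule names from the root of the tree down to node."""
    names = []
    while node is not None:
        names.append(rule_name(node))
        node = node.parentCtx  # None at the root
    return names[::-1]


class _Stub:
    """Hypothetical stand-in for a generated context class."""

    def __init__(self, parent=None):
        self.parentCtx = parent


class TranslationunitContext(_Stub):
    pass


class ExpressionstatementContext(_Stub):
    pass


class PostfixexpressionContext(_Stub):
    pass


root = TranslationunitContext()
stmt = ExpressionstatementContext(root)
call = PostfixexpressionContext(stmt)

print(" > ".join(rule_path(call)))
```

With the real parser, calling rule_path(ctx) inside a visitor method would list every rule between translationunit and the node being visited.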