3

I'd like to create a regular expression in Python that will match against a line in Python source code and return a list of function calls.

The typical line would look like this:

something = a.b.method(time.time(), var=1) + q.y(x.m())

and the result should be:

["a.b.method()", "time.time()", "q.y()", "x.m()"]

I have two problems here:

  1. creating the correct pattern
  2. the capture groups overlap

Thank you for your help.

3
  • And what about parsing strings and comments? Commented Dec 28, 2011 at 16:35
  • 2
    Python isn't a regular language, so you can't do that with a regex. Commented Dec 28, 2011 at 16:43
  • @DouglasLeeder, regexes are not regular, unless we are discussing formal language theory here. ;-) Commented Dec 28, 2011 at 16:46

5 Answers 5

13

I don't think regular expressions are the best approach here. Consider the ast module instead, for example:

import ast


class ParseCall(ast.NodeVisitor):
    """Collect the parts of a dotted name such as a.b.method."""
    def __init__(self):
        self.ls = []
    def visit_Attribute(self, node):
        # Visit the left-hand side first so the parts come out in source order.
        self.generic_visit(node)
        self.ls.append(node.attr)
    def visit_Name(self, node):
        self.ls.append(node.id)


class FindFuncs(ast.NodeVisitor):
    def visit_Call(self, node):
        p = ParseCall()
        p.visit(node.func)
        print(".".join(p.ls))
        # Keep walking so that calls nested in the arguments are found too.
        self.generic_visit(node)


code = 'something = a.b.method(foo() + xtime.time(), var=1) + q.y(x.m())'
tree = ast.parse(code)
FindFuncs().visit(tree)

result

a.b.method
foo
xtime.time
q.y
x.m
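
For a list in the format the question asks for, rather than printed output, the same idea can be sketched as below. This is only an illustration: the names dotted_name, CallCollector and find_calls are made up here, and calls whose target is not a plain dotted name (for example f()() or items[0].run()) are simply skipped.

```python
import ast


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None  # e.g. f()() or items[0].run(): not a plain dotted name


class CallCollector(ast.NodeVisitor):
    """Hypothetical helper: gathers call targets in source order."""

    def __init__(self):
        self.calls = []

    def visit_Call(self, node):
        name = dotted_name(node.func)
        if name is not None:
            self.calls.append(name + "()")
        # Descend so that calls nested inside the arguments are collected too.
        self.generic_visit(node)


def find_calls(line):
    collector = CallCollector()
    collector.visit(ast.parse(line))
    return collector.calls
```

Because the line is parsed rather than pattern-matched, text inside string literals and comments is not mistaken for a call.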

3 Comments

+1 nice tutorial on the ast module! Nice to know that it provides something a bit more useful than just literal_eval :)
In fact, unless I'm mistaken, a regex-based approach is doomed to fail. The Python language is based on a context-free grammar, and (again, unless I'm mistaken) a CFG is more expressive than a regular expression (thank you, Chomsky hierarchy).
@AdamParkin: some of the answers to this question might be interesting for you.
4
$ python3
>>> import re
>>> from itertools import chain
>>> def fun(s, r):
...     t = re.sub(r'\([^()]+\)', '()', s)
...     m = re.findall(r'[\w.]+\(\)', t)
...     t = re.sub(r'[\w.]+\(\)', '', t)
...     if m==r:
...         return
...     for i in chain(m, fun(t, m)):
...         yield i
...
>>> list(fun('something = a.b.method(time.time(), var=1) + q.y(x.m())', []))
['time.time()', 'x.m()', 'a.b.method()', 'q.y()']

Comments

2
/([.a-zA-Z]+)\(/g

should match the method names. You'd have to append the parentheses yourself afterwards, since some of the calls are nested.
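
In Python the pattern is written without the bounding slashes, and re.findall plays the role of the g flag. A rough sketch, assuming that translation is what is intended (the names NAME_BEFORE_PAREN and method_names are made up for this illustration):

```python
import re

# Python form of /([.a-zA-Z]+)\(/g: no slashes; findall returns every match.
NAME_BEFORE_PAREN = re.compile(r"([.a-zA-Z]+)\(")


def method_names(line):
    """Return every run of letters and dots that is directly followed by '('."""
    return NAME_BEFORE_PAREN.findall(line)
```

On the question's example line this gives the four dotted names without parentheses. It also picks up names inside string literals, so foo("bar(a,b)") yields both foo and bar.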

4 Comments

foo("bar(a,b)") would return bar incorrectly for that regex.
@DouglasLeeder It looks good but this Python code doesn't print what is expected.
@xralf looks like python doesn't use the bounding slashes, and also uses different functions for global search: pastebin.com/QbD2awfJ should do what you want.
@DouglasLeeder Thank you. This works well now, but thg435's solution seems to cover more special cases.
1

I don't really know Python, but I can imagine that making this work properly involves some complications, e.g.:

  • strings
  • comments
  • expressions that return an object

But for your example, an expression like this works:

(?:\w+\.)+\w+\(
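
A small sketch of how this expression behaves when used from Python. The names DOTTED_CALL and dotted_calls are made up for this illustration. Note that the pattern requires at least one dot, so an undotted call such as foo(1) is not reported.

```python
import re

# The expression above, unchanged: one or more "name." prefixes, a name, then "(".
DOTTED_CALL = re.compile(r"(?:\w+\.)+\w+\(")


def dotted_calls(line):
    """Return dotted call names, each rewritten as 'name()'."""
    # Each match ends with "(", so strip it and append "()" for display.
    return [m[:-1] + "()" for m in DOTTED_CALL.findall(line)]
```

Like any plain regex approach, this also matches text inside strings and comments.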

Comments

0


I have an example for you showing that this is doable in Python 3:

    import re


    def parse_func_with_params(inp, max_params=20):
        # Raw strings avoid invalid-escape warnings; [a-zA-Z] drops the stray hyphen.
        func_name = r"([a-zA-Z]+)\s*"
        param = r"\s*([a-zA-Z]+)\s*"
        params = param

        # Try one parameter, then two, and so on. The max_params bound is an
        # added safeguard so that input which never matches cannot loop forever.
        for _ in range(max_params):
            m = re.match(func_name + r"\(" + params + r"\)", inp)
            if m:
                print(m.groups())
                return m.groups()
            params += "," + param
        return None

Command line Input: animalFunc(lion, tiger, giraffe, singe)
Output: ('animalFunc', 'lion', 'tiger', 'giraffe', 'singe')

As you can see, the function name is always first in the tuple, and the rest are the names of the parameters passed.
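
For a single flat call like the example input, one fixed pattern plus a split on commas may be enough. This is only a sketch under that assumption: the names CALL_RE and split_simple_call are made up here, and nested calls, string arguments and keyword arguments are not handled.

```python
import re

# A name, then a parenthesised argument list that contains no nested parentheses.
CALL_RE = re.compile(r"\s*([A-Za-z_]\w*)\s*\(([^()]*)\)\s*$")


def split_simple_call(text):
    """Return (name, arg1, arg2, ...) for a flat call, or None if it does not match."""
    m = CALL_RE.match(text)
    if m is None:
        return None
    name, arg_text = m.groups()
    # Split the raw argument text on commas and drop empty pieces such as in "f()".
    args = [a.strip() for a in arg_text.split(",") if a.strip()]
    return (name, *args)
```

This avoids rebuilding and recompiling the pattern once per parameter.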


Comments
