
I want to pull out all of the Python functions within a Python script. Is there a single regex I can use to do this, e.g.:

import re
all_functions = re.findall(regex, python_script)

I have implemented a very cumbersome way of doing this involving many if statements, but I feel there is a more elegant solution with regexes.

I think the regex should be something like this:

'def.*?\n\S'

because:

  1. Functions start with def
  2. Followed by anything (but we want to be non-greedy)
  3. A function ends at the first newline character \n that is followed by a non-whitespace character \S, i.e. where the next line is not indented

However, I can't seem to get this to work over multiple lines.

Edit: Python functions may be contained in files that don't have .py extensions; e.g. they can be contained in IPython notebooks with .ipynb extension so I can't necessarily always import the code and use dir().

  • Why not import it as a module, look into its contents via dir, and check their types? Only if the code is to be trusted, of course. Commented Oct 10, 2015 at 20:49
  • mpcabd, thanks for the suggestion, but I was looking for a solution that would work for all file types, e.g. sometimes there are python functions within IPython notebooks which have .ipynb extensions, so a regex would allow me to get functions out of those file types as well Commented Oct 10, 2015 at 20:55
  • def can't be followed by "anything", there's a definition for identifiers. Perhaps you should look into Python's grammar? Also, if you want . to include line breaks you need re.DOTALL. Commented Oct 10, 2015 at 20:55
  • jonrsharpe, you are right, the regex should be more specific, perhaps something along the lines of 'def [A-Za-z_][A-Za-z_0-9]*?\(.*?\)', assuming only well written python functions are contained in the file, I think that should detect the beginning of functions. Commented Oct 10, 2015 at 21:02
  • "1. Functions start with def". This is almost entirely untrue. They can be assigned to variables, synthesized from code objects, built anonymously with lambda expressions, assigned to keys in the globals() dictionary, or appear as if by magic wherever eval rears its ugly head... and I'm sure I've missed some. In general, you can't use regular expressions to parse a non-regular language like Python. Can't be done, except in ridiculously limited circumstances... and even then, only when you already know exactly what you're looking for. Commented Oct 10, 2015 at 21:02

2 Answers

7

Don't use a regular expression. Have Python parse the code for you and find all the function definitions with the ast module:

import ast

with open(python_sourcefile) as sourcefile:
    tree = ast.parse(sourcefile.read(), sourcefile.name)

for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        print(node.name)
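
Since the question asks for the entire function rather than just its name, here is a minimal sketch of one way to recover the source text of each definition. It assumes Python 3.8 or later, where ast.get_source_segment is available, and it also checks for ast.AsyncFunctionDef so that async def functions are included. The sample source and the functions dictionary are hypothetical, for illustration only.

```python
import ast

# Hypothetical sample source, used only for illustration.
source = '''\
import os

def first(a, b):
    return a + b

class Greeter:
    def greet(self, name):
        return "hello " + name

async def fetch():
    return 1
'''

tree = ast.parse(source)

# Map each function name to the exact source text of its definition.
functions = {}
for node in ast.walk(tree):
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
        functions[node.name] = ast.get_source_segment(source, node)

for name, text in functions.items():
    print(name)
    print(text)
    print()
```

Because ast.walk visits every node, methods and nested functions are found as well. Two functions with the same name would overwrite each other in this dictionary, so a list of pairs may be a better fit for real files.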

If the code is contained in .ipynb files, parse the file and extract the code cells, then put the input source code from those through the same process.
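
A rough sketch of that notebook step, assuming the nbformat 4 JSON layout (a top-level "cells" list whose entries carry "cell_type" and "source"); older notebook formats store cells differently and would need other keys. The helper name functions_in_notebook and the sample notebook are made up for this example.

```python
import ast
import json


def functions_in_notebook(notebook_json):
    """Return the names of functions defined in a notebook's code cells.

    Assumes the nbformat 4 layout; this is a sketch, not a full reader.
    """
    notebook = json.loads(notebook_json)
    names = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # "source" may be a single string or a list of line strings.
        if isinstance(source, list):
            source = "".join(source)
        try:
            tree = ast.parse(source)
        except SyntaxError:
            # Cells containing IPython magics (%, !) are not valid Python.
            continue
        names.extend(
            node.name
            for node in ast.walk(tree)
            if isinstance(node, ast.FunctionDef)
        )
    return names


# Hypothetical notebook content, built in memory for illustration.
sample = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["def square(x):\n", "    return x * x\n"]},
        {"cell_type": "code", "source": ["%matplotlib inline\n"]},
    ],
    "nbformat": 4,
})

print(functions_in_notebook(sample))
```

Skipping cells that fail to parse is a deliberate simplification; a cell that mixes a magic command with a function definition would be skipped entirely.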

Demo with the ast module source itself:

>>> import ast
>>> with open(ast.__file__.rstrip('c')) as sourcefile:
...     tree = ast.parse(sourcefile.read(), sourcefile.name)
... 
>>> for node in ast.walk(tree):
...     if isinstance(node, ast.FunctionDef):
...         print(node.name)
... 
parse
literal_eval
dump
copy_location
fix_missing_locations
increment_lineno
iter_fields
iter_child_nodes
get_docstring
walk
_convert
_format
_fix
visit
generic_visit
generic_visit

1 Comment

Thanks, the ast library looks pretty cool, I will try this out
1

This regex might work out for you:

re.compile(r'def (?P<function>(?P<function_name>.*?)\((?P<function_args>.*)\)):')

I used groups so you could get information out easily using the groupdict() method of a match object, but if you just want the declaration line, you can use re.compile(r'def .*?\(.*\):')

This regex could be tighter (it'll accept def do something(1,2,3): even though that's not a valid function definition), but if your Python files are syntactically correct and you absolutely want to use a regex, this'll do the job for you.
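
As a small illustration of the groupdict() usage, the sketch below runs the pattern over a hypothetical two-function snippet. The sample source and the declarations list are made up for this example. Note that the pattern also matches def lines inside comments or strings, since it knows nothing about Python's syntax.

```python
import re

pattern = re.compile(
    r'def (?P<function>(?P<function_name>.*?)\((?P<function_args>.*)\)):'
)

# Hypothetical sample source, used only for illustration.
source = '''\
def add(a, b=2):
    return a + b

def noop():
    pass
'''

# One dictionary of named groups per matched declaration line.
declarations = [match.groupdict() for match in pattern.finditer(source)]

for declaration in declarations:
    print(declaration["function_name"], "->", repr(declaration["function_args"]))
```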

1 Comment

Thanks Akshay, but I am trying to get the entire function not just the function name, so the regex needs to terminate with something like '\n\S' and would need to match multiple lines as well.
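
For the narrow case this comment describes, a rough sketch of a multi-line pattern follows. It combines re.DOTALL (so . crosses newlines, as noted in the question's comments) with re.MULTILINE (so ^ matches at each line start) and uses a lookahead instead of consuming the \S. It assumes only top-level, undecorated def blocks are wanted, and it can be fooled by unindented text inside a multi-line string, which is why the ast approach is the more robust choice. The sample source is hypothetical.

```python
import re

# Hypothetical sample source, used only for illustration.
source = '''\
import os

def first(a, b):
    total = a + b

    return total

x = 1

def second():
    return x
'''

# ^def anchors at the start of a line; the lazy .*? then extends until the
# lookahead sees the next unindented line (^\S) or the end of the text (\Z).
pattern = re.compile(r'^def .*?(?=^\S|\Z)', re.MULTILINE | re.DOTALL)

# Strip the trailing blank lines that the lazy match picks up.
top_level_functions = [m.group().rstrip() for m in pattern.finditer(source)]

for text in top_level_functions:
    print(text)
    print("---")
```

Methods, nested functions, and async def are skipped by this pattern because their lines do not start with def at column zero.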
