7

Can someone help me write a single regex to get the module(s) from a Python source line?

from abc.lmn import pqr
from abc.lmn import pqr as xyz
import abc
import abc as xyz

Each line has 3 sub-parts:

[from(\s)<module>(\s)] --> get module if this part exist
import(\s)<module>     --> get module
[(\s)as(\s)<alias>]    --> ignore if this part exist

Something like this:

:?[from(\s)<module>(\s)]import(\s)<module>:?[(\s)as(\s)<alias>]
You should base it on the grammar. Commented Jul 8, 2017 at 16:48

2 Answers

23

Instead of a regex, the built-in Python library ast might be a better approach, since it parses Python syntax for you: https://docs.python.org/2/library/ast.html

import ast

import_string = """from abc.lmn import pqr
from abc.lmn import pqr as xyz
import abc
import abc as xyz"""

modules = []
# Look at the top-level statements of the parsed source.
for node in ast.iter_child_nodes(ast.parse(import_string)):
    if isinstance(node, ast.ImportFrom):
        # Skip 'from ... import ... as ...' (only the first imported name is checked).
        if not node.names[0].asname:
            modules.append(node.module)
    elif isinstance(node, ast.Import):
        # Skip 'import ... as ...' (only the first imported name is checked).
        if not node.names[0].asname:
            modules.append(node.names[0].name)

That gives you ['abc.lmn', 'abc'] (the two aliased imports are skipped), and it is fairly easy to tweak if you want to pull out other information.


3 Comments

Might be a good idea to use ast.walk if you want to find import statements not at the toplevel.
great nice idea
Great answer. However, this code fails to capture multiple imports, such as "import a, b, c, d". Instead of just using node.names[0], you should iterate over all node.names.
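
A minimal sketch combining the two suggestions in the comments above: ast.walk to reach imports nested inside functions or classes, and a loop over every entry in node.names. The function name imported_modules is hypothetical. Unlike the answer's code, it keeps aliased imports and reports only the module name, which is an assumption about what the question is after.

```python
import ast


def imported_modules(source):
    """Return the module names from every import statement in source."""
    modules = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # 'import a, b as c' has one alias object per name;
            # alias.name is the module, alias.asname (ignored here) the alias.
            modules.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for 'from . import x';
            # node.level counts the leading dots of a relative import.
            modules.append("." * node.level + (node.module or ""))
    return modules


source = """import a, b as c
from abc.lmn import pqr as xyz

def f():
    import os.path
"""
print(imported_modules(source))
```

The ast documentation does not promise a particular order for ast.walk, so sort the result (or collect into a set) if order matters.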
7

It looks like you can make the from part optional and the import part required, while ignoring the as part (this first pattern simply does not match lines that have an as clause).

(?m)^(?:from[ ]+(\S+)[ ]+)?import[ ]+(\S+)[ ]*$

https://regex101.com/r/fmoAuh/1

Explained

 (?m)                          # Modifiers: multi-line
 ^                             # Beginning of line
 (?:                           # Optional from
      from [ ]+ 
      ( \S+ )                       # (1), from <module>
      [ ]+ 
 )?

 import [ ]+                   # Required import
 ( \S+ )                       # (2), import <module>
 [ ]* 
 $                             # End of line

Or, if you want to match the as but do not want to capture anything, use this.

(?m)^(?:from[ ]+(\S+)[ ]+)?import[ ]+(\S+)(?:[ ]+as[ ]+\S+)?[ ]*$

https://regex101.com/r/xFtey5/1

Expanded

 (?m)                          # Modifiers: multi-line
 ^                             # Beginning of line
 (?:                           # Optional from
      from [ ]+ 
      ( \S+ )                       # (1), from <module>
      [ ]+ 
 )?

 import [ ]+                   # Required import
 ( \S+ )                       # (2), import <module>

 (?:                           # Optional as
      [ ]+ 
      as [ ]+ 
      \S+                          # <alias>
 )?
 [ ]* 
 $ 
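
For what it's worth, here is a rough sketch of running the second pattern from Python with re.finditer. The names IMPORT_RE, source and modules are mine, not part of the pattern. Group 1 is None when a line has no from part, so the code falls back to group 2 in that case.

```python
import re

# Second pattern from above; (?m) makes ^ and $ match at every line.
IMPORT_RE = re.compile(
    r"(?m)^(?:from[ ]+(\S+)[ ]+)?import[ ]+(\S+)(?:[ ]+as[ ]+\S+)?[ ]*$"
)

source = """from abc.lmn import pqr
from abc.lmn import pqr as xyz
import abc
import abc as xyz"""

modules = []
for match in IMPORT_RE.finditer(source):
    from_module, import_name = match.groups()
    # With a 'from' part, group 1 is the module; otherwise group 2 is.
    modules.append(from_module if from_module is not None else import_name)

print(modules)
```

On the four sample lines this prints ['abc.lmn', 'abc.lmn', 'abc', 'abc']. A line such as import a, b (with a space after the comma) is not matched at all, because the pattern expects a single name after import.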
