
I just want to check whether there is a better way of doing this than what I came up with.

I need to parse a .py file; more precisely, I have to look for a specific list named id_list that contains several integers. The list can be laid out in several ways.

For example:

id_list = [123456, 789123, 456789]

id_list = [    123456,
               789123,
               456789    ]

id_list = [    123456
               ,789123
               ,456789    ]

What I came up with works just fine, but for the sake of perfectionism I want to know if there is a "smoother" way of doing it.

with open(filepath, 'rb') as input_file:
    parsed_string = ''
    id_detected = False    # set once a line mentioning id_list is seen
    start_parsing = False  # set once the opening '[' is reached
    for line in input_file:
        if 'id_list' in line:
            id_detected = True
        if id_detected:
            for char in line:
                if char == '[':
                    start_parsing = True
                if start_parsing and char != '\n':
                    parsed_string += char
                if char == ']':
                    id_detected = False
                    start_parsing = False
                    break

After that, I just filter parsed_string:

new_string = "".join(filter(lambda char: char.isdigit() or char == ',', parsed_string))

This gives me a string containing only the numbers and commas: 123456,789123,456789

So, to wrap this up, is there anything that I could improve?

  • Why not just import the file and access id_list directly? Commented Nov 28, 2016 at 8:01
  • What about id_list = list()? Or x = [] then id_list = x? Commented Nov 28, 2016 at 8:05
  • @jonrsharpe I'm not sure I should do that, because there are a lot of files that need to be parsed; importing all of them at once wouldn't be a good idea, I suppose. Commented Nov 28, 2016 at 8:08
  • How come? Could you give some more context? Do these things change frequently? How do they get into the Python files to begin with? Commented Nov 28, 2016 at 8:09
  • If this is a one-time process, just use import and store a mapping in a more easily accessible format. Don't do it on every search, that's not efficient. It's not really clear why your output is a string, either; a list or set of IDs would be more usable, surely? Commented Nov 28, 2016 at 9:12

2 Answers

You can use a regular expression to solve this:

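import re

# Text mode, so the str pattern below also works on Python 3.
with open(filepath, 'r') as input_file:
    text = input_file.read()

# Non-greedy match of everything between the brackets; DOTALL lets '.' span newlines.
match = re.search(r'id_list\s*=\s*\[(.*?)\]', text, flags=re.DOTALL)

if match is None:
    print("Not found")
else:
    id_list_str = match.group(1)
    # int() tolerates surrounding whitespace; empty pieces (e.g. after a trailing comma) are skipped.
    id_list = [int(n) for n in id_list_str.split(',') if n.strip()]
    print(id_list)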

2 Comments

eval and exec should generally be avoided because they can be a security risk. For details, please see "Eval really is dangerous" by SO veteran Ned Batchelder. Instead, you can use the safer alternative: ast.literal_eval.
Manually parsing the data is definitely much safer!
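
For reference, the ast.literal_eval suggestion in the comment above could look roughly like the sketch below. It parses the source without executing it, walks the syntax tree for an assignment to id_list, and evaluates only the literal on the right-hand side. The function name extract_id_list and the sample string are made up for illustration, and the code assumes Python 3.

```python
import ast


def extract_id_list(source, name="id_list"):
    """Return the literal assigned to `name` in `source`, or None if absent.

    Hypothetical helper: the source is parsed, never executed.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == name:
                    # literal_eval accepts an AST node and only evaluates
                    # plain literals (numbers, strings, lists, dicts, ...).
                    return ast.literal_eval(node.value)
    return None


sample = """
id_list = [    123456
               ,789123
               ,456789    ]
"""
print(extract_id_list(sample))  # [123456, 789123, 456789]
```

If the right-hand side is not a plain literal (for example id_list = list(), as raised in the comments on the question), ast.literal_eval raises ValueError, so a caller would need to decide how to handle that case.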

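Just use import and from.

If you don't want to import everything from the Python file, import only the names you need. Note that Python still executes the whole module; from only limits which names are bound in your namespace.

Example (the module name is written without the .py extension):

from filename import id_list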
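
When the files are only known by path, as with the many files mentioned in the question's comments, a plain import statement is awkward. A minimal sketch using the standard importlib.util functions follows; it assumes Python 3, the helper name load_id_list and the module name "_id_list_source" are made up, and, like any import, it executes the file's top-level code, so it is only appropriate for trusted files.

```python
import importlib.util


def load_id_list(path, name="id_list"):
    """Import the .py file at `path` and return its `name` attribute.

    Hypothetical helper: returns None if the attribute is missing.
    """
    spec = importlib.util.spec_from_file_location("_id_list_source", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return getattr(module, name, None)
```

A caller could then build a mapping once, for instance {path: load_id_list(path) for path in paths}, and store it in a more convenient format, as one of the comments on the question suggests.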
