Python: Using regex stored in CSV

Question

I am testing a small Python script, part of which I will use in a larger script. I am trying to look up a field in a CSV file (the field contains a regex) and use it in a regex test. The reason is part of a very weird use case, but it makes maintenance easier: I can edit the CSV file instead of the script. Is there something I am missing in the following?

test.csv:

field0,field1,field2
foo,bar,"\d+\.\d+"
bar,foo,"\w+"

test.py (extra prints used for testing):

import sys
import re
import csv

input = sys.argv[1]
print input

reader = csv.reader(open('test.csv','rb'), delimiter=',', quotechar="\"")
for row in reader:
        print row
        value = row[0]
        print value
        if value in input:
                regex = row[2]
                print regex

                pat = re.compile(regex)
                test = re.match(pat,input)
                out = test.group(1)
                print out

If I pass a value like "foo blah 38902462986.328946239846" to the script, I would expect it to detect that the input contains foo and then use the regex, \d+\.\d+, to extract 38902462986.328946239846. However, when I run the script I get the following:

foo blah 0920390239.90239029
['field0', 'field1', 'field2']
field0
['foo', 'bar', '\\d+\\.\\d+']
foo
\d+\.\d+
Traceback (most recent call last):
  File "reg.py", line 19, in <module>
    out = test.group(1)
AttributeError: 'NoneType' object has no attribute 'group'

I'm not sure what's going on.

P.S. Python is a big world and I am still learning.

Your code seems incorrectly indented. If test is None then re.match failed (that's what it returns on failure). And this might be because re.match expects a string as the first parameter, not a compiled pattern.
– cdleonard, Commented Oct 17, 2012 at 11:37

detunized · Accepted Answer · 2012-10-17 11:48:25Z

According to the docs, re.match only matches at the beginning of the input string, so you need re.search, which scans the whole string for the first match. Also, there's no need to compile the pattern if you don't reuse it afterwards. Just say test = re.search(regex, input).
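
To make the difference concrete, here is a minimal sketch using the input from the question. It is written for Python 3, which is an assumption on my part since the question uses Python 2 print statements; the variable names are only illustrative.

```python
import re

text = "foo blah 38902462986.328946239846"
pattern = r"\d+\.\d+"

# re.match only tries the pattern at position 0. The text starts with
# "foo", not a digit, so there is no match and the result is None.
anchored = re.match(pattern, text)

# re.search scans forward and returns the first match anywhere in the text.
found = re.search(pattern, text)

# The module-level functions also accept an already compiled pattern,
# so compiling first is allowed, just unnecessary for a single use.
compiled = re.compile(pattern)
found_compiled = re.search(compiled, text)

print(anchored)
print(found.group(0))
print(found_compiled.group(0))
```

Calling .group() on the None returned by re.match is what produces the AttributeError shown in the question.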

In the regular expressions in your example you don't have any capture groups, so test.group(1) is going to fail with an IndexError, even if there's a match in the input. Use test.group(0) (or just test.group()) to get the whole match.
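
As a rough illustration of the capture-group point (again Python 3, and the second pattern with parentheses is a hypothetical variant that is not in the question's CSV):

```python
import re

text = "foo blah 38902462986.328946239846"

# No parentheses in the pattern, so only group 0 (the whole match) exists.
plain = re.search(r"\d+\.\d+", text)
whole = plain.group(0)

# Asking for group 1 raises IndexError because the pattern defines no groups.
try:
    plain.group(1)
    group_one_error = None
except IndexError as exc:
    group_one_error = exc

# Hypothetical variant: parentheses create numbered capture groups.
captured = re.search(r"(\d+)\.(\d+)", text)
integer_part = captured.group(1)
fraction_part = captured.group(2)

print(whole)
print(integer_part, fraction_part)
```

If the CSV patterns are meant to be used with group(1), each stored regex would need its own parentheses around the part to extract.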

import sys
import re
import csv

input = 'foo blah 38902462986.328946239846'

reader = csv.reader(open('test.csv','rb'), delimiter=',', quotechar="\"")
for row in reader:
    value = row[0]
    if value in input:
        regex = row[2]
        test = re.search(regex, input)
        print input[test.start():test.end()]

Prints:

38902462986.328946239846
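
For anyone on Python 3, a sketch of the same lookup might look like the following. The function name extract_matches and the inline copy of test.csv are my own assumptions so that the example runs without a file on disk; it also skips rows whose regex does not match instead of failing on None.

```python
import csv
import io
import re

# Inline copy of test.csv from the question (raw string keeps the backslashes).
CSV_TEXT = r'''field0,field1,field2
foo,bar,"\d+\.\d+"
bar,foo,"\w+"
'''

def extract_matches(text, csv_file):
    """Return (key, matched_text) for every row whose field0 occurs in text."""
    results = []
    # DictReader consumes the header row, so it is never treated as data.
    for row in csv.DictReader(csv_file):
        key = row["field0"]
        if key not in text:
            continue
        match = re.search(row["field2"], text)
        # re.search returns None when nothing matches, so check before use.
        if match is not None:
            results.append((key, match.group(0)))
    return results

matches = extract_matches("foo blah 38902462986.328946239846",
                          io.StringIO(CSV_TEXT))
print(matches)
```

With a real file, passing open('test.csv', newline='') instead of the io.StringIO object should behave the same way.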

MHibbin Over a year ago

Thanks, that did the trick, it's probably because I'd used match previously so that stuck in my mind.
