I'm having trouble writing a regular expression for the following. I have a vector of literals (see RE_LIT) and would like to find all such vectors in a line of text. Specifically, the parentheses in my patterns seem to act as groups rather than as plain parentheses.

RE_LABEL1 = r'[cvx]\d+(?![.]r)$'
RE_LABEL2 = r'v\d+\.r\d+'
RE_LABEL = r'(%s)|(%s)' % (RE_LABEL1, RE_LABEL2)
RE_LIT = r'!?%s' % RE_LABEL
RE_VEC = r'\[\s*(\s*%s\s*,?\s*)+\s*\]' % RE_LIT
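Printing the composed pattern makes the precedence problem visible. The sketch below reuses the question's definitions unchanged and assumes Python 3 (print() and re.fullmatch). Because `|` binds more loosely than everything else, the `!?` in RE_LIT attaches only to the first alternative. The `$` copied into RE_LABEL1 also means that alternative can only match at the very end of the string.

```python
import re

# The question's definitions, unchanged.
RE_LABEL1 = r'[cvx]\d+(?![.]r)$'
RE_LABEL2 = r'v\d+\.r\d+'
RE_LABEL = r'(%s)|(%s)' % (RE_LABEL1, RE_LABEL2)
RE_LIT = r'!?%s' % RE_LABEL

# '!?' ends up in front of the first alternative only.
print(RE_LIT)  # !?([cvx]\d+(?![.]r)$)|(v\d+\.r\d+)

# Because of '$', RE_LABEL1 only matches a label at the end of the string.
print(re.findall(RE_LABEL1, 'v3,v4'))  # ['v4']

# A negated register label cannot match as a whole.
print(re.fullmatch(RE_LIT, '!v5.r1'))  # None
```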

Example string to match:

test = 'c1 = blah([v3,v4,v5.r1,!v6,v7,x8,v9,v10], [v1, v2], [x5.r1])'

Expected results:

> print re.findall(RE_VEC, test)
['[v3,v4,v5.r1,!v6,v7,x8,v9,v10]', '[v1, v2]']
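The "parentheses acting as groups" symptom comes from how re.findall reports results. With no capturing groups it returns whole matches. With one group it returns only that group, and a repeated group keeps just its last repetition. With several groups it returns tuples. The sketch below uses toy patterns, not the question's, to show these cases and the non-capturing (?:...) form that keeps the grouping but drops the capture (Python 3 assumed):

```python
import re

text = '[v1,v2]'

# One capturing group inside '+': only the last repetition is reported.
print(re.findall(r'\[(v\d,?)+\]', text))    # ['v2']

# Two capturing groups: findall returns one tuple per match.
print(re.findall(r'(v1)|(v2)', text))       # [('v1', ''), ('', 'v2')]

# Non-capturing group: findall returns the whole match.
print(re.findall(r'\[(?:v\d,?)+\]', text))  # ['[v1,v2]']
```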

Thank you ahead of time for your help.

  • "I seem to have issues with the parenthesis acting as groups and not parenthesis" - parentheses have meaning in regular expressions. If you want them to just be characters, escape them with a prefixed backslash: \(. Your example isn't a very useful test, as r'\[[^]]+]' would work fine... Commented Apr 28, 2015 at 21:26
  • jonrsharpe: If I were to use this method, how would I ensure that the entries in the list are literals (RE_LIT) and not just arbitrary characters? Commented Apr 28, 2015 at 22:40
  • Yes, parentheses have meaning in regular expressions, but is there some mechanism that groups things the way ordinary parentheses do for nesting? Commented Apr 28, 2015 at 22:43
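The first comment's two points can be checked directly: a backslash turns a parenthesis or bracket into a literal character, and a bracket pattern with a negated class is enough to pull out every bracketed list. The bracket pattern below escapes the inner `]`, which Python treats the same as the comment's form. This is a sketch assuming Python 3. Note that it does not check that the items are literals.

```python
import re

test = 'c1 = blah([v3,v4,v5.r1,!v6,v7,x8,v9,v10], [v1, v2], [x5.r1])'

# \( and \) match literal parentheses; the group between them captures the arguments.
args = re.search(r'blah\((.*)\)', test).group(1)
print(args)  # [v3,v4,v5.r1,!v6,v7,x8,v9,v10], [v1, v2], [x5.r1]

# \[ ... \] around a negated class: any run of non-']' characters in brackets.
print(re.findall(r'\[[^\]]+\]', test))
# ['[v3,v4,v5.r1,!v6,v7,x8,v9,v10]', '[v1, v2]', '[x5.r1]']
```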

2 Answers

You can use the following fix:

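import re
RE_LABEL1 = r'[cvx]\d+(?![.]r)'  # plain label not followed by ".r"
RE_LABEL2 = r'v\d+\.r\d+'         # register label such as v5.r1
RE_LABEL = r'%s|%s' % (RE_LABEL1, RE_LABEL2)
# Optional '!' plus a label; (?:...) groups without capturing.
# As written, the '!' only binds to the RE_LABEL1 branch.
RE_LIT = r'(?:\!?%s)' % RE_LABEL
# One or more literals, each followed by an optional comma and spaces
RE_VEC = r'(?:%s,?\s*)+' % RE_LIT
test = '[v3,v4,v5.r1,!v6,v7,x8,v9,v10], [v1, v2]'
print(re.findall(RE_VEC, test))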

Output of an IDEONE demo:

['v3,v4,v5.r1,!v6,v7,x8,v9,v10', 'v1, v2']
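One thing worth checking with this approach: the pattern the fragments expand to contains no brackets, so it matches runs of labels anywhere in the line. The sketch below runs that expanded pattern, written out as a single string, on the question's full test string (Python 3 assumed). It shows that the `c1` before `=` is picked up as well, while `[x5.r1]` is correctly skipped.

```python
import re

# The answer's RE_VEC with the fragments expanded into one string.
RE_VEC = r'(?:(?:\!?[cvx]\d+(?![.]r)|v\d+\.r\d+),?\s*)+'

test = 'c1 = blah([v3,v4,v5.r1,!v6,v7,x8,v9,v10], [v1, v2], [x5.r1])'
print(re.findall(RE_VEC, test))
# ['c1 ', 'v3,v4,v5.r1,!v6,v7,x8,v9,v10', 'v1, v2']
```

Adding `\[` and `\]` around the repeated group is what restricts matches to bracketed lists.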

1 Comment

Does this work for you or do you need another approach?
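import re
# Plain label, which must be followed by a space, comma or closing bracket
RE_LABEL1 = r'[cvx]\d+(?=[ ,\]])'
# Register label such as v5.r1, with the same lookahead
RE_LABEL2 = r'v\d+\.r\d+(?=[ ,\]])'
RE_LABEL = r'%s|%s' % (RE_LABEL1, RE_LABEL2)
# Group the alternation so the optional '!' applies to both label forms
RE_LIT = r'!?(?:%s)' % RE_LABEL
# '[', then literals separated by commas and optional spaces, then ']'
RE_VEC = r'\[\s*(?:\s*%s\s*,?)+\s*\]' % RE_LIT
test = '[v3,v4,v5.r1,!v6,v7,x8,v9,v10], [v1, v2], [v1, x2.r2]'
print(re.findall(RE_VEC, test))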

Thank you, stribizhev, for your help; it got me halfway there. The above is the final solution.
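The comment thread also asked how to make sure every entry in a list is a literal. One alternative, which is a different technique from both answers, is to match each bracketed list loosely and then validate every comma-separated item on its own with re.fullmatch (Python 3.4+). This is a sketch under stated assumptions: the helper find_vectors and the pattern name LIT are hypothetical, and the label rules are copied from the question.

```python
import re

# Hypothetical pattern: one label, optionally negated with '!'.
# Grouping the alternation makes the '!' apply to both label forms.
LIT = re.compile(r'!?(?:[cvx]\d+|v\d+\.r\d+)')

def find_vectors(text):
    """Return every [...] list whose comma-separated items are all literals."""
    vectors = []
    for m in re.finditer(r'\[([^\]]*)\]', text):
        items = [item.strip() for item in m.group(1).split(',')]
        # fullmatch rejects items like 'x5.r1' that only partly match.
        if all(LIT.fullmatch(item) for item in items):
            vectors.append(m.group(0))
    return vectors

test = 'c1 = blah([v3,v4,v5.r1,!v6,v7,x8,v9,v10], [v1, v2], [x5.r1])'
print(find_vectors(test))
```

Splitting first keeps each pattern small and easy to read, at the cost of assuming that labels never contain commas or brackets.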
