In Python, I'm trying to capture multiple values from a string using a regular expression, but I'm having trouble. For the string:

inputs       =    12 1  345 543 2

I tried using:

match = re.match(r'\s*inputs\s*=(\s*\d+)+',string)

However, this only returns the value '2'. I want to capture all of the values '12', '1', '345', '543', '2', but I'm not sure how to do this.

Any help is greatly appreciated!

EDIT: Thank you all for explaining why this does not work and for providing alternative suggestions. Sorry if this is a repeat question.

  • possible duplicate of Regex question about parsing method signature Commented May 28, 2013 at 14:32
  • You are facing the same problem as the linked question; your (...) group can only match once. Combine matching with splitting. Commented May 28, 2013 at 14:33

4 Answers

You could try something like: re.findall(r"\d+", your_string).
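
A minimal sketch of this suggestion, using the string from the question. Note that it collects every run of digits anywhere in the string, so it assumes nothing else on the line contains digits:

```python
import re

your_string = 'inputs       =    12 1  345 543 2'

# findall returns every non-overlapping match of the pattern as a list of strings
numbers = re.findall(r'\d+', your_string)
print(numbers)  # ['12', '1', '345', '543', '2']
```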


You cannot do this with a single regex (unless you were using .NET), because each capturing group will only ever return one result even if it is repeated (the last one in the case of Python).

Since Python's re module also does not support variable-length lookbehinds (with which you could use (?<=inputs.*=.*)\d+), you will have to split this into two steps:

match = re.match(r'\s*inputs\s*=\s*(\d+(?:\s+\d+)*)', string)
integers = re.split(r'\s+', match.group(1))

So now you capture the entire list of integers (and the spaces between them), and then you split that capture at the spaces.

The second step could also be done using findall:

integers = re.findall(r'\d+',match.group(1))

The results are identical.
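
As a quick check of that claim, here is a sketch that runs both variants on the string from the question. The pattern used here separates integers with `\s+` and lets the list be a single integer; treat that as an assumption about the desired behaviour. The variable names `via_split` and `via_findall` are only for illustration:

```python
import re

string = 'inputs       =    12 1  345 543 2'

# Step 1: capture the whole run of integers (and the whitespace between them)
match = re.match(r'\s*inputs\s*=\s*(\d+(?:\s+\d+)*)', string)
if match is None:
    raise ValueError('string does not look like "inputs = <integers>"')
captured = match.group(1)

# Step 2, variant A: split the captured text on whitespace
via_split = re.split(r'\s+', captured)

# Step 2, variant B: pull out each run of digits
via_findall = re.findall(r'\d+', captured)

print(via_split)                 # ['12', '1', '345', '543', '2']
print(via_split == via_findall)  # True
```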


You can nest the two calls in a single expression:

import re
s = 'inputs       =    12 1  345 543 2'
print(re.findall(r'(\d+)', re.match(r'inputs\s*=\s*([\s\d]+)', s).group(1)))
>>> 
['12', '1', '345', '543', '2']

Or do it in layers:

import re

def get_inputs(s, regex=r'inputs\s*=\s*([\s\d]+)'):
    match = re.match(regex, s)
    if not match:
        return False # or raise an exception - whatever you want
    else:
        return re.findall(r'(\d+)', match.group(1))

s = 'inputs       =    12 1  345 543 2'
print(get_inputs(s))
>>> 
['12', '1', '345', '543', '2']


You should look at this answer: https://stackoverflow.com/a/4651893/1129561

In short:

In Python, this isn’t possible with a single regular expression: each capture of a group overrides the last capture of that same group (in .NET, this would actually be possible since the engine distinguishes between captures and groups).
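
The overriding behaviour can be seen directly with the pattern and string from the question. This is only an illustrative sketch; the group matches five times, but only the final repetition is kept:

```python
import re

string = 'inputs       =    12 1  345 543 2'

# The group (\s*\d+) is repeated, so each repetition overwrites the previous capture
match = re.match(r'\s*inputs\s*=(\s*\d+)+', string)

print(repr(match.group(1)))  # ' 2' (the last repetition, including its leading space)
print(match.groups())        # (' 2',) : one group, one value
```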
