Regular Expression in Python

Question

When parentheses are used in the program below, the output is ['www.google.com'].

import re
teststring = "href=\"www.google.com\""
m = re.findall('href="(.*?)"', teststring)
print(m);

If the parentheses are removed from the findall() pattern, the output is ['href="www.google.com"'].

import re
teststring = "href=\"www.google.com\""
m = re.findall('href=".*?"', teststring)
print(m);

It would be helpful if someone could explain how this works.

The code you have provided is exactly the same. But you are probably asking about grouping in regular expressions in general.
– Tadeck, Commented Jan 22, 2013 at 11:37
I've fixed your example code to actually produce the output (which was also missing the quotes). I left in the redundant semicolons, though; Python does not need those.
– Martijn Pieters, Commented Jan 22, 2013 at 11:45

Martijn Pieters · Accepted Answer · 2013-01-22 11:52:50Z

5

The re.findall() documentation is quite clear on the difference:

Return all non-overlapping matches of pattern in string, as a list of strings. […] If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.

So .findall() returns a list containing one of three types of values, depending on the number of groups in the pattern:

0 capturing groups in the pattern (no (...) parentheses): each item is the whole matched string ('href="www.google.com"' in your second example).
1 capturing group in the pattern: each item is the text of that group ('www.google.com' in your first example).
More than 1 capturing group in the pattern: each item is a tuple of all the captured groups.
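
To make the three cases concrete, here is a minimal sketch that runs the same test string from the question through a pattern with zero, one, and two capturing groups. The variable names (no_group, one_group, two_groups) are only illustrative, and raw strings are used for the patterns as a matter of habit rather than necessity here.

```python
import re

teststring = 'href="www.google.com"'

# No capturing group: each item is the whole match.
no_group = re.findall(r'href=".*?"', teststring)

# One capturing group: each item is just the text of that group.
one_group = re.findall(r'href="(.*?)"', teststring)

# Two capturing groups: each item is a tuple with one entry per group.
two_groups = re.findall(r'(href=)"(.*?)"', teststring)

print(no_group)    # ['href="www.google.com"']
print(one_group)   # ['www.google.com']
print(two_groups)  # [('href=', 'www.google.com')]
```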

Use non-capturing groups ((?:...)) if you don't want that behaviour, or add groups if you want more information. For example, adding a group around the href= part would result in a list of tuples with two elements each:

>>> re.findall('(href=)"(.*?)"', teststring)
[('href=', 'www.google.com')]
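
As a rough illustration of the non-capturing option, the sketch below groups an alternation with (?:...) so that findall() still returns whole matches. The src alternative is an assumption added only to give the group something to do. The re.finditer() part is not mentioned in the answer above; it is shown as one possible way to get both the whole match and the captured group at once.

```python
import re

teststring = 'href="www.google.com"'

# (?:...) groups without capturing, so findall() sees zero capturing
# groups and returns the whole match. "src" is a hypothetical extra
# attribute name used only for illustration.
whole = re.findall(r'(?:href|src)=".*?"', teststring)
print(whole)  # ['href="www.google.com"']

# finditer() yields match objects: group(0) is the whole match,
# group(1) is the first capturing group.
pairs = [(m.group(0), m.group(1))
         for m in re.finditer(r'href="(.*?)"', teststring)]
print(pairs)  # [('href="www.google.com"', 'www.google.com')]
```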

edited Jan 22, 2013 at 11:52

answered Jan 22, 2013 at 11:38

Martijn Pieters


3 Comments

Martijn Pieters Over a year ago

Why should it be? It would only return a list of tuples (if that's what you meant) if there is more than 1 group.

Vindhya G Over a year ago

My doubt is why href= is not included in the output even though it matches the pattern, i.e. how do groups behave in this example? Sorry, I'm new to Python.

Martijn Pieters Over a year ago

@vindhya: The href is not grouped. Only the part matched by (.*?) (a capturing group) is returned. When you remove the group, the whole match is returned.
