Suppose we have this:

import re

html = 'http://example.com'
regex = r'<(\d{0,2})>'
regex1 = r'<span>(.+?)</span>'
p = re.compile(regex)
p1 = re.compile(regex1)

Is it possible to search for both p and p1 with a single re.findall call?

  • Couldn't you just use regex = '(<(\d{0,2})>|<span>(.+?)</span>)' ? Commented Mar 11, 2013 at 14:15
  • Not sure this would work. I need to find both (both are always present), and I guess that once Python reaches the first alternative and it matches, it will skip the second one. Commented Mar 11, 2013 at 14:20
  • Oh, I see... In that case I'm not sure, as the Python documentation says findall returns all non-overlapping matches. There may be a way, but I don't know of one. If not, could you consider merging the two result lists? Commented Mar 11, 2013 at 14:32

1 Answer

First of all: you generally want to avoid using regular expressions to parse HTML. You really want to use an HTML parser instead. BeautifulSoup lets you search for elements that contain a given text, and you can even use regular expressions to match specific parts of the HTML.

You can combine regular expressions using the | pipe, in a group:

p_or_p1 = re.compile('(?:{}|{})'.format(regex, regex1))
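
To make the behaviour concrete, here is a minimal sketch of what findall returns for such a combined pattern. The `sample` string is made up for illustration and is not from the question. Because the combined pattern has two capturing groups, each match comes back as a tuple, and the group belonging to the alternative that did not match is an empty string:

```python
import re

regex = r'<(\d{0,2})>'
regex1 = r'<span>(.+?)</span>'

# Join the pattern *strings* (not the compiled objects) with |
# inside a non-capturing group.
p_or_p1 = re.compile('(?:{}|{})'.format(regex, regex1))

# Hypothetical input, invented for this example.
sample = '<12> <span>hello</span> <7>'

# One tuple per match: (group from regex, group from regex1).
# The group of the alternative that did not take part is ''.
matches = p_or_p1.findall(sample)
print(matches)  # [('12', ''), ('', 'hello'), ('7', '')]

# Split the combined result back into one list per pattern.
numbers = [num for num, span in matches if num]
spans = [span for num, span in matches if span]
```

The matches are found in a single left-to-right pass, so both kinds of match are reported as long as they do not overlap. One caveat with this way of splitting: `\d{0,2}` also matches zero digits, so a literal `<>` in the input would yield `('', '')` and be dropped by both filters.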

2 Comments

Thanks for the HTML parser advice. I'll surely try to learn this, but until I do I'm kind of stuck with regex, at least for the current project. By the way, can you recommend any worthwhile resources on HTML parsers other than the official documentation?
I can't recommend any BeautifulSoup tutorials because I never read one myself. :-) The documentation is pretty straightforward though; you can always look through the questions here on SO (I answered a fair number of BS questions).
