
I have this code:

reg = re.search('<div class="col result_name">(.*)</div>', html)
print 'Value is', reg.group()

Where 'html' contains something like this:

        <div class="col result_name">
            <h4>Blah</h4>
            <p>
                blah
            </p>
        </div>

But the search doesn't match anything, and the script fails:

Value is
Traceback (most recent call last):
  File "run.py", line 37, in <module>
    print 'Value is', reg.group()

  • ... and this is why you should NOT 'parse' HTML with regex. Commented Jan 10, 2011 at 18:40
  • Read this then use the appropriate tools for parsing html. Commented Jan 10, 2011 at 18:40
  • @A A: no it isn't. <div><div></div></div> is. Commented Jan 10, 2011 at 18:44
  • @A A: No, that is why you should not 'parse' anything with regex without reading the re docs. Commented Jan 10, 2011 at 20:46

3 Answers


Don't use regex to parse HTML. Use an HTML parser:

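import lxml.html

# your_html is the page source as a string
doc = lxml.html.fromstring(your_html)
# xpath() returns a list of every <div> whose class attribute matches exactly
result = doc.xpath("//div[@class='col result_name']")
print result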

Obligatory link:

RegEx match open tags except XHTML self-contained tags
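
If lxml is not available, the same idea can be sketched with the standard library's html.parser. This is a different tool from the lxml approach above, not a drop-in replacement, and it assumes Python 3. The class name DivTextExtractor and its attributes are made up for this illustration.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect the text inside every <div class="col result_name">."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside a target div (counts nested divs)
        self.results = []   # one text string per target div
        self._chunks = []   # text pieces of the div currently being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # a nested div inside a target div
        elif dict(attrs).get("class") == "col result_name":
            self.depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.results.append(" ".join(self._chunks))

    def handle_data(self, data):
        # Keep only non-blank text found inside a target div
        if self.depth and data.strip():
            self._chunks.append(data.strip())


html = """
        <div class="col result_name">
            <h4>Blah</h4>
            <p>
                blah
            </p>
        </div>
"""

parser = DivTextExtractor()
parser.feed(html)
parser.close()
print(parser.results)  # ['Blah blah']
```

Counting depth is what lets this handle nested divs, which a single regex cannot do reliably.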


3 Comments

I'm getting results like this: [<Element div at 0xb72edc8c>, <Element div at 0xb72edcbc>
@Zeno: Yeah, those are all the divs lxml found in your html. The elements. You can print them, or do further parsing with them. For example, try this: for onediv in result: print lxml.html.tostring(onediv, pretty_print=True)
Does xpath support regex? I want to do something like (col|row) in there.

By default, the dot does not match newlines in regular expressions; you need the DOTALL flag (re.DOTALL, or (?s) inline) for that.
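
A minimal sketch of that fix applied to the question's pattern, written for Python 3 (the question itself uses Python 2 print statements). The html string is copied from the question; the variable names are illustrative. It also guards against a failed search: re.search returns None when nothing matches, and calling .group() on None raises an AttributeError, which is presumably where the truncated traceback in the question ends.

```python
import re

html = """
        <div class="col result_name">
            <h4>Blah</h4>
            <p>
                blah
            </p>
        </div>
"""

pattern = '<div class="col result_name">(.*)</div>'

# Default mode: '.' stops at the first newline, so nothing matches.
no_flag = re.search(pattern, html)

# DOTALL lets '.' cross newlines, so the whole block matches.
with_flag = re.search(pattern, html, re.DOTALL)

# re.search returns None on failure, so check before calling .group().
if with_flag is not None:
    print('Value is', with_flag.group(1).strip())
```

Note that the greedy (.*) runs to the last </div> in the input, so on a real page with several divs this would capture far too much. That is one reason the other answer recommends a parser.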


From http://docs.python.org/library/re.html:

The special characters are:

'.' (Dot.) In the default mode, this matches any character except a newline. If the DOTALL flag has been specified, this matches any character including a newline.
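
The documented behaviour can be checked on a two-line string. This is a small illustration only (Python 3 syntax, sample text invented here), showing both the flag argument and the equivalent inline (?s) form.

```python
import re

text = "first\nsecond"

# Default mode: '.' never matches '\n', so '.*' stops at the end of line one.
default_match = re.match(r".*", text).group()

# DOTALL passed as a flag argument: '.' also matches '\n'.
dotall_match = re.match(r".*", text, re.DOTALL).group()

# The same flag written inline at the start of the pattern.
inline_match = re.match(r"(?s).*", text).group()

print(repr(default_match))  # 'first'
print(repr(dotall_match))   # 'first\nsecond'
print(repr(inline_match))   # 'first\nsecond'
```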
