Selecting a string in Python

Question

Say I have many options in an HTML page (opened as a text file), as below:

<select id="my">
  <option id="1">1a</option>
  <option id="2">2bb</option>     
</select>

<select id="my1">
  <option id="11">11a</option>
  <option id="21">21bb</option>     
</select>

So far, I've searched for <select id= like this:

with open('/u/poolla/Downloads/creat/xyz.txt') as f:
    for line in f:
        line = line.strip()
        if '<select id=' in line:
            print("true")

Now, whenever <select id= occurs, I want to get the id value, that is, copy the string that starts after the " following id= and ends at the next ".

How do I do this in Python?

Please! BeautifulSoup: stackoverflow.com/questions/1732348/…
– sshashank124, Commented Apr 9, 2014 at 12:34
@Wooble: You do know that BeautifulSoup uses pluggable parsers and that lxml, if installed, is the default, right? BeautifulSoup 4 is not about parsing (anymore) but about the object model. Which is pretty neat for most HTML tasks, really.
– Martijn Pieters, Commented Apr 9, 2014 at 12:57
@Wooble: Use lxml if you want to use the ElementTree-on-steroids object model instead. Don't pick it because you think the parser might be better...
– Martijn Pieters, Commented Apr 9, 2014 at 12:58

gog · Accepted Answer · 2014-04-09 13:01:38Z

3

An HTML parser library is usually better at parsing HTML than raw string functions or regular expressions. Here's an example with the standard HTMLParser class:

html = """
<select id="my">
  <option id="1">1a</option>
  <option id="2">2bb</option>
</select>

<select id="my1">
  <option id="11">11a</option>
  <option id="21">21bb</option>
</select>
"""

try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2

class MyParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.ids = []

    def handle_starttag(self, tag, attrs):
        if tag == 'select':
            self.ids.extend(val for name, val in attrs if name == 'id')


p = MyParser()
p.feed(html)
print(p.ids)  # ['my', 'my1']
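
To connect this back to the question, where the HTML sits in a text file rather than in a string, a minimal Python 3 sketch could read the whole file and feed it to the same kind of parser. The names SelectIdParser and select_ids_from_file are hypothetical, chosen only for illustration, and the temporary file stands in for the asker's xyz.txt:

```python
import os
import tempfile
from html.parser import HTMLParser  # Python 3 location of the class


class SelectIdParser(HTMLParser):
    """Collect the id attribute of every <select> start tag."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "select":
            self.ids.extend(value for name, value in attrs if name == "id")


def select_ids_from_file(path):
    """Hypothetical helper: parse the file at `path` and return the ids."""
    parser = SelectIdParser()
    with open(path, encoding="utf-8") as f:
        parser.feed(f.read())
    parser.close()
    return parser.ids


# Demo: a temporary file stands in for the real input file.
sample = (
    '<select id="my">\n'
    '  <option id="1">1a</option>\n'
    '</select>\n'
    '<select id="my1"></select>\n'
)
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)

demo_ids = select_ids_from_file(tmp.name)
os.remove(tmp.name)
print(demo_ids)  # ['my', 'my1']
```

Feeding the whole file at once, instead of testing it line by line, means a select tag whose attributes are split across several lines is still picked up.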


luc · Accepted Answer · 2014-04-09 13:20:32Z

0

BeautifulSoup 4 has a very useful select method, which makes it possible to query an HTML document with CSS selectors.

Something like the following code (not tested, sorry :-) ) should get the ids of all the select tags in an HTML document.

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly
tags = soup.select("select")
print([t.get("id") for t in tags])  # None for a tag without an id
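
If some select tags have no id at all, a CSS attribute selector can skip them so that no None values end up in the result. This is only a sketch, assuming the third-party beautifulsoup4 package is installed and using the built-in "html.parser" backend. The sample markup, including the select without an id, is made up for illustration:

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Illustrative markup: the last <select> deliberately has no id.
html = """
<select id="my">
  <option id="1">1a</option>
</select>
<select id="my1">
  <option id="11">11a</option>
</select>
<select>
  <option>no id here</option>
</select>
"""

soup = BeautifulSoup(html, "html.parser")

# "select[id]" matches only <select> elements that carry an id attribute.
ids = [tag["id"] for tag in soup.select("select[id]")]
print(ids)  # ['my', 'my1']
```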
