import urllib
from bs4 import BeautifulSoup
import re

sumt = 0

html = urllib.urlopen('http://python-data.dr-chuck.net/comments_338391.html').read()

soup = BeautifulSoup(html)

tags = soup('span')

for lne in tags:
    lne = str(lne)
    data = re.findall('[0-9]+', lne)
    data[0] = int(data[0])
    sumt = sumt + data[0]

print sumt

Error:

IOError: [Errno socket error] [Errno 11004] getaddrinfo failed

1 Answer

Please note that urllib.urlopen is deprecated; you should use urllib2.urlopen instead.
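
For readers on Python 3, where urllib2 no longer exists and urlopen lives in urllib.request, a minimal sketch of the same fetch might look like the following. It assumes Python 3 (the code in this thread is Python 2), and fetch_text with its timeout default is a hypothetical helper named here only for illustration.

```python
import urllib.error
import urllib.request


def fetch_text(url, timeout=10):
    """Return the decoded body of `url`, or None if the request fails.

    Hypothetical helper for illustration; assumes Python 3 and UTF-8 content.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode('utf-8')
    except urllib.error.URLError as exc:
        # A failed DNS lookup (such as "getaddrinfo failed") is reported here.
        print('Could not fetch %s: %s' % (url, exc.reason))
        return None
```

Calling fetch_text('http://python-data.dr-chuck.net/comments_338391.html') would then return either the page text or None, instead of stopping the script with a traceback.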

Anyhow, for me both versions work fine.

import urllib2
import re

if __name__ == '__main__':
    url = 'http://python-data.dr-chuck.net/comments_338391.html'
    comments = {}
    pattern = re.compile(r'<tr><td>(?P<name>.+?)</td>.+?class="comments">(?P<count>\d+)</span>.+?')
    for line in urllib2.urlopen(url).read().split('\n'):
        m = pattern.match(line)
        if m:
            comments[m.group('name')] = int(m.group('count'))
    print(comments)

Yields:

{'Caidan': 28, 'Haylie': 59, 'Fikret': 43, 'Tabbitha': 54, 'Rybecca': 70, 'Pearl': 45, 'Kiri': 72, 'Storm': 66, 'Kelum': 55, 'Elisau': 30, 'Lexi': 70, 'Cobain': 2, 'Theodore': 36, 'Ammer': 26, 'Carris': 87, 'Fion': 10, 'Derick': 28, 'Shalamar': 98, 'Adil': 93, 'Wasif': 54, 'Yasin': 78, 'Mhyren': 92, 'Kodi': 75, 'Nikela': 98, 'Lorena': 76, 'Seth': 68, 'Lillia': 91, 'Nitya': 26, 'Tigan': 73, 'Jaii': 11, 'Kamran': 74, 'Arianna': 12, 'Mercedes': 92, 'Gregory': 40, 'Umaima': 83, 'Rhylee': 26, 'Kaia': 91, 'Hamid': 33, 'Lucien': 5, 'Zacharias': 92, 'Abir': 35, 'Teejay': 51, 'Muir': 43, 'Hena': 84, 'Alanas': 16, 'Lybi': 91, 'Atiya': 87, 'Kayleb': 7, 'Fletcher': 87, 'Lisandro': 78}
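
The question asks for the sum of the counts rather than the per-name mapping. Here is a minimal sketch of that last step, assuming Python 3 and assuming each table row has the layout implied by the pattern above. The two sample rows are reconstructed by hand from the first two entries of the output, not copied from the page, and count_comments is a hypothetical helper name.

```python
import re

# Same pattern as in the answer, written as a raw string.
PATTERN = re.compile(
    r'<tr><td>(?P<name>.+?)</td>.+?class="comments">(?P<count>\d+)</span>.+?'
)


def count_comments(html):
    """Map each name to its comment count, reading one table row per line."""
    comments = {}
    for line in html.split('\n'):
        m = PATTERN.match(line)
        if m:
            comments[m.group('name')] = int(m.group('count'))
    return comments


# Hypothetical sample rows; the real page layout may differ.
SAMPLE = (
    '<tr><td>Caidan</td><td><span class="comments">28</span></td></tr>\n'
    '<tr><td>Haylie</td><td><span class="comments">59</span></td></tr>'
)

comments = count_comments(SAMPLE)
total = sum(comments.values())
print(total)
```

If the same name could appear in more than one row, the dict would keep only the last count, so collecting the counts in a list before summing would be the safer choice.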

In other words, it works for me.
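
Since the same code runs elsewhere, the "getaddrinfo failed" error in the question probably comes from the machine's network setup (DNS, proxy, or firewall) rather than from the script. That is an inference, not something confirmed in this thread. One quick way to check, sketched here for Python 3 with a hypothetical helper can_resolve, is to ask the resolver directly:

```python
import socket


def can_resolve(host, port=80, resolver=socket.getaddrinfo):
    """Return True if `host` resolves to at least one address.

    Hypothetical helper; `resolver` is injectable so the check can be
    exercised without network access.
    """
    try:
        return len(resolver(host, port)) > 0
    except socket.gaierror:
        # Raised when name resolution fails ("getaddrinfo failed").
        return False
```

If can_resolve('python-data.dr-chuck.net') returns False while other hostnames resolve, name resolution for that host is the likely culprit.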
