'TypeError: expected string or bytes-like object', while trying to get numbers from a web page with BeautifulSoup

Question

I'm trying to extract the integers from a web page with bs4. I imported re to pull out the numbers, but I get the error above. I'm confused and would appreciate some help.

from urllib.request import urlopen
from bs4 import BeautifulSoup
import ssl
import re

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('Enter - ')
html = urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')

# Retrieve all of the span tags
tags = soup('span')
for tag in tags:
    re.findall('<span.*[0-9].*',tag)

Link: http://py4e-data.dr-chuck.net/comments_314936.html
Expected output: print the numbers from the page at that link.

Add some output and a link so that we can help from there.
– loki, commented Nov 12, 2019 at 5:11

shaik moeed · Accepted Answer · 2019-11-12 05:37:29Z

2

You can get the numbers directly with .get_text(), so no regular expression is needed. I have also removed the unnecessary code.

from urllib.request import urlopen
from bs4 import BeautifulSoup


url = 'http://py4e-data.dr-chuck.net/comments_314936.html'
html = urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')

# Retrieve all of the span tags
tags = soup('span')
for tag in tags:
    print(tag.get_text())
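
Note that .get_text() returns strings, so if the values are needed as integers (for example, to add them up) they still have to be converted. A minimal sketch of that step, using a small made-up HTML snippet instead of the real page so that it runs without network access; the names and numbers in sample_html are assumptions for illustration only:

```python
from bs4 import BeautifulSoup

# Made-up sample markup (an assumption, not the content of the real page)
sample_html = """
<table>
  <tr><td>Alice</td><td><span class="comments">97</span></td></tr>
  <tr><td>Bob</td><td><span class="comments">5</span></td></tr>
  <tr><td>Carol</td><td><span class="comments">40</span></td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

# get_text() returns a str, so convert each value with int()
numbers = [int(tag.get_text()) for tag in soup('span')]

print(numbers)
print(sum(numbers))
```

For the real page, the same list comprehension should work on the soup built from urlopen(url).read(), assuming every span contains only a number.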



loki · Accepted Answer · 2019-11-12 05:39:34Z

1

Each 'tag' in the loop is a bs4.element.Tag object, not a string.
re.findall only accepts a string (or bytes-like object), so convert the tag with str(tag) before searching it.

from urllib.request import urlopen
from bs4 import BeautifulSoup
import ssl
import re

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('Enter - ')
html = urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')

# Retrieve all of the span tags
tags = soup('span')

for tag in tags:
    # str(tag) gives the tag's markup as a string, which re.findall can search
    digits = re.findall(r'\d+', str(tag))
    print(''.join(digits))
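
To see why the conversion matters, the error can be reproduced without bs4 at all. The sketch below uses a hypothetical FakeTag class as a stand-in for bs4.element.Tag (it is not part of any library), and the markup is a made-up sample:

```python
import re


class FakeTag:
    """Hypothetical stand-in for bs4.element.Tag: an object that is not a str."""

    def __init__(self, markup):
        self.markup = markup

    def __str__(self):
        return self.markup


tag = FakeTag('<span class="comments">97</span>')  # made-up sample markup

# Passing the object itself raises the TypeError from the question
try:
    re.findall(r'\d+', tag)
except TypeError as exc:
    error_message = str(exc)
    print('TypeError:', error_message)

# Converting to a string first lets the search run
numbers = re.findall(r'\d+', str(tag))
print(numbers)
```

The re functions check the type of their argument before searching, so any object that is not a str or bytes-like value fails in the same way, however string-like it looks when printed.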

edited Nov 12, 2019 at 5:39

answered Nov 12, 2019 at 5:30
