I'm trying to scrape text from a series of hyperlinks on a main page and then store the results as a list of string objects. The code I've written works when I perform it on an individual link, but it breaks down when I try to loop through all the links.
FYI, my base url looks like this:
base_url = "http://www.achpr.org"
And my hyperlinks look like this:
hyperlinks = ['/sessions/58th',
              '/sessions/58th/resolutions/337/',
              '/sessions/58th/resolutions/338/',
              '/sessions/58th/resolutions/339/', ...]
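For reference, a quick sketch of how the base URL and the relative links combine into the full URLs the loop will request (using the standard-library urljoin rather than string concatenation, which behaves the same here since the paths are absolute):

```python
from urllib.parse import urljoin

base_url = "http://www.achpr.org"
hyperlinks = ['/sessions/58th',
              '/sessions/58th/resolutions/337/',
              '/sessions/58th/resolutions/338/']

# urljoin resolves each absolute path against the base host
full_urls = [urljoin(base_url, link) for link in hyperlinks]
print(full_urls[0])  # http://www.achpr.org/sessions/58th
```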
So this works fine:
import requests
from bs4 import BeautifulSoup

r = requests.get('http://www.achpr.org' + "/sessions/19th-eo/resolutions/328/")
soup = BeautifulSoup(r.text, "lxml")
soup.find('b').span.string
text = soup.findAll('span')
y = []
for i in text:
    x = i.strings  # returns the strings within the tags
    y.extend(x)
y = "".join(y)
y = y.replace("\n", " ")
y = y.replace("\xa0*", " ")
print(y)
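As a side note, the per-page cleanup could be factored into a small helper (a sketch, not part of the original code). One thing worth flagging: `str.replace` matches literally, so `"\xa0*"` is treated as the two characters `\xa0` and `*`, not as a regex pattern; the sketch below replaces the bare `\xa0` instead:

```python
def clean_strings(strings):
    """Join extracted tag strings and normalise whitespace."""
    text = "".join(strings)
    text = text.replace("\n", " ")
    # str.replace is literal: "\xa0*" would look for a literal asterisk,
    # so replace the non-breaking space on its own
    text = text.replace("\xa0", " ")
    return text

print(clean_strings(["Resolution\n", "on\xa0Human Rights"]))  # Resolution on Human Rights
```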
But when I try to turn this into a loop:
output = []
for item in hyperlinks:
    r = requests.get('http://www.achpr.org' + link)
    soup = BeautifulSoup(r.text, "lxml")
    soup.find('b').span.string
    text = soup.findAll('span')
    y = []
    for i in text:
        x = i.strings  # returns the strings within the tags (so no tags)
        y.extend(x)
    y = "".join(y)
    y = y.replace("\n", " ")
    y = y.replace("\xa0*", " ")
    output.extend(y)
I get the following error:
It feels like I'm making a really simple looping error (indentation in the wrong place, perhaps), but I've been staring at this too long and I'd like a fresh pair of eyes. Can anyone spot what I'm doing wrong?