I'm trying to scrape text from a series of hyperlinks on a main page and then store the results as a list of string objects. The code I've written works when I perform it on an individual link, but it breaks down when I try to loop through all the links.
FYI, my base url looks like this:
base_url = "http://www.achpr.org"
And my hyperlinks look like this:
hyperlinks = ['/sessions/58th',
              '/sessions/58th/resolutions/337/',
              '/sessions/58th/resolutions/338/',
              '/sessions/58th/resolutions/339/', ...]
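For reference, a quick sketch of how the base URL and the relative links combine into the full URLs the loop will request (using the standard-library urljoin rather than string concatenation, which behaves the same here since the paths are absolute):

```python
from urllib.parse import urljoin

base_url = "http://www.achpr.org"
hyperlinks = ['/sessions/58th',
              '/sessions/58th/resolutions/337/',
              '/sessions/58th/resolutions/338/']

# urljoin resolves each absolute path against the base host
full_urls = [urljoin(base_url, link) for link in hyperlinks]
print(full_urls[0])  # http://www.achpr.org/sessions/58th
```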
So this works fine:
import requests
from bs4 import BeautifulSoup

r = requests.get('http://www.achpr.org' + "/sessions/19th-eo/resolutions/328/")
soup = BeautifulSoup(r.text, "lxml")
soup.find('b').span.string
text = soup.findAll('span')
y = []
for i in text:
    x = i.strings  # returns the strings within the tags
    y.extend(x)
y = "".join(y)
y = y.replace("\n", " ")
y = y.replace("\xa0*", " ")
print(y)
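As a side note, the per-page cleanup could be factored into a small helper (a sketch, not part of the original code). One thing worth flagging: `str.replace` matches literally, so `"\xa0*"` is treated as the two characters `\xa0` and `*`, not as a regex pattern; the sketch below replaces the bare `\xa0` instead:

```python
def clean_strings(strings):
    """Join extracted tag strings and normalise whitespace."""
    text = "".join(strings)
    text = text.replace("\n", " ")
    # str.replace is literal: "\xa0*" would look for a literal asterisk,
    # so replace the non-breaking space on its own
    text = text.replace("\xa0", " ")
    return text

print(clean_strings(["Resolution\n", "on\xa0Human Rights"]))  # Resolution on Human Rights
```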
But when I try to turn this into a loop:
output = []
for item in hyperlinks:
    r = requests.get('http://www.achpr.org' + link)
    soup = BeautifulSoup(r.text, "lxml")
    soup.find('b').span.string
    text = soup.findAll('span')
    y = []
    for i in text:
        x = i.strings  # returns the strings within the tags (so no tags)
        y.extend(x)
    y = "".join(y)
    y = y.replace("\n", " ")
    y = y.replace("\xa0*", " ")
    output.extend(y)
I get the following error:
It feels like I'm making a really simple looping error (indentation in the wrong place, perhaps), but I've been staring at this too long and I'd like a fresh pair of eyes. Can anyone spot what I'm doing wrong?