
I am attempting to scrape info from the following website: https://www.axial.net/forum/companies/united-states-family-offices/

I am trying to scrape the description for each family office, so the pages I need to scrape are "https://www.axial.net/forum/companies/united-states-family-offices/" + insert_company_name.

So I wrote the following code to test the program for just one page:

from bs4 import BeautifulSoup as soup
from selenium import webdriver

# Point this at your local chromedriver binary
driver = webdriver.Chrome('insert_path_here/chromedriver')
driver.get("https://network.axial.net/company/ansaco-llp")
page_source = driver.page_source
soup2 = soup(page_source, "html.parser")
print(soup2.findAll('axl-teaser-description')[0].text)

This works for the single page, as long as the description doesn't have a "show full description" drop down button. I will save that for another question.
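For what it's worth, here is a rough sketch of the shape that fix might take, assuming the button can be reached by a CSS selector. "button.show-more" is a placeholder guess, not the site's actual markup:

from selenium.common.exceptions import NoSuchElementException

try:
    # Placeholder selector: expand the description before reading it
    driver.find_element_by_css_selector("button.show-more").click()
    page_source = driver.page_source  # re-read the page after expanding
except NoSuchElementException:
    pass  # no expand button on this page, nothing to do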

I wrote the following loop:

#Note: lst2 has all the names for the companies. I made sure they match the webpage
lst3 = []
for key in lst2[1:]:
    driver.get("https://network.axial.net/company/"+key.lower())
    page_source = driver.page_source

    for handle in driver.window_handles:
        driver.switch_to.window(handle)
    word_soup = soup(page_source, "html.parser")

    if word_soup.findAll('axl-teaser-description') == []:
        lst3.append('null')
    else:
        c = word_soup.findAll('axl-teaser-description')[0].text
        lst3.append(c)
print(lst3)

When I run the loop, all of the values come out as "null", even the ones without "click for full description" buttons.

I edited the loop to print out word_soup instead, and the page source is different from what I get when I run the same code outside the loop; it does not contain the description text.

I don't understand why a loop would cause that, but apparently it does. Does anyone know how to fix this problem?

  • Your first example for ansaco-llp does not work for me. It does not find the axl-teaser-description element; page_source does not contain that element if you print it and check. Commented Apr 16, 2020 at 23:47
  • @Sri Not sure why it doesn't work for you, but I found the solution, which I will post in the next comment. Commented Apr 17, 2020 at 0:13

2 Answers


Found the solution: pause the program for 3 seconds after driver.get so the JavaScript that renders the description has time to run:

import time

lst3 = []
for key in lst2[1:]:
    driver.get("https://network.axial.net/company/"+key.lower())
    time.sleep(3)  # give the page's JavaScript time to render the description
    page_source = driver.page_source

    word_soup = soup(page_source, "html.parser")

    if word_soup.findAll('axl-teaser-description') == []:
        lst3.append('null')
    else:
        c = word_soup.findAll('axl-teaser-description')[0].text
        lst3.append(c)
print(lst3)
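A fixed sleep happens to work here, but it wastes three seconds on fast pages and can still come up short on slow ones. A sketch of an explicit wait instead, assuming the element can be located by its tag name (the 10-second timeout is an arbitrary choice):

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

try:
    # Block until the description element is in the DOM, up to 10 seconds
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.TAG_NAME, "axl-teaser-description"))
    )
except TimeoutException:
    pass  # element never appeared; the loop will record 'null' as before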



I see that the page uses JavaScript to generate the text, meaning it doesn't show up in the page source, which is weird but ok. I don't quite understand why you're iterating through and switching to every window handle Selenium has open, but either way you definitely won't find the description in the page source / BeautifulSoup.

Honestly, I'd personally look for a better website if you can; otherwise, you'll have to try it with Selenium, which is inefficient and horrible.
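If you're stuck with Selenium anyway, one option is to skip page_source and BeautifulSoup entirely and read the rendered element straight from the driver, which sees the DOM after the JavaScript has run. A minimal sketch of that idea:

# The driver's view of the DOM includes JavaScript-rendered content
elements = driver.find_elements_by_tag_name("axl-teaser-description")
description = elements[0].text if elements else "null"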

2 Comments

The window_handles loop was unnecessary; I changed it in the solution.
Right, I forgot browsers need time to load a page, while requests have that built in and are virtually instant anyway.
