I am extracting board members from a list of URLs. For each url in URL_lst, I click the first XPath ("View More", to expand the list), then extract values from the second XPath (the board members' info). Below are the three companies I want to extract info from: https://www.bloomberg.com/quote/FB:US, https://www.bloomberg.com/quote/AAPL:US, https://www.bloomberg.com/quote/MSFT:US

My code is shown below but doesn't work: the Outputs list is not aggregated correctly. I know something is wrong with the loop but don't know how to fix it. Can anyone tell me how to correct the code? Thanks!

from selenium import webdriver

URL_lst = ['https://www.bloomberg.com/quote/FB:US','https://www.bloomberg.com/quote/AAPL:US','https://www.bloomberg.com/quote/MSFT:US']

Outputs = []
driver = webdriver.Chrome(r'xxx\chromedriver.exe')

for url in URL_lst:
    driver.get(url)
    for c in driver.find_elements_by_xpath("//*[@id='root']/div/div/section[3]/div[10]/div[2]/div/span[1]"):
        c.click()
        for e in c.find_elements_by_xpath('//*[@id="root"]/div/div/section[3]/div[10]/div[1]/div[2]/div/div[2]')[0].text.split('\n'):
            Outputs.append(e)

print(Outputs)
Are you seeing an error message in your code? Which line specifically is giving you an error here? Posting the page HTML to see what you are basing your XPaths on would be helpful too.

1 Answer


Based on the URLs you provided, I did some refactoring for you. I added a wait on each item you are trying to click, and a scrollIntoView JavaScript call to scroll down to the View More button. You were originally clicking View More buttons in a loop, but your XPath only returned one element, so the loop was redundant.

I also refactored your selector for board members to query directly on the div element containing their names. Your original query was finding a div several levels above the actual name text, which is why your Outputs list was returning empty.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from time import sleep

URL_lst = ['https://www.bloomberg.com/quote/FB:US','https://www.bloomberg.com/quote/AAPL:US','https://www.bloomberg.com/quote/MSFT:US']

Outputs = []
driver = webdriver.Chrome(r'xxx\chromedriver.exe')

wait = WebDriverWait(driver, 30)

for url in URL_lst:
    driver.get(url)

    # get "Board Members" header
    board_members_header = wait.until(EC.presence_of_element_located((By.XPATH, "//h2[span[text()='Board Members']]")))

    # scroll down to board members
    driver.execute_script("arguments[0].scrollIntoView();", board_members_header)

    # get view more button
    view_more_button = wait.until(EC.presence_of_element_located((By.XPATH, "//section[contains(@class, 'PageMainContent')]/div/div[2]/div/span[span[text()='View More']]")))

    # click view more button
    view_more_button.click()

    # wait for 'View Less' to exist, meaning the list is expanded now
    wait.until(EC.presence_of_element_located((By.XPATH, "//section[contains(@class, 'PageMainContent')]/div/div[2]/div/span[span[text()='View Less']]")))


    # wait on visibility of board member names
    wait.until(EC.presence_of_all_elements_located((By.XPATH, "//div[contains(@class, 'boardWrap')]//div[contains(@class, 'name')]")))

    # get list of board members names
    board_member_names = driver.find_elements_by_xpath("//div[contains(@class, 'boardWrap')]//div[contains(@class, 'name')]")

    for board_member in board_member_names:
        Outputs.append(board_member.text)

    # explicit sleep to avoid being flagged as a bot
    sleep(5)

print(Outputs)

I also added an explicit sleep between URL grabs, so that Bloomberg does not flag you as a bot.
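As an optional tweak (not something the code above strictly needs), you could also vary that pause a little instead of sleeping a fixed 5 seconds every time, something like:

import random
from time import sleep

# sleep a random amount between page loads, e.g. somewhere between 4 and 8 seconds
sleep(random.uniform(4, 8))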


Comments

Thank you, Christine! I'd like to know why the output is not aggregated; it doesn't return all the board members from the three companies...
You'll probably need to do some checks throughout your loops to ensure WebElements are actually being located. I'm not a fan of using a for loop against a dynamic driver.find_elements statement, because you don't know if anything is being iterated at all (see the sketch after these comments). I'll refactor this code a bit to assist with debugging it. I'll also check out the link you included in your question and see if I can help with your selectors too.
Thank you so much. I know something was wrong with the "for" loop but don't know how to fix it.
@ArthurMorgan I updated my answer with a bit of refactoring. Let me know how it's working for you, and I'll test it out on my end.
Also, it's worth noting -- when testing this out, bloomberg did flag me as a bot. That's not really an issue Selenium can fix, just a security measure implemented by the website itself.
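To make that "is anything actually being located?" check concrete, here is a rough sketch. The helper name get_board_member_names is just illustrative (it is not part of the answer above); it reuses the same name XPath as the answer, waits for at least one name to show up, and fails loudly instead of letting the loop silently iterate over an empty list:

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

NAME_XPATH = "//div[contains(@class, 'boardWrap')]//div[contains(@class, 'name')]"

def get_board_member_names(driver, timeout=30):
    # wait until at least one name is present; this raises TimeoutException if none appear
    WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located((By.XPATH, NAME_XPATH)))
    elements = driver.find_elements_by_xpath(NAME_XPATH)
    if not elements:
        # surface the problem instead of quietly appending nothing
        raise RuntimeError("No board member names found on " + driver.current_url)
    return [e.text for e in elements]

Inside the for url in URL_lst loop from the answer you could then collect one list per company, e.g. outputs_by_url[url] = get_board_member_names(driver) with outputs_by_url = {} defined before the loop, which makes it obvious when one of the three pages came back empty.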