I am extracting board members from a list of URLs. For each url in URL_lst, I click the first XPath ("View More", to expand the list), then extract values from the second XPath (the board members' info). Below are the three companies I want to extract info from: https://www.bloomberg.com/quote/FB:US, https://www.bloomberg.com/quote/AAPL:US, https://www.bloomberg.com/quote/MSFT:US

My code is shown below but doesn't work: the Outputs list is not aggregated correctly. I know something is wrong with the loop but don't know how to fix it. Can anyone tell me how to correct the code? Thanks!

from selenium import webdriver

URL_lst = ['https://www.bloomberg.com/quote/FB:US','https://www.bloomberg.com/quote/AAPL:US','https://www.bloomberg.com/quote/MSFT:US']

Outputs = []
driver = webdriver.Chrome(r'xxx\chromedriver.exe')

for url in URL_lst:
    driver.get(url)
    for c in driver.find_elements_by_xpath("//*[@id='root']/div/div/section[3]/div[10]/div[2]/div/span[1]"):
        c.click()
        for e in c.find_elements_by_xpath('//*[@id="root"]/div/div/section[3]/div[10]/div[1]/div[2]/div/div[2]')[0].text.split('\n'):
            Outputs.append(e)

print(Outputs)
Are you seeing an error message in your code? Which line specifically is giving you an error here? Posting the page HTML to see what you are basing your XPaths on would be helpful too.

1 Answer


Based on the URLs you provided, I did some refactoring for you. I added a wait on each item you are trying to click, and a scrollIntoView JavaScript call to scroll down to the View More button. You were originally clicking View More buttons in a loop, but your XPath only returned one element, so the loop was redundant.

I also refactored your selector for board members to query directly on the div element containing their names. Your original query was finding a div several levels above the actual name text, which is why your Outputs list was returning empty.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from time import sleep

URL_lst = ['https://www.bloomberg.com/quote/FB:US','https://www.bloomberg.com/quote/AAPL:US','https://www.bloomberg.com/quote/MSFT:US']

Outputs = []
driver = webdriver.Chrome(r'xxx\chromedriver.exe')

wait = WebDriverWait(driver, 30)

for url in URL_lst:
    driver.get(url)

    # get "Board Members" header
    board_members_header = wait.until(EC.presence_of_element_located((By.XPATH, "//h2[span[text()='Board Members']]")))

    # scroll down to board members
    driver.execute_script("arguments[0].scrollIntoView();", board_members_header)

    # get view more button
    view_more_button = wait.until(EC.presence_of_element_located((By.XPATH, "//section[contains(@class, 'PageMainContent')]/div/div[2]/div/span[span[text()='View More']]")))

    # click view more button
    view_more_button.click()

    # wait for 'View Less' to exist, meaning the list is expanded now
    wait.until(EC.presence_of_element_located((By.XPATH, "//section[contains(@class, 'PageMainContent')]/div/div[2]/div/span[span[text()='View Less']]")))


    # wait on visibility of board member names
    wait.until(EC.presence_of_all_elements_located((By.XPATH, "//div[contains(@class, 'boardWrap')]//div[contains(@class, 'name')]")))

    # get list of board members names
    board_member_names = driver.find_elements_by_xpath("//div[contains(@class, 'boardWrap')]//div[contains(@class, 'name')]")

    for board_member in board_member_names:
        Outputs.append(board_member.text)

    # explicit sleep to avoid being flagged as a bot
    sleep(5)

print(Outputs)

I also added an explicit sleep between URL grabs, so that Bloomberg does not flag you as a bot.
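As an optional tweak (not something the code above strictly needs), you could also vary that pause a little instead of sleeping a fixed 5 seconds every time, something like:

import random
from time import sleep

# sleep a random amount between page loads, e.g. somewhere between 4 and 8 seconds
sleep(random.uniform(4, 8))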


Comments

Thank you, Christine! I'd like to know why the output is not aggregated; it doesn't return all the board members from the three companies...
You'll probably need to do some checks throughout your loops to ensure WebElements are actually being located. I'm not a fan of using a for loop against a dynamic driver.find_elements statement, because you don't know if anything is being iterated at all (see the sketch after these comments). I'll refactor this code a bit to assist with debugging it. I'll also check out the link you included in your question and see if I can help with your selectors too.
Thank you so much. I know something was wrong with the "for" loop but don't know how to fix it.
@ArthurMorgan I updated my answer with a bit of refactoring. Let me know how it's working for you, and I'll test it out on my end.
Also, it's worth noting -- when testing this out, bloomberg did flag me as a bot. That's not really an issue Selenium can fix, just a security measure implemented by the website itself.
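To make that "is anything actually being located?" check concrete, here is a rough sketch. The helper name get_board_member_names is just illustrative (it is not part of the answer above); it reuses the same name XPath as the answer, waits for at least one name to show up, and fails loudly instead of letting the loop silently iterate over an empty list:

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

NAME_XPATH = "//div[contains(@class, 'boardWrap')]//div[contains(@class, 'name')]"

def get_board_member_names(driver, timeout=30):
    # wait until at least one name is present; this raises TimeoutException if none appear
    WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located((By.XPATH, NAME_XPATH)))
    elements = driver.find_elements_by_xpath(NAME_XPATH)
    if not elements:
        # surface the problem instead of quietly appending nothing
        raise RuntimeError("No board member names found on " + driver.current_url)
    return [e.text for e in elements]

Inside the for url in URL_lst loop from the answer you could then collect one list per company, e.g. outputs_by_url[url] = get_board_member_names(driver) with outputs_by_url = {} defined before the loop, which makes it obvious when one of the three pages came back empty.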