
I am trying to get the value of an element that renders text upon clicking a dropdown. I am currently using implicitly_wait() to make sure the element appears, but when I run the script, the .text calls return empty strings. If I step through the script line by line, the .text values populate. Based on this I assume I have to wait for the text to render, but I can't work out how to do this.

Looking at the expected conditions documentation, all of the text_to_be_present_... conditions require me to know what text I am waiting for. Since I am web scraping, I don't know this, so I am trying to pass a regex to the text_ argument that matches a generic form of the value I am looking for. I am not getting the expected result: the value still comes back as an empty string when I run the script.

Here is the code I am trying:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

#Set the options for running selenium as headless
options = Options()
options.headless = True
options.add_argument("--window-size=1920,1200")

#Create the driver object
driver = webdriver.Chrome(options=options, executable_path=DRIVER_PATH)
driver.implicitly_wait(10)

output = []
driver.get(html)
nat_res_element = driver.find_element_by_xpath('//*[@id="accordion-theme"]/div[1]/div[1]/span')
nat_res_element.click()
element = WebDriverWait(driver, 10).until(EC.text_to_be_present_in_element_value((By.XPATH, '//*[@id="collapse0"]/div/div/ul/li/span[2]'), text_='[\d].*'))
output.append(element.text)

The url is: https://projects.worldbank.org/en/projects-operations/project-detail/P159382. I am trying to access the values under the 'Environment and Natural Resource Management' dropdown. Since the values take the form of digits followed by %, I am trying the regex [\d].*.

I would welcome a way to handle this.
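For reference, the kind of thing I imagine I need is a custom wait condition that polls until .text is non-empty (a sketch only, reusing the XPath from my script; WebDriverWait accepts any callable):

```python
def non_empty_text(locator):
    """Custom wait condition: return the element once its .text is
    non-empty, otherwise False so WebDriverWait keeps polling."""
    def _predicate(driver):
        element = driver.find_element(*locator)
        return element if element.text.strip() else False
    return _predicate

# Usage sketch (after clicking the dropdown):
# from selenium.webdriver.common.by import By
# from selenium.webdriver.support.ui import WebDriverWait
# element = WebDriverWait(driver, 10).until(
#     non_empty_text((By.XPATH, '//*[@id="collapse0"]/div/div/ul/li/span[2]'))
# )
# output.append(element.text)
```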

  • All of the below solutions work. For my purposes I simply called element.click() and then put driver.page_source into BS, and re-ran my old code, as per @Celius Stingher's suggestion. That said, I think @F.Hoque's answer is technically the best answer to my question about waiting on an expected condition and then calling .text on the element. Nonetheless, @undetected Selenium's answer works as well, calling get_attribute("innerHTML"). Thanks very much for all the help. Commented Jul 30, 2022 at 2:14
  • @undetectedSelenium's answer not only speaks about get_attribute("innerHTML") but also demonstrates 4 options involving CSS/text and XPath/get_attribute(), along with an explanation of why text_to_be_present_in_element_value() doesn't suit your use case. Commented Jul 30, 2022 at 7:21

3 Answers

climate_change = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, '(//*[@class="twolevel"]//li//span)[2]'))).text
adaptation = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, '(//*[@class="twolevel"]//li//span)[4]'))).text
mitigation = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, '(//*[@class="twolevel"]//li//span)[6]'))).text

The above XPath expressions will pull the desired data from the 'Environment and Natural Resource Management' dropdown.

It also works fine with a non-headless browser.

Full Script:

import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("--window-size=1920,1200")
#options.add_argument("--headless")


s = Service("./chromedriver") ## path to where you saved chromedriver binary
driver = webdriver.Chrome(service=s, options=options)

url = 'https://projects.worldbank.org/en/projects-operations/project-detail/P159382'
driver.get(url)
time.sleep(5)

nat_res_element = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="accordion-theme"]/div[1]/div[1]/span')))
nat_res_element.click()
data=[]
climate_change = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, '(//*[@class="twolevel"]//li//span)[2]'))).text
adaptation = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, '(//*[@class="twolevel"]//li//span)[4]'))).text
mitigation = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, '(//*[@class="twolevel"]//li//span)[6]'))).text
data.append({
    'Climate change':climate_change,
    'Adaptation':adaptation,
    'Mitigation':mitigation
    })

print(data)

driver.quit()
  

Output:

[{'Climate change': '64%', 'Adaptation': '32%', 'Mitigation': '32%'}]
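As an aside (not part of the original answer), the three near-identical waits read the even-numbered spans ([2], [4], [6]), so the XPath index can be generated by a small helper and the waits collapsed into a loop:

```python
def even_span_xpath(i):
    """XPath for the i-th (1-based) value span in the answer above:
    the percentage values sit at positions [2], [4], [6], ..."""
    return f'(//*[@class="twolevel"]//li//span)[{2 * i}]'

# Sketch of the wait loop (assumes driver/WebDriverWait/EC/By as in the script):
# data = {
#     label: WebDriverWait(driver, 10).until(
#         EC.visibility_of_element_located((By.XPATH, even_span_xpath(i + 1)))
#     ).text
#     for i, label in enumerate(['Climate change', 'Adaptation', 'Mitigation'])
# }
```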


I usually like to combine Selenium with BeautifulSoup. Thanks for sharing all the details; this would be my approach:

from bs4 import BeautifulSoup
import pandas as pd

driver.get("https://projects.worldbank.org/en/projects-operations/project-detail/P159382")

raw_source = driver.page_source
parsed = BeautifulSoup(raw_source, "html.parser")

variables = [x.text for x in parsed.find_all(class_='table-accordion-wrapper ta-block ng-star-inserted')[0].find_all(class_='proj-theme')]
values = [x.text for x in parsed.find_all(class_='table-accordion-wrapper ta-block ng-star-inserted')[0].find_all(class_='proj-theme-percentage')]

df = pd.DataFrame({'variables':variables,'values':values})


print(df)

Returns:

        variables values
0  Climate change    64%
1      Adaptation    32%
2      Mitigation    32%

The first find_all() accesses the Theme table, which contains 4 expandable tables. Given we only want the first one, I am indexing with [0] after the first find_all() (but if you'd like the values from the other tables you can use a nested list comprehension). The second find_all() iterates over the rows in the subtable, accessing Climate, Adaptation and Mitigation.
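If the other tables are wanted too, the nested list comprehension mentioned above could look like this (a sketch only, reusing the answer's class names):

```python
from bs4 import BeautifulSoup  # assumes bs4 is available, as in the answer

def extract_all_themes(parsed):
    """Collect (theme, percentage) pairs from every expandable table,
    not only the first one at index [0]."""
    tables = parsed.find_all(class_='table-accordion-wrapper ta-block ng-star-inserted')
    return [
        (theme.text, pct.text)
        for table in tables
        for theme, pct in zip(table.find_all(class_='proj-theme'),
                              table.find_all(class_='proj-theme-percentage'))
    ]

# e.g. extract_all_themes(BeautifulSoup(driver.page_source, "html.parser"))
```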

You can of course manipulate it further to generate a format you'd like, such as:

df = df.set_index('variables').T

Returning:

variables Climate change Adaptation Mitigation
values               64%        32%        32%

3 Comments

Thanks for this. My initial approach combined Selenium and BS, but I found that some pages would lose the data hidden in the dropdown when turned into a BS object (see this case for example: projects.worldbank.org/en/projects-operations/project-detail/…). It was based on this that I switched to trying to find and access the elements using Selenium methods. On this front, I should apologize for a typo in my original question: I left out the code where I called click on the dropdown element. Fixed now.
You should be able to click to expand with Selenium and then use page_source to get the expanded data.
Great. Even though this doesn't answer my question, this is probably easiest for me given I already have the code written to parse the BS objects.

text_to_be_present_in_element_value()

text_to_be_present_in_element_value() is the expectation for checking if the given text is present in the element’s value and is defined as:

def text_to_be_present_in_element_value(locator, text_):
    """
    An expectation for checking if the given text is present in the element's value.
    locator, text
    """

    def _predicate(driver):
        try:
            element_text = driver.find_element(*locator).get_attribute("value")
            return text_ in element_text
        except StaleElementReferenceException:
            return False

    return _predicate

This use case

You need to consider a couple of things here as follows:

  • The Expected Condition text_to_be_present_in_element_value() checks if the given text is present in the element's value attribute, not in the text / innerText, which is 64%.
  • Expected Conditions don't support regex, so the supplied regex [\d].* will be treated as a literal string.
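If a regex really is needed, the built-in conditions won't help, but WebDriverWait accepts any callable; a custom condition (a sketch, not part of the original answer) could apply re.search to the element's visible text:

```python
import re

def text_matches(locator, pattern):
    """Custom wait condition: return the element once its visible text
    matches the regex, otherwise False so WebDriverWait keeps polling."""
    regex = re.compile(pattern)
    def _predicate(driver):
        element = driver.find_element(*locator)
        return element if regex.search(element.text or "") else False
    return _predicate

# Usage sketch:
# WebDriverWait(driver, 10).until(
#     text_matches((By.XPATH, '//*[@id="collapse0"]/div/div/ul/li/span[2]'), r'\d+%')
# )
```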

Solution

To extract the text 64%, ideally you need to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following locator strategies:

  • Using CSS_SELECTOR and text attribute:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div#collapse0 ul.twolevel li.firstlevel span.proj-theme +span"))).text)
    
  • Using XPATH and get_attribute("innerHTML"):

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[.='Climate change']//following::span[1]"))).get_attribute("innerHTML"))
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    

You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python

1 Comment

This works. Thanks for the help. I chose another answer, as I thought it was closer to my specific question. Nonetheless, thanks for the help on the specific use case and solution.
