I am trying to scrape the country names from the following page: http://hdr.undp.org/en/composite/trends

To do that, I am trying to get the XPath of each country's element.

For the first country, it looks like this:

Country = driver.find_element_by_xpath("//[@id='styleSheet.css']/div/div/div/div/table/tbody/tr[2]/td[2]").text

To get all the countries, I am using a for loop with the range function in Python:

for i in range(2,193):
    try:
        print(i)
        Country = driver.find_element_by_xpath("//[@id='styleSheet.css']/div/div/div/div/table/tbody/tr["+int(i)+"]/td[11]").text
        print(Country)
    except Exception:
        print("none")

But the problem is that the XPath doesn't work for me. Kindly help me locate the right element.

I resolved the first problem by changing int to str, since that was what raised the error. After that, it says it cannot locate the element.

  • I don't think you can just concatenate a string to an int like that. You can use the '{}'.format() method. Commented Dec 5, 2017 at 15:14
  • Does the Exception provide any info? For example, you can use except Exception as e and then print(e). Commented Dec 5, 2017 at 15:15
  • @SuperStew No, the first place where I get the country is already going wrong, even outside the loop. Commented Dec 5, 2017 at 15:16
  • A stack trace would be much more helpful to others in debugging the problem. Commented Dec 5, 2017 at 15:16
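
The two suggestions in the comments can be tried without a browser. The following is a minimal sketch: the shortened XPath template and the helper name cell_xpath are hypothetical and stand in for whatever locator the real page needs. It shows that concatenating a str with an int raises an error, how printing the caught exception reveals the cause, and how str.format() builds the string instead.

```python
# Hypothetical, shortened XPath template (an assumption for illustration only).
XPATH_TEMPLATE = "//table/tbody/tr[{row}]/td[{col}]"


def cell_xpath(row, col):
    """Build the XPath for one table cell; format() converts the ints to text."""
    return XPATH_TEMPLATE.format(row=row, col=col)


# Concatenating a str with an int fails; printing the exception shows why.
try:
    broken = "//table/tbody/tr[" + 2 + "]/td[2]"
except Exception as e:
    print(type(e).__name__, e)

# The formatted version works for any row and column number.
print(cell_xpath(2, 2))
```

Catching the exception as e and printing it, rather than printing "none", is what makes the real cause visible in the loop from the question.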

1 Answer

You don't have to use XPaths for every single Selenium element location problem. There are better ways to locate the countries in this case. You could go through every tr element inside the tbody of the table and get the second td element, which contains the country name:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Chrome()
driver.get("http://hdr.undp.org/en/composite/trends")

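# wait until the table is visible before reading its rows
table = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".pane-content table")))
for row in table.find_elements(By.CSS_SELECTOR, "tbody > tr")[1:]:  # skipping the first header row
    # find_element(By.CSS_SELECTOR, ...) replaces find_element_by_css_selector, which newer Selenium releases removed
    country = row.find_element(By.CSS_SELECTOR, "td:nth-child(2)")

    print(country.text)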

driver.close()

Prints:

Norway
Australia
Switzerland
...
San Marino
Somalia
Tuvalu

2 Comments

Why is it failing to pick up the values for the years? Say for year 2012, I am trying to get it like this: val = row.find_element_by_css_selector("td:nth-child(10)"), but it is failing.
@Sid29 right, because not every single row has 10 or more columns; you have to skip the rows which don't. You can do that with a try/except or by checking how many cells there are in a row. Thanks.
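
The "check how many cells are in a row" idea from the reply above could look roughly like this. It is only a sketch: the helper name cell_text is hypothetical, and the rows are made-up placeholder lists of cell texts rather than data from the page, so it runs without a browser.

```python
def cell_text(cells, position):
    """Return the text at 1-based `position`, or None if the row is too short."""
    if len(cells) < position:
        return None
    return cells[position - 1]


# Placeholder rows (invented values): each inner list is the text of one row's cells.
rows = [
    ["1", "Country A", "0.1", "0.2"],
    ["Section heading"],  # a row with a single cell, like a group header
    ["2", "Country B", "0.3", "0.4"],
]

for cells in rows:
    value = cell_text(cells, 3)
    if value is None:
        continue  # skip rows that do not have enough columns
    print(cells[1], value)
```

With Selenium, the list for each row could be built as [td.text for td in row.find_elements(By.CSS_SELECTOR, "td")], assuming the same row objects as in the answer's loop.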
