
I'm trying to web-scrape this part of an HTML page:

<td class="zebraTable__td zebraTable__td--companyName"><a href="/unternehmen/8116602/schneider-electric-holding-germany-gmbh" data-gtm="companySearch__searchResult--76">
                        Schneider Electric Holding Germany GmbH
                    </a></td>


from this site:

https://de.statista.com/companydb/suche?idCountry=276&idBranch=0&revenueFrom=-1000000000000000000&revenueTo=1000000000000000000&employeesFrom=0&employeesTo=100000000&sortMethod=revenueDesc&p=4

with this code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
import pandas as pd
import time 

driver = webdriver.Chrome('/Users/rieder/Anaconda3/chromedriver_win32/chromedriver.exe')

driver.get('https://de.statista.com/companydb/suche?idCountry=276&idBranch=0&revenueFrom=-1000000000000000000&revenueTo=1000000000000000000&employeesFrom=500&employeesTo=100000000&sortMethod=revenueDesc&p=1')

driver.find_element_by_id("cookiesNotificationConfirm").click()

company_name = driver.find_element_by_class_name('zebraTable__td zebraTable__td--companyName')

print(company_name)
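A likely cause, for reference: find_element_by_class_name (By.CLASS_NAME) expects a single class name, but this cell's class attribute holds two space-separated classes, so the compound value does not work as one lookup. To match an element on both classes, join them with dots in a CSS selector. Separately, printing the WebElement itself shows the object rather than the company name; its .text attribute holds the visible text. A small sketch of building such a selector; css_selector_from_classes is a hypothetical helper, not part of Selenium:

```python
def css_selector_from_classes(tag, class_attr):
    """Build a CSS selector matching `tag` elements that carry every class in class_attr.

    Hypothetical helper: "zebraTable__td zebraTable__td--companyName"
    becomes "td.zebraTable__td.zebraTable__td--companyName".
    """
    classes = class_attr.split()  # split() also tolerates repeated or leading spaces
    return tag + "".join("." + cls for cls in classes)


cell_selector = css_selector_from_classes("td", "zebraTable__td zebraTable__td--companyName")
link_selector = cell_selector + " > a"  # the <a> inside the cell holds the name
print(link_selector)  # td.zebraTable__td.zebraTable__td--companyName > a

# With Selenium 4 (recent releases removed the find_element_by_* helpers):
#     from selenium.webdriver.common.by import By
#     link = driver.find_element(By.CSS_SELECTOR, link_selector)
#     print(link.text)  # the visible company name, not the WebElement object
```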

I tried for 4 hours and can't get it to work. I tried different methods such as XPath, link text, etc., but all I get is an empty company name like this: "[ ]".

Does anyone know how Selenium can find this exact piece of text: "Liebherr-Hausgeräte Ochsenhausen GmbH"?

Thanks a lot.

2 Answers

To print the text Schneider Electric Holding Germany GmbH, you need WebDriverWait to wait for visibility_of_element_located(). You can use either of the following locator strategies:

  • Using CSS_SELECTOR and text attribute:

    driver.get('https://de.statista.com/companydb/suche?idCountry=276&idBranch=0&revenueFrom=-1000000000000000000&revenueTo=1000000000000000000&employeesFrom=0&employeesTo=100000000&sortMethod=revenueDesc&p=4')
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#cookiesNotificationConfirm"))).click()
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.zebraTable.zebraTable--companies tr:nth-child(2)>td.zebraTable__td.zebraTable__td--companyName>a"))).text)
    
  • Using XPATH and get_attribute("innerHTML"):

    driver.get('https://de.statista.com/companydb/suche?idCountry=276&idBranch=0&revenueFrom=-1000000000000000000&revenueTo=1000000000000000000&employeesFrom=0&employeesTo=100000000&sortMethod=revenueDesc&p=4')
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[@id='cookiesNotificationConfirm']"))).click()
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@class='zebraTable zebraTable--companies']//following::tr[2]/td[@class='zebraTable__td zebraTable__td--companyName']/a"))).get_attribute("innerHTML"))
    
  • Console Output:

    Schneider Electric Holding Germany GmbH
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    

You can find a relevant discussion in "How to retrieve the text of a WebElement using Selenium - Python".



3 Comments

This worked like a charm, thanks a lot. I tried to integrate this code into my full script, which tries to generate a list of all the company names with 500 or more employees, but it always takes just the first name in the list if I repeat the command. I think it's because .get_attribute() only gets one attribute and not all attributes found by the XPath?!
@Yankzz This answer is specifically to extract the text Schneider Electric Holding Germany GmbH. For the list of all the company names, the locators need to be adjusted. Can you raise a new question with your new requirement, please?
Thanks, I opened a new Question: stackoverflow.com/questions/63669207/…

What you are looking for can be found in the source code of the page, under:

<div data-company-search><div data-var-name="companyResults" data

Since it is part of the page source, you do not need Selenium to get it. Just read the page with requests and find the data using Beautiful Soup.
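A minimal sketch of that approach, assuming the requests and beautifulsoup4 packages are installed and that the server returns the same HTML to a plain HTTP client as to the browser (not verified here). The answer does not say what the companyResults element contains beyond a data… attribute, so the hypothetical helper find_company_results simply returns that element's attributes for inspection rather than guessing their format:

```python
import requests
from bs4 import BeautifulSoup

# Search URL from the question (page 4 of the results).
SEARCH_URL = (
    "https://de.statista.com/companydb/suche?idCountry=276&idBranch=0"
    "&revenueFrom=-1000000000000000000&revenueTo=1000000000000000000"
    "&employeesFrom=0&employeesTo=100000000&sortMethod=revenueDesc&p=4"
)


def fetch_page(url):
    """Download the raw page source with a plain HTTP GET (no browser, no JavaScript)."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text


def find_company_results(html):
    """Return the companyResults element's attributes as a dict, or None if it is absent.

    Assumption: the element is a <div data-var-name="companyResults"> nested inside
    a <div data-company-search>, as described in the answer above.
    """
    soup = BeautifulSoup(html, "html.parser")
    element = soup.select_one("div[data-company-search] div[data-var-name='companyResults']")
    if element is None:
        return None
    return dict(element.attrs)


# Usage (needs network access):
#     attrs = find_company_results(fetch_page(SEARCH_URL))
#     print(attrs)  # inspect which data-* attribute carries the company list
```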

1 Comment

You are right! But I need this part of the code for a script that generates a list of all the employees' names. My fault, I should have explained the whole thing, sorry.
