Python - Selenium - can't web-scrape specific text content from HTML

Question

I am trying to scrape this part of an HTML page:

<td class="zebraTable__td zebraTable__td--companyName"><a href="/unternehmen/8116602/schneider-electric-holding-germany-gmbh" data-gtm="companySearch__searchResult--76">
                        Schneider Electric Holding Germany GmbH
                    </a></td>


from this site:

https://de.statista.com/companydb/suche?idCountry=276&idBranch=0&revenueFrom=-1000000000000000000&revenueTo=1000000000000000000&employeesFrom=0&employeesTo=100000000&sortMethod=revenueDesc&p=4

with this code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
import pandas as pd
import time 

driver = webdriver.Chrome('/Users/rieder/Anaconda3/chromedriver_win32/chromedriver.exe')

driver.get('https://de.statista.com/companydb/suche?idCountry=276&idBranch=0&revenueFrom=-1000000000000000000&revenueTo=1000000000000000000&employeesFrom=500&employeesTo=100000000&sortMethod=revenueDesc&p=1')

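driver.find_element_by_id("cookiesNotificationConfirm").click()

# Locating by class name expects a single class; this string holds two space-separated classes
company_name = driver.find_element_by_class_name('zebraTable__td zebraTable__td--companyName')

# This prints the WebElement object itself, not its text
print(company_name)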

I have been trying for 4 hours and can't get it to work. I tried different methods, such as XPath and link text, but all I get is an empty company name like this: "[ ]".

Does anyone know how Selenium can find this exact piece of text "Liebherr-Hausgeräte Ochsenhausen GmbH"?

Thanks a lot.

undetected Selenium · Accepted Answer · 2020-08-27 12:12:57Z

0

To print the text Schneider Electric Holding Germany GmbH, you have to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following locator strategies:

Using CSS_SELECTOR and the text attribute:

driver.get('https://de.statista.com/companydb/suche?idCountry=276&idBranch=0&revenueFrom=-1000000000000000000&revenueTo=1000000000000000000&employeesFrom=0&employeesTo=100000000&sortMethod=revenueDesc&p=4')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#cookiesNotificationConfirm"))).click()
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.zebraTable.zebraTable--companies tr:nth-child(2)>td.zebraTable__td.zebraTable__td--companyName>a"))).text)

Using XPATH and get_attribute("innerHTML"):

driver.get('https://de.statista.com/companydb/suche?idCountry=276&idBranch=0&revenueFrom=-1000000000000000000&revenueTo=1000000000000000000&employeesFrom=0&employeesTo=100000000&sortMethod=revenueDesc&p=4')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[@id='cookiesNotificationConfirm']"))).click()
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@class='zebraTable zebraTable--companies']//following::tr[2]/td[@class='zebraTable__td zebraTable__td--companyName']/a"))).get_attribute("innerHTML"))

Console Output:

Schneider Electric Holding Germany GmbH
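The question's find_element_by_class_name('zebraTable__td zebraTable__td--companyName') does not match the cell because locating by class name is meant for a single class, while the cell's class attribute holds two space-separated classes. Both strategies above instead chain the classes in a CSS selector (td.zebraTable__td.zebraTable__td--companyName) or compare the full attribute in XPath. The small helper below is hypothetical and only illustrates that conversion in plain Python; it does not call Selenium.

```python
def class_attr_to_css(tag, class_attr):
    """Hypothetical helper: turn a space-separated class attribute into a
    compound CSS selector, e.g. tag 'td' + classes 'a b' -> 'td.a.b'."""
    classes = class_attr.split()  # split() also collapses repeated whitespace
    return tag + "".join("." + name for name in classes)


selector = class_attr_to_css("td", "zebraTable__td zebraTable__td--companyName")
print(selector)  # td.zebraTable__td.zebraTable__td--companyName
```

The resulting string can be passed as (By.CSS_SELECTOR, selector) to find_element or to the expected conditions used above.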

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python

Outro

Links to useful documentation:

get_attribute() method: gets the given attribute or property of the element.
text attribute: returns the text of the element.
Difference between text and innerHTML using Selenium

edited Aug 27, 2020 at 12:12

answered Aug 27, 2020 at 12:06

undetected Selenium



3 Comments

Yankzz Over a year ago

This worked like a charm, thanks a lot. I tried to integrate this code into my full script, which tries to generate a list of all the company names for companies with 500 or more employees, but it always returns only the first name in the list when I repeat the command. I think it's because .get_attribute() only gets one attribute and not all the attributes found by the XPath?!

undetected Selenium Over a year ago

@Yankzz This answer is specifically meant to extract the text Schneider Electric Holding Germany GmbH. For the list of all the company names, the locators need to be adjusted. Can you please raise a new question with your new requirement?

Yankzz Over a year ago

Thanks, I opened a new question: stackoverflow.com/questions/63669207/…

balderman · Accepted Answer · 2020-08-27 11:53:55Z

0

What you are looking for can be found in the source code of the page, under

<div data-company-search><div data-var-name="companyResults" data

It is part of the page source, so you do not need Selenium to get it: just read the page with requests and find the data using Beautiful Soup.
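A minimal sketch of that approach, under stated assumptions: it assumes requests and beautifulsoup4 are installed and that the server-rendered HTML contains the same company-name cells shown in the question. If the names only appear inside the companyResults element mentioned above, the selector has to be adapted to that element instead. The names parse_company_names and fetch_company_names are hypothetical, and the site may require cookies or refuse plain HTTP clients.

```python
import requests
from bs4 import BeautifulSoup


def parse_company_names(html):
    """Return the stripped text of every company-name link in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    links = soup.select("td.zebraTable__td--companyName > a")
    # strip=True removes the newlines and indentation around the name
    return [a.get_text(strip=True) for a in links]


def fetch_company_names(url, timeout=30):
    """Download a results page without a browser and parse it."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return parse_company_names(response.text)
```

Keeping the parsing separate from the download makes it possible to check the selector against a saved copy of the page before hitting the live site.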

answered Aug 27, 2020 at 11:53

balderman


1 Comment

Yankzz Over a year ago

You are right! But I need this part of the code for a script that generates a list of all the employees' names. My fault, I should have explained the whole thing, sorry.
