
I want to crawl a website, but I have a problem with looping through the result pages. I want to collect all links, then click each link and collect data (the date, in this case). I wrote the code below, but I keep getting this error:

StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: chrome=98.0.4758.109)

I have tried increasing the sleep interval, but the result is the same. The error happens on the second iteration (after the first link).

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import requests
import time

# url for crawling
url = "https://bstger.weblaw.ch/?size=n_60_n"
    
# path to selenium
path = 'path to selenium'
driver = webdriver.Chrome(path)
driver.get(url)
time.sleep(4)    
    
# click on search button
driver.find_element_by_xpath('//*[@id="root"]/div/div/div[2]/div[1]/div/div[3]/form/div/input').click()
time.sleep(3)    
    
# get all links
all_links = driver.find_elements_by_tag_name('li.sui-result div.sui-result__header a')
print(all_links)
print()

# loop through links and crawl them
for link in all_links:
    
    # click on link
    print(link)
    time.sleep(4)
    link.click()  # I GET THE ERROR HERE ON THE SECOND ITERATION
    time.sleep(4)
        
    # get date
    date = driver.find_element_by_tag_name('div.filter-data button.wlclight13').text
    day = date.split('.')[0]
    month = date.split('.')[1]
    year = date.split('.')[2]
    date = year + "-" + month + "-" + day
    print(date)
    print()
    
    # click on back button
    driver.find_element_by_xpath('//*[@id="root"]/div/section[1]/div[1]/div[1]/a').click()
    time.sleep(4)
    # scroll back up the results list
    driver.execute_script("window.scrollTo(0, 200)")
  • Post the complete error message, so that we can tell which line of code is throwing that exception. Commented Feb 28, 2022 at 10:05
  • I get the error in the for loop, when I try to click on the second link. Everything is fine for the first link (the data extraction works), but when I try to click on the second link I get the error. Commented Feb 28, 2022 at 10:10
  • Refer to the link regarding the error: you need to re-define all_links inside the for loop. Also, the website is quite unstable - clicking the Back button does not navigate to the previous page properly, and other methods of navigating back do not work either. Commented Feb 28, 2022 at 10:41

2 Answers


Instead of storing the elements, get the href values and use driver.get() to navigate.

# Get the href values

all_links = [link.get_attribute('href') for link in driver.find_elements_by_css_selector('li.sui-result > .sui-result__header > a')]
print(all_links)

for link in all_links:
    
    driver.get(link)
        
    # get date
    date = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.filter-data button.wlclight13"))).text
    day = date.split('.')[0]
    month = date.split('.')[1]
    year = date.split('.')[2]
    date = year + "-" + month + "-" + day
    print(date)       
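
As a side note, the split-and-join date conversion can also be done with datetime, which fails loudly if the text is ever not in the expected format (a minimal sketch; the sample string is hypothetical, assuming the button always shows a DD.MM.YYYY date):

from datetime import datetime

raw = "15.02.2022"  # hypothetical sample of the button text
iso = datetime.strptime(raw, "%d.%m.%Y").strftime("%Y-%m-%d")
print(iso)  # 2022-02-15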

If you want to go ahead with your own code, you need to re-find the elements inside the loop, as below. Clicking a link and then navigating back reloads the results page, so the previously collected WebElement references no longer point to live DOM nodes; looking the list up again on every iteration avoids clicking a stale reference.

all_links = driver.find_elements_by_tag_name('li.sui-result div.sui-result__header a')
print(all_links)
print()

# loop through links and crawl them
for i in range(len(all_links)):
    # re-find the elements on every iteration to avoid stale references
    all_links = driver.find_elements_by_tag_name('li.sui-result div.sui-result__header a')
    # click on link
    print(all_links[i])
    time.sleep(4)
    all_links[i].click()
    time.sleep(4)
        
    # get date
    date = driver.find_element_by_tag_name('div.filter-data button.wlclight13').text
    day = date.split('.')[0]
    month = date.split('.')[1]
    year = date.split('.')[2]
    date = year + "-" + month + "-" + day
    print(date)
    print()
    
    # click on back button
    driver.find_element_by_xpath('//*[@id="root"]/div/section[1]/div[1]/div[1]/a').click()
    time.sleep(4)
    # scroll back up the results list
    driver.execute_script("window.scrollTo(0, 200)")

Update: Navigating to the URL does not re-render the page, so driver.refresh() was added to make the date appear.

all_links = [link.get_attribute('href') for link in driver.find_elements_by_css_selector('li.sui-result > .sui-result__header > a')]
print(all_links) 

for link in all_links:
    
    driver.get(link)
    driver.refresh()
        
    # get date
    date = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.filter-data button.wlclight13"))).text
    day = date.split('.')[0]
    month = date.split('.')[1]
    year = date.split('.')[2]
    date = year + "-" + month + "-" + day
    print(date)       

You need the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

Output: (screenshot omitted)


4 Comments

Yeah, but I need to do this in one 'window', sending only one request. I have managed to apply the same logic to other websites and it works well; I do not know what the problem is here.
@taga: Try the other answer I have posted and let me know how it goes.
The first example does not work well (it always returns the same date), and the second example breaks after the third link.
@taga: It seems the page needs JavaScript to re-render; however, I have added a page refresh (driver.refresh()) so the date appears. Try the updated one.

As already mentioned, clicking the Back button on this site is unstable. But you can use the Next button to navigate through the other links.

It is also better to apply explicit waits.

driver.get("https://bstger.weblaw.ch/?size=n_60_n")

wait = WebDriverWait(driver,30)
actions = ActionChains(driver)

buttonClickSearch = wait.until(EC.element_to_be_clickable((By.XPATH,"//input[@aria-label='search button']")))
actions.move_to_element(buttonClickSearch).click()

time.sleep(5)
all_links = driver.find_elements(By.XPATH,"//div[@class='sui-result__header']/a")
all_links[0].click() # Click on the First link.

for i in range(20):
    ...
    next = wait.until(EC.element_to_be_clickable((By.XPATH,"//button[contains(@class,'next')]")))
    next.click() # Click on next link for 20 iterations.
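
For completeness, here is what the elided loop body might look like, reusing the date selector from the question (div.filter-data button.wlclight13). Treat it as an untested sketch, not this answer's exact code:

for i in range(20):
    # wait for the date button of the current result to be visible
    date = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.filter-data button.wlclight13"))).text
    day, month, year = date.split('.')
    print(year + "-" + month + "-" + day)

    # move on to the next result
    next_button = wait.until(EC.element_to_be_clickable((By.XPATH, "//button[contains(@class,'next')]")))
    next_button.click()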

