I am trying to scrape https://www.anytimemailbox.com/s/new-york-42-broadway. Following https://stackoverflow.com/a/61343018/21294350, I use driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") to scroll to the bottom of the page.

Minimal demo:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

selector1_str_main='div[class="t-disc"]'
selector1_str=selector1_str_main+'>a'
selector1=(By.CSS_SELECTOR, selector1_str)

driver=webdriver.Firefox()
driver.get("https://www.anytimemailbox.com/s/new-york-42-broadway")
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") # script1
waiter=WebDriverWait(driver, 20)
# assert '<a href="#" onclick="thShowFullServicePlan11(4967);return false;">' in driver.page_source
# The assert above does not raise. When it is enabled, the extra delay seems to let the scroll finish, so the click below works.
waiter.until(EC.visibility_of_element_located((By.CSS_SELECTOR,".policy-wrapper")))
waiter.until(EC.visibility_of_all_elements_located(selector1))
waiter.until(EC.element_to_be_clickable(selector1)).click()
# throws "ERROR:Message: Element <a href="#"> could not be scrolled into view"

This is probably because script1 has not finished before the click runs. I tried the approach from https://stackoverflow.com/a/65844911/21294350, waiting with waiter.until(EC.visibility_of_element_located((By.CSS_SELECTOR,".policy-wrapper"))), but it doesn't help.

My current workaround is to click through JavaScript, which doesn't require the clickable element to be inside the viewport: https://stackoverflow.com/a/55431861/21294350.
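
The JavaScript workaround can be sketched as a small helper. This is a minimal sketch, not code copied from the linked answer: the name js_click is hypothetical, and it assumes driver is a Selenium WebDriver and element is a WebElement that has already been located.

```python
def js_click(driver, element):
    """Click `element` through JavaScript instead of WebElement.click().

    The click is dispatched directly on the DOM node, so the element
    does not have to be scrolled into the viewport first.
    `js_click` is a hypothetical helper name, not a Selenium API.
    """
    # execute_script exposes its extra arguments to the script as arguments[0], ...
    driver.execute_script("arguments[0].click();", element)
```

With the names from the demo, a call might look like js_click(driver, waiter.until(EC.presence_of_element_located(selector1))). Note that a JavaScript click skips the visibility and interactability checks that a normal click performs, so it can trigger elements a real user could not reach.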

But I still wonder why the until calls above fail. What is the problem?

  • You can also scroll to a fixed offset instead of scrolling to the element, which sometimes doesn't work: driver.execute_script("window.scrollTo(0, 1000);"). Change 1000 to 500 or 1500 depending on how far down you want to scroll. Commented Jun 20 at 11:11
  • @ManrajSinghDhillon Thanks. It works. Do you know why the until calls above fail? Commented Jun 20 at 11:14
  • @ManrajSinghDhillon But it sometimes fails. I don't know how selenium-webdriver handles that script. It uses W3C_EXECUTE_SCRIPT to call w3c.github.io/webdriver to run it. My Firefox runs that JavaScript fine with the default Gecko/20100101 firefox-source-docs.mozilla.org/testing/geckodriver/…. Also, I use execute_script, which is synchronous, but that probably only guarantees the action has been issued; we still need until to check whether it has finished. Commented Jun 20 at 11:53

1 Answer

Some feedback...

  1. Maximize the browser so you get consistent results.
  2. You don't need to scroll; the page isn't that big.
  3. You don't need to concatenate strings to make the locator.
  4. You don't need to wait for one element to be clickable and then another if you aren't going to click the first.

Refactoring using the feedback above, we get...

from selenium import webdriver

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = 'https://www.anytimemailbox.com/s/new-york-42-broadway'
driver = webdriver.Firefox()
driver.maximize_window()
driver.get(url)

wait = WebDriverWait(driver, 10)

details = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.t-disc > a")))
for detail in details:
    wait.until(EC.element_to_be_clickable(detail)).click()
    prices = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.t-detailed td.t-w50")))
    print([price.text.split("\n")[0] for price in prices])
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "i.fa-times"))).click()

which prints...

['US$ 19.99 / month', 'US$ 200.99 / year']
['US$ 29.99 / month', 'US$ 300.99 / year']
['US$ 44.99 / month', 'US$ 529.99 / year']

This is just an example of printing some text from the opened page. This should be enough for you to get started on whatever your task is.
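
The list comprehension in the loop keeps only the first line of each cell's text. A minimal sketch of that step in isolation follows; first_line is a hypothetical helper name, and the second line of each sample string is an assumption for illustration, not text captured from the page.

```python
def first_line(text):
    """Return everything before the first newline in `text`."""
    return text.split("\n")[0]

# Hypothetical cell texts: the first lines match the printed output above,
# the second lines are made up for illustration.
cells = ["US$ 19.99 / month\nbilled monthly", "US$ 200.99 / year\nbilled annually"]
print([first_line(cell) for cell in cells])
```

If a cell contains no newline, split returns a one-item list, so the whole text is kept unchanged.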

5 Comments

Thanks. 0. For me, driver.maximize_window() alone is enough to make the code in the question always work. 1. Does point 4 refer to EC.visibility_of_all_elements_located(selector1)? Fine. My original purpose was to make sure the page had loaded. It is also reasonable to wait only for the item that will be clicked later to become visible. 2. I still want to know why waiter.until(EC.visibility_of_element_located((By.CSS_SELECTOR,".policy-wrapper"))) doesn't work. Do you know why? 3. Later I found that selenium is much slower than splash, so I switched to the latter. (cont.)
I also planned to just manually send one request to simulate the JavaScript behaviour as stackoverflow.com/a/8594831/21294350 shows. That is probably faster than both of the former ones.
@An5Drama wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR,".policy-wrapper"))) works fine for me. In my code, I added it right after wait is defined and it worked.
If this or any other answer was useful please upvote it. Once you find the answer to your question, please mark it as accepted so the question isn't left unanswered.
0. I upvoted your answer; the 2nd upvote is mine. 1. You use "div.t-disc > a", which works when the window is maximized (at least with my default settings, the selected elements are shown). In my code, I scrolled to the bottom and waited for the bottom element with class "policy-wrapper" to show. That is a bit different.
