
I'm trying to scrape this website for my project to build a list of the available insurance products.

However, the website has an internal scroll bar that only displays the first 10 items on the page, and only brings new elements into view when you scroll that internal bar down.

How do I:

  • use Python Selenium to scroll that internal bar down? I can't seem to find much information about that.
  • use Selenium to retrieve the Company Name, Product Name, Paymode and product features (if active), and return them as a pandas DataFrame?

1 Answer


Interestingly, you don't need to scroll the container at all. All the results are already loaded; some of them are just hidden. You can simply find all li elements with the result_content class and read the data you need from each one.

Example working code extracting the "prod names":

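from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service("/usr/local/bin/chromedriver"))
driver.maximize_window()
driver.get("http://comparefirst.sg/wap/productsListEvent.action?prodGroup=whole&pageAction=prodlisting")

# Wait until the results container has been rendered
wait = WebDriverWait(driver, 10)
wait.until(EC.visibility_of_element_located((By.ID, "result_container")))

# Every result is already in the DOM, including the ones not scrolled into view
results = driver.find_elements(By.CSS_SELECTOR, "li.result_content")

for result in results:
    # innerText also returns the text of hidden elements (see the note below)
    prod_name = result.find_element(By.ID, "sProdName").get_attribute("innerText")
    print(prod_name)

driver.quit()  # quit() ends the whole session; close() only closes the current window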

Prints:

AIA Gen3 (II)
AIA Guaranteed Protect Plus
AIA Guaranteed Protect Plus
...
DIRECT- TM Basic Whole Life
DIRECT- TM Basic Whole Life (+ Critical Illness)
TM Legacy
TM Legacy (+ Critical Illness)
TM Legacy LifeFlex
TM Legacy LifeFlex (+ Critical Illness)
TM Retirement GIO
TM Retirement PaycheckLife (Single Life)

Note that we have to use .get_attribute("innerText") instead of .text, because .text returns only the visible text and most of these elements are hidden.
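This particular page does not need scrolling, but to address the first part of the question in general: if a list only adds items when its inner container is scrolled, a common approach is to scroll that container element (not the window) with JavaScript, setting its scrollTop to its scrollHeight, and to repeat until no new items appear. A minimal sketch, assuming container is the WebElement that owns the inner scroll bar and item_css matches the list items; the helper name scroll_until_stable and the pause and max_rounds values are illustrative choices, and "css selector" is the string value of Selenium's By.CSS_SELECTOR.

```python
import time


def scroll_until_stable(driver, container, item_css, pause=0.5, max_rounds=50):
    """Scroll an inner scrollable element to its bottom repeatedly until no new
    items matching item_css appear inside it. Returns the final item count."""
    # "css selector" is the locator string behind Selenium's By.CSS_SELECTOR
    count = len(container.find_elements("css selector", item_css))
    for _ in range(max_rounds):
        # Scroll the container itself (not the page) to its current bottom
        driver.execute_script(
            "arguments[0].scrollTop = arguments[0].scrollHeight;", container
        )
        time.sleep(pause)  # give the page's lazy-loading script time to append items
        new_count = len(container.find_elements("css selector", item_css))
        if new_count == count:
            break  # nothing new was loaded, so we have reached the end
        count = new_count
    return count
```

For example, scroll_until_stable(driver, driver.find_element(By.ID, "result_container"), "li.result_content"), assuming #result_container is the element that actually scrolls. A fixed sleep keeps the sketch simple; an explicit WebDriverWait on the item count would be more robust on slow pages.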


3 Comments

Thanks for the quick response! This seems to work wonderfully. However, the company name appears to be inside <h3>COMPANY NAME</h3> tags. Any idea how I can retrieve it as well? Also, if the product feature pictures are active, how do I pick out that information?
@jakewong you should be able to locate other fields inside every result by calling result.find_element(...) on that result, so the search is scoped to it. E.g. to get the h3 element: result.find_element(By.TAG_NAME, "h3").get_attribute("innerText").
Oh, I didn't know you can do that. Thanks. I'll check it out and play around with it. :)
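Combining that suggestion with the second part of the question, a minimal sketch that collects one row per result into a pandas DataFrame might look like the following. Only the h3 (company name) and #sProdName (product name) locators come from this thread; the .paymode and .feature.active selectors, reading each active feature icon's title attribute, and the function names are hypothetical and need to be checked against the page's actual HTML. "css selector" is the string value of By.CSS_SELECTOR.

```python
import pandas as pd

# Only "h3" (company) and "#sProdName" (product) come from this thread; the
# Paymode and feature selectors are HYPOTHETICAL placeholders to adapt after
# inspecting the page's HTML.
FIELD_SELECTORS = {
    "Company Name": "h3",
    "Product Name": "#sProdName",
    "Paymode": ".paymode",  # hypothetical
}
ACTIVE_FEATURE_CSS = ".feature.active"  # hypothetical: feature icons marked active
CSS = "css selector"  # string value of Selenium's By.CSS_SELECTOR


def inner_text(parent, css):
    """innerText of the first element matching css inside parent, or None."""
    matches = parent.find_elements(CSS, css)
    if not matches:
        return None
    # innerText (unlike .text) also works for hidden elements
    return matches[0].get_attribute("innerText").strip()


def results_to_dataframe(results):
    """Build one DataFrame row per li.result_content element."""
    rows = []
    for result in results:
        row = {col: inner_text(result, css) for col, css in FIELD_SELECTORS.items()}
        # Hypothetical: record each active feature icon by its title attribute
        row["Features"] = [
            icon.get_attribute("title")
            for icon in result.find_elements(CSS, ACTIVE_FEATURE_CSS)
        ]
        rows.append(row)
    return pd.DataFrame(rows, columns=[*FIELD_SELECTORS, "Features"])


def scrape_products(url):
    """Open url in Chrome and return the product listing as a DataFrame."""
    # Imported here so results_to_dataframe can be exercised without a browser
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # Selenium >= 4.6 can locate chromedriver itself
    try:
        driver.get(url)
        WebDriverWait(driver, 10).until(
            EC.visibility_of_element_located((By.ID, "result_container"))
        )
        # The DataFrame is built before finally runs, so quitting afterwards is safe
        return results_to_dataframe(
            driver.find_elements(By.CSS_SELECTOR, "li.result_content")
        )
    finally:
        driver.quit()
```

With Selenium and Chrome installed, df = scrape_products("http://comparefirst.sg/wap/productsListEvent.action?prodGroup=whole&pageAction=prodlisting") would return the table. Keeping the DOM-reading logic in results_to_dataframe means it can be checked with stand-in objects instead of a live browser.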
