
I'm trying to scrape some data from bvc.com.co (the Colombian Stock Exchange website). Every time, when the third stock loads, the screen goes blank and the expected_condition for the target cannot be met (probably because the page never renders). Here is my code:

stocks = ['https://www.bvc.com.co/renta-variable-mercado-local/cibest?tab=operaciones',
           'https://www.bvc.com.co/renta-variable-mercado-local/pfcibest?tab=operaciones',
           'https://www.bvc.com.co/renta-variable-mercado-local/bogota?tab=operaciones',
           'https://www.bvc.com.co/renta-variable-mercado-local/bhi?tab=operaciones',
           'https://www.bvc.com.co/renta-variable-mercado-local/celsia?tab=operaciones']

import selenium, time
import selenium.webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = selenium.webdriver.Chrome()

for i in stocks:
    print(i)
    #driver = selenium.webdriver.Chrome()
    driver.get(i)
    time.sleep(1)

    target = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//*[@id="__next"]/div/div[3]/div[3]/div/div[1]/ul/li[3]')))
    driver.execute_script("arguments[0].scrollIntoView()", target)
    time.sleep(1)

One workaround (though not a good one) is to initialize the driver inside the loop. However, that makes the Chrome window open and close for every stock, so the script takes much longer to finish.

P.S.: adding options.add_argument('--disable-blink-features=AutomationControlled') does not fix the problem.
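
For reference, this is roughly what that workaround looks like (a sketch; it reuses the imports above, and the wait/scroll logic is the same as in my code):

from selenium.webdriver.chrome.options import Options

for i in stocks:
    # workaround: a fresh Chrome instance per stock, which is slow because the
    # browser opens and closes on every iteration
    options = Options()
    options.add_argument('--disable-blink-features=AutomationControlled')  # this flag alone does not help
    driver = selenium.webdriver.Chrome(options=options)
    driver.get(i)

    target = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//*[@id="__next"]/div/div[3]/div[3]/div/div[1]/ul/li[3]')))
    driver.execute_script("arguments[0].scrollIntoView()", target)
    time.sleep(1)
    driver.quit()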

  • You don't show the code where you actually do something with the stock pages that come back blank. Please update your question with a minimal reproducible example that reproduces the issue. Commented Jun 5 at 20:36
  • Hi @JeffC. The problem is that any WebDriverWait(driver, wait).until(EC.presence_of_element_located((foo, bar))) will fail because of the blank page, so it's impossible to scrape anything until the page displays normally again. What I want is to extract the data visible in the table (target). Commented Jun 5 at 20:52

1 Answer


The main issue is that the site appears to have some sort of Selenium/bot protection: after one or two page loads it serves a blank page that can't be refreshed. If that is the case, I would respect their wishes and not scrape this site. If it isn't, I've updated your code with the feedback below.

If you are just looking for a site to practice automation on, I would do some googling; there are plenty of dedicated practice sites out there, and I've come across several over the years.


Some feedback:

  1. The item you are closing is not a frame, it's a popup. I updated the variable names.

  2. I changed useless_frame to dismiss_popup and reversed the values because I think it makes the intent clearer.

  3. Rather than reinstantiating a new WebDriverWait each time you use it, it's better to assign it to a variable and then reuse the variable, e.g.

    WebDriverWait(driver, 10).until(...)
    WebDriverWait(driver, 10).until(...)
    

    becomes

    wait = WebDriverWait(driver, 10)
    wait.until(...)
    wait.until(...)
    
  4. In Selenium terms, presence means that the element is in the DOM, not that it's necessarily ready to be interacted with. If you are going to click an element, you should use EC.element_to_be_clickable(). If you are going to get text, values, etc., the element must be visible to avoid errors, so you should use EC.visibility_of_element_located() (see the short sketch after this list).

  5. There's no point checking if bvc_frame because if that element doesn't exist, your wait will throw a TimeoutException first, so that check can be removed.

  6. You don't need to click on the "Operaciones" tab because your URL contains ?tab=operaciones which already navigates to the page with that tab selected so that code can be removed.

  7. time.sleep() should be avoided. It's considered a "dumb" sleep. It always waits X seconds even if the element is available sooner. The best practice is to use WebDriverWait, which you're already using elsewhere.

  8. Instead of listing the entire URL, you can just provide the stock name and insert it into the URL because the rest is the same, e.g.

    stocks = ['https://www.bvc.com.co/renta-variable-mercado-local/cibest?tab=operaciones', '...']
    

    becomes

    stocks = ['cibest', 'pfcibest', '...']
    ...
    for stock in stocks:
        driver.get(f'https://www.bvc.com.co/renta-variable-mercado-local/{stock}?tab=operaciones')
    
  9. If you are going to be scraping more than just a few of these, it would be better/much faster to run these in parallel.
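
To make point 4 concrete, here is a minimal sketch of the three conditions (the locators are placeholders, not selectors from this site):

wait = WebDriverWait(driver, 10)
# presence: the element is in the DOM, but may still be hidden or not rendered
element = wait.until(EC.presence_of_element_located((By.ID, "example-id")))
# clickable: the element is visible and enabled, so it's safe to .click()
wait.until(EC.element_to_be_clickable((By.ID, "example-button"))).click()
# visible: the element is displayed, so reading .text or attributes won't race the render
label = wait.until(EC.visibility_of_element_located((By.ID, "example-id"))).text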


Updating your code with the feedback above,

import selenium
import selenium.webdriver

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = selenium.webdriver.Chrome()
stocks = ['cibest',
          'pfcibest',
          'bogota',
          'bhi',
          'celsia']

wait = WebDriverWait(driver, 20) # the page takes longer than 10s to load for me
dismiss_popup = True

for stock in stocks:
    print(stock)
    driver.get(f'https://www.bvc.com.co/renta-variable-mercado-local/{stock}?tab=operaciones')

    if dismiss_popup: # closing popup
        wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '.sc-843139d2-14.iwukQD'))).click()
        dismiss_popup = False

    contado_table = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#accordion__panel-Contado table")))
    rows = contado_table.find_elements(By.CSS_SELECTOR, "tr")
    for row in rows:
        # do something with each table row
        print(row.text)
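
If you need the individual cell values rather than the raw row text, one option (a sketch; the th/td split and what each column means are assumptions you should verify against the page) is:

for row in rows:
    # split each row into its header/data cells; skip spacer rows with no cells
    cells = [cell.text for cell in row.find_elements(By.CSS_SELECTOR, "th, td")]
    if cells:
        print(cells)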

If you want to go really fast, you can give each stock its own run with its own browser and run them in parallel. You can convert this script into a data-driven test and feed the stocks into the test, e.g.

import pytest

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

@pytest.mark.parametrize("stock",
        ['cibest',
        'pfcibest',
        'bogota',
        'bhi',
        'celsia'])

def test(stock):
    driver = webdriver.Chrome()
    driver.maximize_window()
    wait = WebDriverWait(driver, 20) # the page takes longer than 10s to load for me
    
    print(stock)
    driver.get(f'https://www.bvc.com.co/renta-variable-mercado-local/{stock}?tab=operaciones')

    # closing popup
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '.sc-843139d2-14.iwukQD'))).click()

    contado_table = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#accordion__panel-Contado table")))
    rows = contado_table.find_elements(By.CSS_SELECTOR, "tr")
    for row in rows:
        # do something with each table row
        print(row.text)

    # close this worker's browser once the stock is done
    driver.quit()

You'll need to look up a tutorial on how to configure/install pytest and run tests in parallel, etc. but that will be your fastest option.
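
As a rough starting point (assuming the script above is saved as test_bvc.py, which is just an illustrative name), pytest plus the pytest-xdist plugin will run the parametrized cases in parallel:

# install pytest and the xdist plugin for parallel workers
pip install pytest pytest-xdist
# -n 5 starts five workers (one per stock); -s shows the print() output
pytest -n 5 -s test_bvc.py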
