
Below is the code I use for scraping the BSE website. Everything works fine except for one glitch: the inner (second) for loop doesn't iterate and execution just ends. Any help would be appreciated.

from selenium import webdriver

browser = webdriver.Chrome()
browser.get('http://www.bseindia.com/markets/keystatics/Keystat_index.aspx')
f = open('output.txt', 'w')  # output file for the scraped rows

for i in range(1, 48):
    # pick the i-th index from the dropdown and submit the form
    browser.find_element_by_xpath("//*[@id='ctl00_ContentPlaceHolder1_ddltype']/option[" + str(i) + "]").click()
    browser.find_element_by_xpath('//*[@id="ctl00_ContentPlaceHolder1_btnSubmit"]').click()
    data = []
    for j in range(2, 21):
        # click the j-th year link in the report table
        browser.find_element_by_xpath("//*[@id='ctl00_ContentPlaceHolder1_gvReport_ctl" + str(j).zfill(2) + "_Linkbtn']").click()
        for tr in browser.find_elements_by_xpath('//*[@id="ctl00_ContentPlaceHolder1_gvYearwise"]'):
            ths = tr.find_elements_by_tag_name('th')
            tds = tr.find_elements_by_tag_name('td')
            if ths:
                data.append([th.text for th in ths])
            if tds:
                data.append([td.text for td in tds])
            f.write(str(data) + "\n")
    Maybe browser.find_elements_by_xpath('//*[@id="ctl00_ContentPlaceHolder1_gvYearwise"]') returns an empty list? Commented Dec 15, 2017 at 10:39
  • @suit no, I'm getting the first iterated list as the result. Commented Dec 15, 2017 at 10:41
  • is ctl00_ContentPlaceHolder1_gvReport the table that you wanna scrape? Commented Dec 15, 2017 at 10:47
  • @skrubber: yes, that is the table Commented Dec 15, 2017 at 10:49
  • Then maybe the first list is the last? Commented Dec 15, 2017 at 10:50

2 Answers


Many times the click leads to an HTTP 500 error, so I wrapped it in a recursive try/except block.

Here is the whole code:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
import time

base_url = "http://www.bseindia.com/markets/keystatics/Keystat_index.aspx"
# browser = webdriver.Chrome('/Users/qriyoinfolabs/ahlat/chromedriver')
browser = webdriver.Chrome()
browser.get(base_url)
data = []


def fetch_this_erroful_page_for_me(id):
    # select the id-th dropdown option and submit; on any error
    # (typically an HTTP 500) reload the page and try again
    try:
        print("Trying " + str(id) + "...")
        browser.find_element_by_xpath("//*[@id='ctl00_ContentPlaceHolder1_ddltype']/option[" + str(id) + "]").click()
        browser.find_element_by_xpath('//*[@id="ctl00_ContentPlaceHolder1_btnSubmit"]').click()
    except:
        print("Retrying " + str(id) + "...")
        time.sleep(2)
        browser.get(base_url)
        fetch_this_erroful_page_for_me(id)


def click_on_this_link_for_me(year_id, option_id):
    # click the year link; return 0 when the link does not exist,
    # otherwise recover from errors (e.g. HTTP 500) by reloading and retrying
    try:
        print("Trying year " + str(year_id) + "...")
        zfilled_id = str(year_id).zfill(2)
        browser.find_element_by_xpath("//*[@id='ctl00_ContentPlaceHolder1_gvReport_ctl" + zfilled_id + "_Linkbtn']").click()
        return 1
    except NoSuchElementException:
        return 0
    except Exception:
        time.sleep(2)
        fetch_this_erroful_page_for_me(option_id)
        return click_on_this_link_for_me(year_id, option_id)


for i in range(1, 48):
    fetch_this_erroful_page_for_me(i)

    for j in range(2, 21):
        valid = click_on_this_link_for_me(j, i)
        if valid == 0:
            print("valid0")
            break
        for tr in browser.find_elements_by_xpath('//*[@id="ctl00_ContentPlaceHolder1_gvYearwise"]'):
            ths = tr.find_elements_by_tag_name('th')
            tds = tr.find_elements_by_tag_name('td')
            if ths:
                data.append([th.text for th in ths])
            if tds:
                data.append([td.text for td in tds])


with open('str.txt', 'w') as file:
    file.write(str(data))


'//*[@id="ctl00_ContentPlaceHolder1_gvYearwise"]' is not a tr element; it is the table itself, so browser.find_elements_by_xpath(..) returns just one element. Try '//*[@id="ctl00_ContentPlaceHolder1_gvYearwise"]//tr' instead.
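
A minimal sketch of just that change, assuming browser and data are the objects from the question and the report table is already on the page:

for tr in browser.find_elements_by_xpath('//*[@id="ctl00_ContentPlaceHolder1_gvYearwise"]//tr'):
    # a row has either header cells or data cells; keep whichever is present
    cells = tr.find_elements_by_tag_name('th') or tr.find_elements_by_tag_name('td')
    if cells:
        data.append([cell.text for cell in cells])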

Btw, it is really bad practice to hard-code ranges like for i in range(1, 48). Try to build an iterable over the elements themselves, or an element generator, instead.

For example (I am not sure this works well because I did not test it properly; there is an HTTP ERROR 500 issue):

def get_next_row(driver, xpath):
    # yield matching elements one at a time, re-locating them on every
    # iteration so the list is fresh after each click/page change
    i = 0
    while True:
        try:
            yield driver.find_elements_by_xpath(xpath)[i]
        except IndexError:
            break
        i += 1

from selenium import webdriver

browser = webdriver.Chrome()
browser.implicitly_wait(0.5)
browser.get('http://www.bseindia.com/markets/keystatics/Keystat_index.aspx')
f = open('output.txt', 'w')  # output file for the scraped rows

for list_item in get_next_row(browser, "//*[@id='ctl00_ContentPlaceHolder1_ddltype']/option"):
    # select the next index and submit the form
    list_item.click()
    browser.find_element_by_xpath('//*[@id="ctl00_ContentPlaceHolder1_btnSubmit"]').click()
    data = []
    for next_button in get_next_row(browser, '//a[contains(@id, "ctl00_ContentPlaceHolder1_gvReport_ct")]'):
        # click the next year link and collect the rows of the year-wise table
        next_button.click()
        for tr in get_next_row(browser, '//*[@id="ctl00_ContentPlaceHolder1_gvYearwise"]//tr'):
            ths = tr.find_elements_by_tag_name('th')
            tds = tr.find_elements_by_tag_name('td')
            if ths:
                data.append([th.text for th in ths])
            if tds:
                data.append([td.text for td in tds])
            f.write(str(data) + "\n")
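
Note that get_next_row re-locates the matching elements on every iteration, so each generator keeps working even after a click changes the page, and it simply stops when there are no more options, links or rows, which removes the need for the hard-coded ranges.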

