
I am trying to scrape data from this dynamic JavaScript website. Since the page is dynamic, I am using Selenium to extract the data from the table. Please suggest how I can scrape the data from the dynamic table. Here is my code:

import urllib.request
from bs4 import BeautifulSoup
from selenium import webdriver
import time
import pandas as pd
import lxml.html as LH
import requests

# specify the url
urlpage = 'http://www.sotaventogalicia.com/en/real-time-data/historical'
print(urlpage)

# run the Chrome webdriver from the executable path of your choice (Firefox alternative commented out below)
driver = webdriver.Chrome('C:/Users/Shresth Suman/Downloads/chromedriver_win32/chromedriver.exe')
##driver = webdriver.Firefox(executable_path = 'C:/Users/Shresth Suman/Downloads/geckodriver-v0.26.0-win64/geckodriver.exe')

# get web page
driver.get(urlpage)
# execute script to scroll down the page
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;")
# sleep for 5s
time.sleep(5)
# driver.quit()


# find elements by xpath
##results = driver.find_elements_by_xpath("//div[@id='div_taboa']//table[@id='taboa']/tbody")
##results = driver.find_elements_by_xpath("//*[@id='page-title']")
##results = driver.find_elements_by_xpath("//*[@id='div_main']/h2[1]")
results = driver.find_elements_by_xpath("//*[@id = 'frame_historicos']")
print(results)
print(len(results))


# create empty array to store data
data = []
# loop over results
for result in results:
    heading = result.text
    print(heading)
    headingfind = result.find_element_by_tag_name('h1')
    # append dict to array
    data.append({"head" : headingfind, "name" : heading})
# close driver 
driver.quit()
###################################################################



# save to pandas dataframe
df = pd.DataFrame(data)
print(df)
# write to csv
df.to_csv('testsot.csv')

I want to extract the data from 2005 till the present with Averages/Totals of 10 min, but the form only gives me data for one month at a time.

  • Please mention your expected output. Commented Nov 12, 2019 at 14:03
  • @KunduK The expected output is to store all the data from the table to a CSV file. Commented Nov 12, 2019 at 14:45
  • You need to do a few things. Create lists for start_date and end_date, since the date range only allows one month. I have posted some working code for you; however, you have to create those lists and iterate. Commented Nov 12, 2019 at 16:23

1 Answer

  1. Induce WebDriverWait and element_to_be_clickable().
  2. Install the Beautiful Soup library.
  3. Use pandas read_html().
  4. I haven't created the lists. You should create startdate and enddate lists and iterate over every month since 1/1/2005 (a rough sketch of that loop follows the code below).

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    import pandas as pd
    from bs4 import BeautifulSoup
    import time
    urlpage = 'http://www.sotaventogalicia.com/en/real-time-data/historical'
    driver = webdriver.Chrome('C:/Users/Shresth Suman/Downloads/chromedriver_win32/chromedriver.exe')
    driver.get(urlpage)
    # switch into the iframe that holds the date form and the data table
    WebDriverWait(driver,20).until(EC.frame_to_be_available_and_switch_to_it((By.ID,"frame_historicos")))
    # fill in the start and end dates (the form accepts one month at a time)
    inputstartdate=WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH,"(//input[@class='dijitReset dijitInputInner'])[1]")))
    inputstartdate.clear()
    inputstartdate.send_keys("1/1/2005")
    inputenddate=WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH,"(//input[@class='dijitReset dijitInputInner'])[last()]")))
    inputenddate.clear()
    inputenddate.send_keys("1/31/2005")
    # click REFRESH and wait for the table to render
    WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH,"//input[@class='form-submit'][@value='REFRESH']"))).click()
    WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.CSS_SELECTOR,"table#taboa")))
    time.sleep(3)
    # parse the rendered table; read_html() returns a list of DataFrames, so take the first
    soup=BeautifulSoup(driver.page_source,"html.parser")
    table=soup.find("table", id="taboa")
    df=pd.read_html(str(table))[0]
    df.to_csv('testsot.csv')
    print(df)
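
The last point above is left to the reader, so here is a rough, untested sketch of one way to loop month by month from 1/1/2005. It reuses the locators from the code above and assumes the date inputs accept M/D/YYYY strings and that REFRESH re-renders the table inside the same iframe; adjust the end year and the driver path for your setup.

    import calendar
    import time

    import pandas as pd
    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    urlpage = 'http://www.sotaventogalicia.com/en/real-time-data/historical'
    driver = webdriver.Chrome('C:/Users/Shresth Suman/Downloads/chromedriver_win32/chromedriver.exe')
    driver.get(urlpage)
    # the date form and the table live inside the frame_historicos iframe
    WebDriverWait(driver, 20).until(
        EC.frame_to_be_available_and_switch_to_it((By.ID, "frame_historicos")))

    frames = []
    for year in range(2005, 2020):          # extend the upper bound to the present
        for month in range(1, 13):
            last_day = calendar.monthrange(year, month)[1]
            # fill in the start and end dates for this month (M/D/YYYY assumed)
            inputstartdate = WebDriverWait(driver, 10).until(EC.element_to_be_clickable(
                (By.XPATH, "(//input[@class='dijitReset dijitInputInner'])[1]")))
            inputstartdate.clear()
            inputstartdate.send_keys("%d/1/%d" % (month, year))
            inputenddate = WebDriverWait(driver, 10).until(EC.element_to_be_clickable(
                (By.XPATH, "(//input[@class='dijitReset dijitInputInner'])[last()]")))
            inputenddate.clear()
            inputenddate.send_keys("%d/%d/%d" % (month, last_day, year))
            # click REFRESH and wait for the table to render
            WebDriverWait(driver, 10).until(EC.element_to_be_clickable(
                (By.XPATH, "//input[@class='form-submit'][@value='REFRESH']"))).click()
            WebDriverWait(driver, 20).until(
                EC.visibility_of_element_located((By.CSS_SELECTOR, "table#taboa")))
            time.sleep(3)
            # parse the rendered table and keep the resulting DataFrame
            soup = BeautifulSoup(driver.page_source, "html.parser")
            table = soup.find("table", id="taboa")
            frames.append(pd.read_html(str(table))[0])

    driver.quit()
    # one CSV with every month appended
    pd.concat(frames, ignore_index=True).to_csv('testsot.csv', index=False)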
    

1 Comment

Thanks for your reply. This works for a particular month. I will try to iterate over the whole period now.
