Python web scraping / data extraction

Question

For my master's thesis, I am exploring the possibility of extracting data from a website via web automation. The steps are as follows:

1. Sign in to the website (https://www.metal.com/Copper/201102250376)
2. Input username and password
3. Click sign-in
4. Change date to 01/01/2020
5. Scrape the generated table data and save it to a CSV file
6. Save it to a specific folder with a specific name on my PC
7. Run the same sequence to download additional historical price data for other materials in a new tab in the same browser window

I am stuck on steps 5, 6 and 7.

import time

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

DRIVER_PATH = r'C:\webdriver\chromedriver.exe'
options = webdriver.ChromeOptions()
driver = webdriver.Chrome(executable_path=DRIVER_PATH, options=options)
driver.maximize_window()
driver.get('https://www.metal.com/Copper/201102250376')

# Login steps
LoginClick1 = driver.find_element(By.CSS_SELECTOR, '#__next > div > div.smm-component-header-en > div.main > div.right > button.button.sign-in')
LoginClick1.click()
user_input = driver.find_element(By.ID, 'user_name')
user_input.send_keys('#####')
password_input = driver.find_element(By.ID, 'password')
password_input.send_keys('####')
Submit = driver.find_element(By.CSS_SELECTOR, 'body > div:nth-child(17) > div > div.ant-modal-wrap.ant-modal-centered.smm-component-sign-en > div > div.ant-modal-content > div > div > div > div.smm-component-sign-en-content > form > div:nth-child(3) > div > div > span > button')
Submit.click()
time.sleep(2)

# Scroll down to the point of interest in the page
driver.execute_script("window.scrollBy(0,1000)", "")

# Change currency
driver.find_element(By.XPATH, "//img[contains(@class,'icon___BUqam')]").click()
time.sleep(1)

# Change date from the datepicker
date_input = driver.find_element(By.XPATH, '//*[@id="__next"]/div/div[5]/div[1]/div[7]/div[1]/div[2]/div[1]/span[1]/div/i')
date_input.click()
# Clear the existing date (10 characters), type the new one, then confirm
action = ActionChains(driver)
action.move_to_element(date_input).send_keys(Keys.BACKSPACE * 10).send_keys("01/01/2020").send_keys(Keys.ENTER).perform()
time.sleep(2)

I am stuck trying to scrape the data from the generated table and save it to a CSV file using Selenium. Here is a sample row from the generated table:

**May 27, 2022** **10,758.75-10,788.43** **10,773.59** **+97.94** **USD/mt**

Any help would be massively appreciated.

Download file using button press

# Download button
driver.find_element(By.XPATH, "//img[contains(@src,'https://static.metal.com/www.metal.com/4.1.161/static/images/price/download.png')]").click()
time.sleep(1)
driver.find_element(By.XPATH, "//img[contains(@src,'https://static.metal.com/www.metal.com/4.1.161/static/images/price/download_excel.png')]").click()

To save time since I have multiple files/data to download, I am also exploring the possibility of directly saving the file via the download button provided.

The problem I encounter is that I am not able to directly specify the filename I want it to be saved as.
Upon click, the download button opens a new tab and then closes within seconds to initialize the file download.
The file is then downloaded with a materialcode-today's date file naming format.

Do you have any idea how to go about this?
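One possible way around the fixed download name, sketched here under assumptions rather than taken from the thread: record which files are in the download folder before clicking, wait for a new finished file to appear, then rename it. The helper names (`snapshot`, `wait_for_new_file`, `rename_download`) are hypothetical, and the sketch assumes the browser saves into a folder you know (in Chrome this can be set through the `download.default_directory` preference) and that partial downloads carry a `.crdownload` or `.tmp` suffix.

```python
import time
from pathlib import Path


def snapshot(folder):
    """Return the names of all entries currently in the folder."""
    return {p.name for p in Path(folder).iterdir()}


def wait_for_new_file(folder, before, timeout=30.0, poll=0.5):
    """Poll the folder until a finished file that was not in `before` appears."""
    folder = Path(folder)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new_files = [
            p for p in folder.iterdir()
            if p.is_file()
            and p.name not in before
            # Skip files the browser is still writing (assumed suffixes).
            and p.suffix not in {".crdownload", ".tmp"}
        ]
        if new_files:
            return new_files[0]
        time.sleep(poll)
    raise TimeoutError(f"no new file appeared in {folder} within {timeout} s")


def rename_download(folder, before, new_stem, timeout=30.0):
    """Rename the newly downloaded file, keeping its original extension."""
    downloaded = wait_for_new_file(folder, before, timeout)
    target = downloaded.with_name(new_stem + downloaded.suffix)
    downloaded.replace(target)
    return target
```

In use, you would call `before = snapshot(download_dir)` just before the two download clicks and `rename_download(download_dir, before, "copper_from_2020-01-01")` right after them. Keeping the original extension avoids having to guess the file type the site delivers.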

Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. — Community
– Community Bot, Commented May 27, 2022 at 1:43

Darshan Shah · Accepted Answer · 2022-05-28 11:26:28Z

1

The sign-in button is not getting clicked because the XPath //*[@id="__next"]/div/div[3]/div[2]/div[2]/button[2] is incorrect. The element with id __next is the main container div, and that XPath tries to navigate from it to the sign-in button by spelling out the rest of the HTML node structure.

Instead, you can select the sign-in button directly by its class value: //button[@class='button sign-in']

Your solution for signing in would look like this:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome(executable_path=r'C:\webdrivers\chromedriver.exe')
driver.maximize_window()
driver.get('https://www.metal.com/Nickel/201102250239')
# Click on Sign In
driver.find_element(By.XPATH, "//button[@class='button sign-in']").click()
# Enter username
driver.find_element(By.ID, "user_name").send_keys("your username")
# Enter password
driver.find_element(By.ID, "password").send_keys("your password")
# Click Sign In
driver.find_element(By.XPATH, "//button[@type='submit']").click()
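For the other materials, a rough sketch of one option: instead of opening new tabs, reuse the signed-in driver and load each material page in turn. This assumes the login persists across pages in the same browser session, which I have not verified for this site. `visit_materials` and the `handle_page` callback are hypothetical names. The two URLs are the ones that appear in this thread.

```python
# Material pages to visit; extend this mapping with further materials as needed.
MATERIALS = {
    "Copper": "https://www.metal.com/Copper/201102250376",
    "Nickel": "https://www.metal.com/Nickel/201102250239",
}


def visit_materials(driver, materials, handle_page):
    """Load each material page with the same driver and collect the results.

    `driver` is an already signed-in Selenium WebDriver (assumption), and
    `handle_page` is any function that takes the driver and returns the
    scraped data for the page that is currently open.
    """
    results = {}
    for name, url in materials.items():
        driver.get(url)  # same tab, same browser session
        results[name] = handle_page(driver)
    return results
```

You would call it as `visit_materials(driver, MATERIALS, scrape_table)`, where `scrape_table` is whatever function you write to change the date and read the rows.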

To scrape the data:

# Each table row is a div with this class; its child divs are the cells
for element in driver.find_elements(By.CLASS_NAME, "historyBodyRow___1Bk9u"):
    elements = element.find_elements(By.TAG_NAME, "div")
    print("Date=" + elements[0].text)
    print("Price Range=" + elements[1].text)
    print("Avg=" + elements[2].text)
    print("Change=" + elements[3].text)
    print("Unit=" + elements[4].text)
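If you need numbers and dates rather than raw strings, the cell texts can be converted after scraping. The following is only a sketch based on the single sample row from the question (May 27, 2022 / 10,758.75-10,788.43 / 10,773.59 / +97.94 / USD/mt). It assumes abbreviated English month names, a hyphen between the low and high price, and commas as thousands separators. `to_number` and `parse_row` are hypothetical helper names.

```python
from datetime import datetime


def to_number(text):
    """Convert a string such as '10,758.75' or '+97.94' to a float."""
    return float(text.replace(",", ""))


def parse_row(cells):
    """Turn the five cell texts of one table row into typed values."""
    date_text, price_range, avg, change, unit = cells
    # The range cell looks like 'low-high'; split on the first hyphen.
    low, _, high = price_range.partition("-")
    return {
        "date": datetime.strptime(date_text, "%b %d, %Y").date(),
        "low": to_number(low),
        "high": to_number(high),
        "avg": to_number(avg),
        "change": to_number(change),
        "unit": unit,
    }
```

Inside the loop above you could call `parse_row([e.text for e in elements[:5]])` instead of printing each cell.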

To add the rows to a CSV file:

import csv

# newline='' avoids blank lines between rows on Windows;
# the with block closes the file when the loop is done
with open('Path where you want to store the file', 'w', newline='') as f:
    writer = csv.writer(f)
    for element in driver.find_elements(By.CLASS_NAME, "historyBodyRow___1Bk9u"):
        elements = element.find_elements(By.TAG_NAME, "div")
        entry = [elements[0].text, elements[1].text, elements[2].text, elements[3].text, elements[4].text]
        writer.writerow(entry)
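To control both the folder and the file name (step 6 in the question), the path can be built explicitly before writing. This is a minimal sketch, not part of the original answer: `save_rows` and `HEADER` are hypothetical names, and the column titles are simply the labels used in the print statements above.

```python
import csv
from pathlib import Path

# Column titles taken from the labels printed in the scraping loop.
HEADER = ["Date", "Price Range", "Avg", "Change", "Unit"]


def save_rows(rows, folder, filename):
    """Write the rows to folder/filename as CSV and return the full path."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    path = folder / filename
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(rows)
    return path
```

Here `rows` would be a list of the `entry` lists collected in the loop, and a call could look like `save_rows(rows, r"C:\thesis\prices", "copper.csv")` (both the folder and the file name are placeholders).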

edited May 28, 2022 at 11:26

answered May 28, 2022 at 7:57


9 Comments

Esclass Over a year ago

Thanks, that worked. However, I encountered a new problem trying to scrape the generated table data. I do not know how to go about it. stackoverflow.com/q/72399631/14434657

Esclass Over a year ago

Here's the HTML code: <div class="historyBodyRow___1Bk9u"><div class="" style="padding-left: 0px; flex: 1 1 0%; width: auto; text-align: left;">May 27, 2022</div><div class="" style="padding-left: 6px; width: 30%; text-align: right;">10,758.75-10,788.43</div><div class="" style="padding-left: 6px; width: 20%; text-align: right;">10,773.59</div><div class="up___11LCm" style="padding-left: 6px; width: 20%; text-align: right;">+97.94</div><div class="" style="padding-left: 6px; width: 15%; text-align: right;">USD/mt</div></div>

Darshan Shah Over a year ago

Hi @Esclass, in order to scrape the data you will have to loop through all divs that have the class historyBodyRow___1Bk9u.

Darshan Shah Over a year ago

Your solution would look like stackoverflow.com/a/72413918/18132195. Refer to the "To scrape data" section.

Darshan Shah Over a year ago

To store the data in a CSV file you can use the csv library. Refer to pythontutorial.net/python-basics/python-write-csv-file
