final dataframe from web-scraping multiple pages

Question

I would like to build a pandas DataFrame containing every row that meets a condition (that part works), scraped from a website with multiple pages. However, the final DataFrame contains only the rows from the last page in the range I declared in the loop. I would be extremely grateful if someone could point out why I get only the last page instead of the results from all pages.

import requests
import pandas
from bs4 import BeautifulSoup

headers= {'User-Agent': 'Mozilla/5.0'}


for num in range (1,3):
    url =' https://biznes.interia.pl/gieldy/notowania-gpw/profil-akcji-grn,wId,7380,tab,przebieg-sesji,pack,{}'.format(num)
     

    response = requests.get(url,headers=headers)
    content = response.content
    soup = BeautifulSoup(content,"html.parser")

    notow = soup.find_all('table',class_ = 'business-table-trading-table')
    #on a given page, select only the rows containing the word "Transakcja" 
    rows = notow[0].select('tr:has(td:contains("TRANSAKCJA"))')
     
    data = []
    
    for row in rows :
        cols = row.find_all('td')
         
        cols = [ele.text.strip() for ele in cols]
         
        cols = data.append([ele for ele in cols if ele] )
        
         
 #final dataframe which should have  contained  the result from  all scraped pages        
        
df = pandas.DataFrame(data,)      
                      
print(df)

SeaBean · Accepted Answer · 2021-02-18 05:45:02Z


Move the line data = [] outside the loop, so that it runs once before for num in range(1, 3): starts.

As written, data is reset to an empty list at the start of every iteration of the outer loop. The rows collected from earlier pages are therefore discarded, and only the rows from the last page remain when the DataFrame is built.

In general, avoid initializing a variable inside a loop unless you use it only within that loop.
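A minimal sketch of that fix, kept runnable without network access. The names FAKE_PAGES and scrape_page and the sample rows are hypothetical stand-ins for the requests and BeautifulSoup step in the question. Only the position of data = [] relative to the loop is the point here.

```python
# Hypothetical stand-in for the scraping step: maps a page number to the
# rows already extracted from that page (made-up sample values).
FAKE_PAGES = {
    1: [["09:00:01", "TRANSAKCJA", "10.5"], ["09:00:07", "TRANSAKCJA", "10.6"]],
    2: [["09:01:15", "TRANSAKCJA", "10.4"]],
}


def scrape_page(num):
    """Return the matching rows for one page (stub for requests + BeautifulSoup)."""
    return FAKE_PAGES[num]


data = []  # created once, before the loop, so rows from every page survive

for num in range(1, 3):  # pages 1 and 2, as in the question
    for row in scrape_page(num):
        cols = [ele.strip() for ele in row]
        # list.append returns None, so its result is not assigned to anything
        data.append([ele for ele in cols if ele])

print(len(data))  # 3 rows: two from page 1, one from page 2
# df = pandas.DataFrame(data)  # build the DataFrame once, after the loop
```

With the real scraping code in place of the stub, pandas.DataFrame(data) is still called only once, after the outer loop has finished.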

edited Feb 18, 2021 at 5:45

answered Feb 17, 2021 at 21:16


2 Comments

Maciek Paciarski Over a year ago

Thank you for the quick hint.

SeaBean Over a year ago

You're welcome! Remember to avoid initializing variables inside a loop unless you use them only within the loop. I have edited the answer to include this general guideline.
