I would like to build a pandas DataFrame containing all rows that satisfy a condition (this part works), scraped from a multi-page website. However, the final DataFrame contains only the rows from the last page in the range I declared in the loop. I would be extremely grateful if someone could point out why I get only the last page's rows instead of the rows from all pages.

import requests
import pandas
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}

for num in range(1, 3):
    url = 'https://biznes.interia.pl/gieldy/notowania-gpw/profil-akcji-grn,wId,7380,tab,przebieg-sesji,pack,{}'.format(num)

    response = requests.get(url, headers=headers)
    content = response.content
    soup = BeautifulSoup(content, "html.parser")

    notow = soup.find_all('table', class_='business-table-trading-table')
    # on a given page, select only the rows containing the word "TRANSAKCJA"
    rows = notow[0].select('tr:has(td:contains("TRANSAKCJA"))')

    data = []

    for row in rows:
        cols = row.find_all('td')
        cols = [ele.text.strip() for ele in cols]
        data.append([ele for ele in cols if ele])

# final dataframe, which should have contained the results from all scraped pages
df = pandas.DataFrame(data)

print(df)

1 Answer

Move the line data = [] outside the loop, so that it runs once before the for num in range(1, 3) loop starts.

As written, data is reset to an empty list at the start of every iteration of the outer loop. Each new page therefore discards the rows collected from the previous pages, and only the rows from the last page are left when the DataFrame is built.

In general, avoid initializing a variable inside a loop unless it is used only within a single iteration of that loop.
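
To see the difference without touching the network, here is a minimal sketch of both versions. The names FAKE_PAGES, fetch_rows, scrape_buggy and scrape_fixed are hypothetical and are not part of the question's code; fetch_rows only stands in for the requests and BeautifulSoup step, and the row values are made up for illustration.

```python
# Hypothetical stand-in for the scraped site: page number -> rows on that page.
# The values are invented purely for illustration.
FAKE_PAGES = {
    1: [["09:00:01", "TRANSAKCJA", "10.5"], ["09:00:07", "TRANSAKCJA", "10.6"]],
    2: [["09:01:15", "TRANSAKCJA", "10.4"]],
}


def fetch_rows(num):
    """Stand-in for the requests + BeautifulSoup step in the question."""
    return FAKE_PAGES[num]


def scrape_buggy(page_numbers):
    for num in page_numbers:
        data = []  # reset for every page, so earlier pages are thrown away
        for row in fetch_rows(num):
            data.append(row)
    return data


def scrape_fixed(page_numbers):
    data = []  # created once, before the loop over pages
    for num in page_numbers:
        for row in fetch_rows(num):
            data.append(row)
    return data


buggy = scrape_buggy(range(1, 3))
fixed = scrape_fixed(range(1, 3))

print(len(buggy))  # 1 (only the rows from page 2)
print(len(fixed))  # 3 (rows from pages 1 and 2)
```

In the question's code the same change means placing data = [] above the for num in range(1, 3): line and leaving pandas.DataFrame(data) after the loop, as it already is.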

2 Comments

Thank you for the fast hint.
You're welcome! Keep in mind the general guideline: avoid initializing a variable inside a loop unless it is used only within that loop. I have edited the answer to include it.
