
I wrote Python code to scrape product data from Flipkart.
I need to load multiple pages so that I can import many products, but right now only one product page is being scraped.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup


my_url = 'https://www.xxxxxx.com/food-processors/pr?sid=j9e%2Cm38%2Crj3&page=1'

uClient2 = uReq(my_url)
page_html = uClient2.read()
uClient2.close()

page_soup = soup(page_html, "html.parser")

containers11 = page_soup.findAll("div", {"class": "_3O0U0u"})  # one container per product card

filename = "FoodProcessor.csv"
f = open(filename, "w", encoding='utf-8-sig')
headers = "Product, Price, Description \n"
f.write(headers)

for container in containers11:
    title_container = container.findAll("div", {"class": "_3wU53n"})
    product_name = title_container[0].text

    price_con = container.findAll("div", {"class": "_1vC4OE _2rQ-NK"})
    price = price_con[0].text

    description_container = container.findAll("ul", {"class": "vFw0gD"})
    product_description = description_container[0].text

    print("Product: " + product_name)
    print("Price: " + price)
    print("Description: " + product_description)
    # strip commas from the fields, otherwise they break the CSV columns
    f.write(product_name + "," + price.replace(",", "") + "," + product_description.replace(",", ";") + "\n")

f.close()

4 Answers


You have to check whether a next-page button exists. If it does, return True together with the button, go to that next page, and keep scraping; if it doesn't, return False and stop. Check the class name of that button first. (This approach assumes Selenium, i.e. an open driver.)

    # to check if a pagination ("next page") button exists on the page
    from selenium.common.exceptions import NoSuchElementException

    def go_next_page():
        try:
            button = driver.find_element_by_xpath('//a[@class="<class name>"]')
            return True, button
        except NoSuchElementException:
            return False, None
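
To apply it, one possible loop (a sketch: it assumes a Selenium driver is already open, and scrape_current_page() is a hypothetical function wrapping your existing BeautifulSoup parsing, fed with driver.page_source):

    while True:
        scrape_current_page()            # hypothetical: your existing parsing code
        exists, button = go_next_page()  # True plus the button if another page remains
        if not exists:
            break                        # no next-page button: this was the last page
        button.click()                   # move on and scrape the next page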

1 Comment

I don't know how to apply this. Can you please explain briefly, using my code?

You can first get the number of pages available, then iterate over each page and parse its data; a sketch follows the list below.

The URL changes with respect to the page number:

  • 'https://www.flipkart.com/food-processors/pr?sid=j9e%2Cm38%2Crj3&page=1' which points to page 1
  • 'https://www.flipkart.com/food-processors/pr?sid=j9e%2Cm38%2Crj3&page=2' which points to page 2
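
A minimal sketch of that outer loop, reusing the class names from the question (the last_page value of 10 is an assumption; read the real count from the site's pagination bar):

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

base_url = 'https://www.flipkart.com/food-processors/pr?sid=j9e%2Cm38%2Crj3&page={}'
last_page = 10  # assumption: check the last page number on the site

with open("FoodProcessor.csv", "w", encoding='utf-8-sig') as f:
    f.write("Product, Price, Description \n")
    for page in range(1, last_page + 1):
        uClient2 = uReq(base_url.format(page))
        page_soup = soup(uClient2.read(), "html.parser")
        uClient2.close()
        # same per-product parsing as in the question
        for container in page_soup.findAll("div", {"class": "_3O0U0u"}):
            product_name = container.findAll("div", {"class": "_3wU53n"})[0].text
            price = container.findAll("div", {"class": "_1vC4OE _2rQ-NK"})[0].text
            description = container.findAll("ul", {"class": "vFw0gD"})[0].text
            f.write(product_name + "," + price.replace(",", "") + "," + description.replace(",", ";") + "\n")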


This Selenium snippet clicks through the "Next" button, hiding the overlay that sometimes intercepts the click:

from selenium.common.exceptions import ElementClickInterceptedException, TimeoutException

# outer loop assumed: the original snippet's break statements imply one
while True:
    try:
        try:
            next_btn = driver.find_element_by_xpath("//a//span[text()='Next']")
            next_btn.click()
        except ElementClickInterceptedException:
            # an overlay is blocking the click: hide it via JavaScript, then retry
            classes = "_3ighFh"
            overlay = driver.find_element_by_xpath("(//div[@class='{}'])[last()]".format(classes))
            driver.execute_script("arguments[0].style.visibility = 'hidden'", overlay)
            next_btn = driver.find_element_by_xpath("//a//span[text()='Next']")
            next_btn.click()
        except Exception as e:
            print(str(e))
            break
    except TimeoutException:
        print("Page Timed Out")
        break

driver.quit()
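
The element with class _3ighFh is presumably the login pop-up Flipkart shows over the page; hiding it with JavaScript lets the Next button receive the click on the retry.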



For me, the easiest way is to add an extra loop with the "page" variable:

# just check the number of the last page on the website
last_page = 10
page = 1

while page <= last_page:
    print(f'Scraping page: {page}')
    my_url = f'https://www.xxxxxx.com/food-processors/pr?sid=j9e%2Cm38%2Crj3&page={page}'

    # here add the for loop you already have

    page += 1

This method should work.

