
I want to create a for loop that scrapes a URL with multiple pages. I have found examples of this, but my code requires authentication, which is why I haven't shared the actual URL. The example URL I've entered instead shows the same key identifier, "currentPage=1".

So for this example, page i would have currentPage=i, where i is 1, 2, 3, 4, and so on.

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry  # requests.packages.urllib3 is a deprecated alias


def requests_retry_session(retries=10,
                           backoff_factor=0.3,
                           status_forcelist=(500, 502, 503, 504),
                           session=None):
    """Return a Session that retries failed requests with exponential backoff."""
    session = session or requests.Session()

    # Retry connection and read failures, plus the listed 5xx status codes
    retry = Retry(total=retries,
                  read=retries,
                  connect=retries,
                  backoff_factor=backoff_factor,
                  status_forcelist=status_forcelist)

    adapter = HTTPAdapter(max_retries=retry)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


import io

import urllib3
import pandas as pd
from requests_kerberos import OPTIONAL, HTTPKerberosAuth

import web  # internal helper that fetches mwinit authentication cookies
a = web.get_mwinit_cookie()

urls = "https://example-url.com/ABCD/customer.currentPage=1&end"

def Scraper(url):
    urllib3.disable_warnings()  # silence the InsecureRequestWarning from verify=False
    with requests_retry_session() as req:
        resp = req.get(url,
                       timeout=30,
                       verify=False,
                       allow_redirects=True,
                       auth=HTTPKerberosAuth(mutual_authentication=OPTIONAL),
                       cookies=a)

    global df
    # Parse every HTML table on the page and stack them into one DataFrame
    data = pd.read_html(io.StringIO(resp.text), flavor=None, header=0, index_col=0)
    df = pd.concat(data, sort=False)
    print(df)

s = Scraper(urls)
df

1 Answer

pageCount = 4  # range(1, pageCount) below yields pages 1, 2, 3
urlsList = []
base = "https://example-url.com/ABCD/customer.currentPage={}&end"  # the curly braces are a str.format placeholder


for x in range(1, pageCount):
    urlsList.append(base.format(x))

Then you can pass the list to your function.
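For example, a minimal driver (assuming the Scraper function from the question, which prints each page's table and stores it in the global df) could be:

for url in urlsList:
    Scraper(url)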


2 Comments

So this does work in that it gives the right URLs, but I get an error saying no connection adapters were found for "[urls then listed]", which is odd, as the URLs listed can be clicked on and go to the correct links.
Using the above and then running the function one URL at a time from the list with s = [Scraper(urls) for urls in urlsList] works fine, since requests.get expects a single URL string rather than a list.
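
A sketch of that working pattern which also collects the per-page tables into one DataFrame (combining them with pd.concat is an assumption about how the pages should be merged):

frames = []
for url in urlsList:
    Scraper(url)       # stores the parsed page in the global df
    frames.append(df)  # keep each page's table
combined = pd.concat(frames, sort=False)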
