
Very new to Python and struggling with this loop. I'm trying to pull the HTML attribute data-address from a list of static pages that I already have in list format. I've managed to use BS4 to pull the data from one page, but I cannot get the loop right to iterate through my list of URLs. Right now I am receiving this error (Invalid URL '0': No schema supplied. Perhaps you meant http://0?), but I checked the URLs in single pulls and they all work. Here is my working single-pull code:

import requests
from bs4 import BeautifulSoup

result = requests.get('https://www.coingecko.com/en/coins/0xcharts')
src = result.content
soup = BeautifulSoup(src, 'lxml')

contract_address = soup.find(
    'i', attrs={'data-title': 'Click to copy'})

print(contract_address.attrs['data-address'])

This is the loop I am working on:

import requests
from bs4 import BeautifulSoup

url_list = ['https://www.coingecko.com/en/coins/2goshi','https://www.coingecko.com/en/coins/0xcharts']

for link in range(len(url_list)):
    result = requests.get(link)
    src = result.content
    soup = BeautifulSoup(src, 'lxml')

    contract_address = soup.find(
    'i', attrs={'data-title': 'Click to copy'})

    print(contract_address.attrs['data-address'])

url_list.seek(0)

2 Answers


Try this — iterate over the URLs themselves instead of over range():

import requests
from bs4 import BeautifulSoup

url_list = ['https://www.coingecko.com/en/coins/2goshi','https://www.coingecko.com/en/coins/0xcharts']

for link in url_list:
    result = requests.get(link)
    src = result.content
    soup = BeautifulSoup(src, 'lxml')

    contract_address = soup.find(
    'i', attrs={'data-title': 'Click to copy'})

    print(contract_address.attrs['data-address'])

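As a side note, a slightly more defensive sketch of the same loop (assuming the page structure from the question) checks the HTTP status and guards against find() returning None, which would otherwise raise an AttributeError on pages that lack the tag:

```python
import requests
from bs4 import BeautifulSoup

url_list = [
    'https://www.coingecko.com/en/coins/2goshi',
    'https://www.coingecko.com/en/coins/0xcharts',
]

for link in url_list:
    result = requests.get(link)
    result.raise_for_status()  # fail loudly on 4xx/5xx responses
    soup = BeautifulSoup(result.content, 'lxml')

    # find() returns None when no matching tag exists, so guard
    # before touching .attrs
    tag = soup.find('i', attrs={'data-title': 'Click to copy'})
    if tag is not None:
        print(tag.attrs['data-address'])
    else:
        print(f'no contract address found on {link}')
```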

1 Comment

Thank you! This worked, appreciate the help.

You have misunderstood the usage of range(). Please read the docs.

When you do:

result = requests.get(link)

link is an int coming from range(); see what happens when you print(link). Instead, index into the list url_list as follows:

result = requests.get(url_list[link])

Here's a full example:

import requests
from bs4 import BeautifulSoup

url_list = ['https://www.coingecko.com/en/coins/2goshi','https://www.coingecko.com/en/coins/0xcharts']

for link in range(len(url_list)):
    result = requests.get(url_list[link])
    src = result.content
    soup = BeautifulSoup(src, 'lxml')

    contract_address = soup.find(
    'i', attrs={'data-title': 'Click to copy'})

    print(contract_address.attrs['data-address'])

Output:

0x70e132641d6f1bd787b119a289fee544fbb2f316
0x86dd49963fe91f0e5bc95d171ff27ea996c0890c
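If you do need the position alongside the URL, enumerate() is the idiomatic alternative to range(len(...)) — it yields (index, item) pairs directly, so there's no manual indexing to get wrong:

```python
url_list = [
    'https://www.coingecko.com/en/coins/2goshi',
    'https://www.coingecko.com/en/coins/0xcharts',
]

# enumerate() pairs each URL with its index
for i, url in enumerate(url_list):
    print(i, url)
```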

1 Comment

Thanks for explaining that, this worked as well. Appreciate the help.
