
I am trying to scrape the URLs and sneaker titles from https://stockx.com/sneakers.

This is my code so far:

In main.py:

from bs4 import BeautifulSoup
from utils import generate_request_header
import requests

url = "https://stockx.com/sneakers"
html = requests.get(url, headers=generate_request_header()).content
soup = BeautifulSoup(html, "lxml")

print(soup)

In utils.py:

import random

def generate_request_header():
    # BASE_REQUEST_HEADER and USER_AGENT_HEADER_LIST are module-level constants
    header = dict(BASE_REQUEST_HEADER)  # copy, so the shared dict isn't mutated
    header["User-Agent"] = random.choice(USER_AGENT_HEADER_LIST)
    return header
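For completeness, here is a self-contained sketch of utils.py. The two constants are placeholder assumptions (the real values are in the pastebin linked in the comments below):

```python
import random

# Placeholder values -- illustrative assumptions, not the OP's real constants.
BASE_REQUEST_HEADER = {
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
}
USER_AGENT_HEADER_LIST = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15)",
]

def generate_request_header():
    header = dict(BASE_REQUEST_HEADER)  # copy, so repeated calls don't share state
    header["User-Agent"] = random.choice(USER_AGENT_HEADER_LIST)
    return header
```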

But whenever I print soup, I get the following output: https://pastebin.com/Ua6B6241. No HTML seems to have been extracted. How would I get it? Should I be using something like Selenium instead?

  • Where is the code for BASE_REQUEST_HEADER and USER_AGENT_HEADER_LIST? Are they inside the function's scope? Commented Apr 8, 2017 at 9:11
  • Here: pastebin.com/E19rtbZy Commented Apr 8, 2017 at 9:12

2 Answers


requests doesn't seem to be able to verify the SSL certificates. To temporarily bypass this error, you can pass verify=False, i.e.:

requests.get(url, headers=generate_request_header(), verify=False)

To fix it permanently, you may want to read:

http://docs.python-requests.org/en/master/user/advanced/#ssl-cert-verification
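Rather than leaving verify=False in place (which disables certificate checking entirely and exposes you to man-in-the-middle attacks), you can point requests at an up-to-date CA bundle, for example via the certifi package. A minimal sketch, assuming certifi is installed (it ships as a dependency of recent requests versions):

```python
import certifi
import requests

# Use certifi's curated CA bundle instead of disabling verification.
session = requests.Session()
session.verify = certifi.where()  # filesystem path to certifi's cacert.pem

# response = session.get("https://stockx.com/sneakers")  # now a verified request
```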




I'm guessing the data you're looking for is at line 126 in the pastebin. I've never tried to extract the text of a script, but I'm sure it can be done.

In lxml, something like source_code.xpath('//script[@type="text/javascript"]') (where source_code is a parsed lxml document) should return a list of all the script elements as objects.

Or to try and get straight to the "tickers":

[i for i in source_code.xpath('//script[@type="text/javascript"]') if 'tickers' in i.xpath('string()')]
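The same idea works with BeautifulSoup, which the question already uses. A self-contained sketch on a stand-in HTML snippet (the real page markup is only in the OP's pastebin, so this markup is an assumption):

```python
from bs4 import BeautifulSoup

# Stand-in for the fetched page; the real markup is in the OP's pastebin.
html = """
<html><body>
<script type="text/javascript">var tickers = ["air-jordan-1-retro"];</script>
<script type="text/javascript">var analytics = {};</script>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
scripts = soup.find_all("script", {"type": "text/javascript"})
# Keep only the scripts whose text mentions "tickers"
matches = [s.get_text() for s in scripts if "tickers" in s.get_text()]
```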

