
I am trying to scrape the URLs and sneaker titles from https://stockx.com/sneakers.

This is my code so far:

In main.py:

from bs4 import BeautifulSoup
from utils import generate_request_header
import requests

url = "https://stockx.com/sneakers"
html = requests.get(url, headers=generate_request_header()).content
soup = BeautifulSoup(html, "lxml")

print(soup)

In utils.py:

import random

def generate_request_header():
    # BASE_REQUEST_HEADER and USER_AGENT_HEADER_LIST are module-level constants
    header = dict(BASE_REQUEST_HEADER)  # copy, so the shared dict isn't mutated
    header["User-Agent"] = random.choice(USER_AGENT_HEADER_LIST)
    return header
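For completeness, here is a self-contained sketch of utils.py. The two constants are placeholder assumptions (the real values are in the pastebin linked in the comments below):

```python
import random

# Placeholder values -- illustrative assumptions, not the OP's real constants.
BASE_REQUEST_HEADER = {
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
}
USER_AGENT_HEADER_LIST = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15)",
]

def generate_request_header():
    header = dict(BASE_REQUEST_HEADER)  # copy, so repeated calls don't share state
    header["User-Agent"] = random.choice(USER_AGENT_HEADER_LIST)
    return header
```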

But whenever I print soup, I get the following output: https://pastebin.com/Ua6B6241. No HTML seems to have been extracted. How would I get it? Should I be using something like Selenium instead?

  • Where is the code for BASE_REQUEST_HEADER and USER_AGENT_HEADER_LIST? Are they inside the function's scope? Commented Apr 8, 2017 at 9:11
  • Here: pastebin.com/E19rtbZy Commented Apr 8, 2017 at 9:12

2 Answers


requests doesn't seem to be able to verify the SSL certificates. To temporarily bypass this error, you can pass verify=False, i.e.:

requests.get(url, headers=generate_request_header(), verify=False)

To fix it permanently, you may want to read:

http://docs.python-requests.org/en/master/user/advanced/#ssl-cert-verification
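Rather than leaving verify=False in place (which disables certificate checking entirely and exposes you to man-in-the-middle attacks), you can point requests at an up-to-date CA bundle, for example via the certifi package. A minimal sketch, assuming certifi is installed (it ships as a dependency of recent requests versions):

```python
import certifi
import requests

# Use certifi's curated CA bundle instead of disabling verification.
session = requests.Session()
session.verify = certifi.where()  # filesystem path to certifi's cacert.pem

# response = session.get("https://stockx.com/sneakers")  # now a verified request
```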




I'm guessing the data you're looking for is at line 126 in the pastebin. I've never tried to extract the text of a script, but I'm sure it can be done.

In lxml, something like source_code.xpath('//script[@type="text/javascript"]') (where source_code is a parsed lxml document) should return a list of all the script elements as objects.

Or to try and get straight to the "tickers":

[i for i in source_code.xpath('//script[@type="text/javascript"]') if 'tickers' in i.xpath('string()')]
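The same idea works with BeautifulSoup, which the question already uses. A self-contained sketch on a stand-in HTML snippet (the real page markup is only in the OP's pastebin, so this markup is an assumption):

```python
from bs4 import BeautifulSoup

# Stand-in for the fetched page; the real markup is in the OP's pastebin.
html = """
<html><body>
<script type="text/javascript">var tickers = ["air-jordan-1-retro"];</script>
<script type="text/javascript">var analytics = {};</script>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
scripts = soup.find_all("script", {"type": "text/javascript"})
# Keep only the scripts whose text mentions "tickers"
matches = [s.get_text() for s in scripts if "tickers" in s.get_text()]
```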

