Python Get Value of Javascript Variable

Question

I'm scraping an Instagram page (https://instagram.com/celmirashop) and getting back HTML with some JavaScript in script tags. The result looks like this:

<script>some script</script>
<script>some script</script>
<script>some script</script>
<script>window._sharedData = {"config":{"csrf_token":"sSqrj6c8tfN1HwOIlwmpqONT2bAPhtNu","viewer":null etc....</script>

I have written a script like this:

import urllib.request
import json
import re
from bs4 import BeautifulSoup

web = urllib.request.urlopen("https://instagram.com/celmirashop")
soup = BeautifulSoup(web.read(), 'lxml')
pattern = re.compile(r"window\._sharedData = .")
script = soup.find("script", string=pattern)
print(script)

It gives me the specific script tag I want, like this:

<script>window._sharedData = {"config":{"csrf_token":"sSqrj6c8tfN1HwOIlwmpqONT2bAPhtNu","viewer":null etc....</script>

How can I get the value of window._sharedData and loop over it? I want to save it in MySQL.

QHarr · Accepted Answer · 2019-10-24 02:30:11Z

2

Assuming the assignment ends with ; and occurs only once, you can use the following regex pattern on response.text:

import re

s = '''<script>window._sharedData = {"config":{"csrf_token":"sSqrj6c8tfN1HwOIlwmpqONT2bAPhtNu","viewer":null}};</script>'''
p = re.compile(r'window\._sharedData = (.*);')
print(p.findall(s)[0])
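The captured group is still a string. A minimal sketch of turning it into a Python dict and looping over it, as the question asks, might look like the following. The sample string and the `country_code` key are made up for illustration, and the pattern assumes the `;` sits directly before `</script>`.

```python
import json
import re

# Hypothetical sample; a real page would carry a much larger object.
html = ('<script>window._sharedData = {"config":{"csrf_token":"abc123",'
        '"viewer":null},"country_code":"ID"};</script>')

# Non-greedy match that stops at the "};" sitting right before </script>.
pattern = re.compile(r'window\._sharedData\s*=\s*(\{.*?\});\s*</script>', re.DOTALL)

match = pattern.search(html)
# json.loads turns the captured text into nested dicts, lists, str, None, etc.
shared_data = json.loads(match.group(1)) if match else {}

# Loop over the top-level keys; nested values are ordinary dicts and lists.
for key, value in shared_data.items():
    print(key, value)
```

If the captured text is not valid JSON (for example, a truncated response), `json.loads` raises `json.JSONDecodeError`, so it may be worth wrapping the call in a `try` block.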


jmunsch · Accepted Answer · 2019-10-24 02:55:48Z

2

Here is a way:

>>> xxx = '''
... <script>window._sharedData = {"config":{"csrf_token":"sSqrj6c8tfN1HwOIlwmpqONT2bAPhtNu","viewer":null etc....</script>
... '''

>>> xxx.split('"csrf_token":"')
['\n<script>window._sharedData = {"config":{', 'sSqrj6c8tfN1HwOIlwmpqONT2bAPhtNu","viewer":null etc....</script>\n']

>>> xxx.split('"csrf_token":"')[1].split('"')[0]
'sSqrj6c8tfN1HwOIlwmpqONT2bAPhtNu'
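Splitting on a fixed string works for a single known key. If the whole object is needed, one alternative (a sketch, not part of the original answer) is to let the standard library's `json.JSONDecoder.raw_decode` parse the object starting at its first `{`, which avoids guessing where it ends. The helper name `extract_js_object` and the sample string are hypothetical, and the approach assumes the assigned value is valid JSON.

```python
import json


def extract_js_object(html, name="window._sharedData"):
    """Return the JSON object assigned to `name` in `html`, or None if absent."""
    start = html.find(name)
    if start == -1:
        return None
    brace = html.find("{", start)
    if brace == -1:
        return None
    # raw_decode parses one JSON value starting at `brace` and
    # ignores whatever follows it (here: ";</script>").
    obj, _end = json.JSONDecoder().raw_decode(html, brace)
    return obj


# Hypothetical sample with a complete object.
sample = ('<script>window._sharedData = {"config":{"csrf_token":"abc123",'
          '"viewer":null}};</script>')
data = extract_js_object(sample)
print(data["config"]["csrf_token"])
```
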

Note that BeautifulSoup doesn't run any JavaScript, so the contents of the script tags are never executed; you only get their source text.

You'll have to use something like Selenium to do anything more with it.

If you do go with Selenium, you can do something like:

import json
import selenium.webdriver

options = selenium.webdriver.FirefoxOptions()
options.add_argument("--headless")

driver = selenium.webdriver.Firefox(options=options)

driver.get('https://instagram.com/celmirashop')

# note: this assumes there is no circular data, etc. in the object
# passed to `JSON.stringify`

# run this javascript in the firefox browser
js = "return JSON.stringify(window._sharedData)"

# load the (hopefully) stringified JSON into Python
hello = json.loads(driver.execute_script(js))

for k, v in hello.items():
    print(k, v)
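Looping over `hello.items()` only visits the top-level keys, and most of the values are themselves nested dicts or lists. A minimal sketch of walking the whole structure and producing flat (path, value) pairs is shown below. The `flatten` helper and the sample dict are hypothetical, and storing each pair as one row of a key/value table is an assumption about the MySQL side, not something the question specifies.

```python
def flatten(value, prefix=""):
    """Yield (dotted_path, leaf_value) pairs from nested dicts and lists."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{prefix}[{index}]")
    else:
        # Leaf: str, int, float, bool, or None.
        yield prefix, value


# Hypothetical stand-in for the dict returned by json.loads.
hello = {"config": {"csrf_token": "abc123", "viewer": None}, "tags": ["a", "b"]}

rows = list(flatten(hello))
for path, leaf in rows:
    print(path, leaf)
```

Empty dicts and lists produce no pairs in this sketch, so they would need separate handling if they matter for storage.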

edited Oct 24, 2019 at 2:55

answered Oct 24, 2019 at 2:41


1 Comment

QHarr Over a year ago

return JSON.stringify(window._sharedData) nice +
