1

I am trying to parse content within JavaScript. I have an idea of how to do it, but I am not entirely sure. I have read up on some examples, and I am thinking that using the re library might be the way to go.

Here is my code so far:

import requests
import json
import re
from bs4 import BeautifulSoup

url = r'https://login.live.com/login.srf?wa=wsignin1.0&rpsnv=13&rver=6.7.6643.0&wp=MBI_SSL&wreply=https:%2f%2faccount.xbox.com%2fen-us%2faccountcreation%3freturnUrl%3dhttps:%252f%252fwww.xbox.com:443%252fen-US%252f%26pcexp%3dtrue%26uictx%3dme%26rtc%3d1&lc=1033&id=292543&aadredir=1'


s = requests.Session()


soup = BeautifulSoup(s.get(url).content, 'html.parser')


print(soup.find_all("script", type="text/javascript")[5].prettify())

Here is just a snippet of the parsed content. I am trying to get access to this data, particularly "value"

<input type="hidden" name="PPFT" id="i0327" value="Dd**Lkp2L3EKDvGi3u6PEweEQUhvW*1jPrA3FgGSdeYoY8FERluiTqDef6QF3V5NkN*4yPg7vvxI3jo5oKPRelhfU3rYGFkxbxyvSBssiwFA!8LwocAbVDtrDq11Wk3F4LzRBQck3H4ca5r3Qhv8b0h4CxcEZgAnGAkcWE7fExGn1dBwGoY8sZVL2!ZBMjnJEanidLF!Yi975frkQ6Cys2oUb863xoLxdvZGuLQRxRLjjKubaCHlWQbD0b*Wzq49EA$$"/>

I appreciate all responses in advance. Thanks!

1 Answer 1

2
from bs4 import BeautifulSoup as bs
import requests
import re
url = 'https://login.live.com/login.srf?wa=wsignin1.0&rpsnv=13&rver=6.7.6643.0&wp=MBI_SSL&wreply=https:%2f%2faccount.xbox.com%2fen-us%2faccountcreation%3freturnUrl%3dhttps:%252f%252fwww.xbox.com:443%252fen-US%252f%26pcexp%3dtrue%26uictx%3dme%26rtc%3d1&lc=1033&id=292543&aadredir=1'
page = requests.get(url)
html = bs(page.text, 'lxml')
input = html.findAll('script', type="text/javascript")[5].prettify()
value = re.findall(r'value=".+"/', input)
#value = str(value).replace('value="', '').replace('"/','')
value = str(value).replace('value="', '').replace('"/','').replace("['",'').replace("']",'')
print(value)
Output:
DVSXQahhtomXS2Y4k2itS5MPP52mJgUkC7LH!W*1DmjHiWk*npajBfgXK5yp3*!bu3Wuvvs7xavleUV3nIbjLZHckj73QMe8wipwXhCqpXuUZQ2wnJvNYAVNCg9XxKPuIovp7!sLbumrufuYefyzM6UQLkMb5c7MuImDofVhLlKxpI7Pohe8sO2x8r63TtFCTDphWzqXKJE3B8DRK*AhMbFsmdP0sj2CXMZ7dyTfLJSr1zWBlaHTqJPLvhgzLSiaEg$$
Sign up to request clarification or add additional context in comments.

4 Comments

When I run this, the output is ['DVSXQahhtomXS2Y4k2itS5MPP52mJgUkC7LH!W*1DmjHiWknpajBfgXK5yp3!bu3Wuvvs7xavleUV3nIbjLZHckj73QMe8wipwXhCqpXuUZQ2wnJvNYAVNCg9XxKPuIovp7!sLbumrufuYefyzM6UQLkMb5c7MuImDofVhLlKxpI7Pohe8sO2x8r63TtFCTDphWzqXKJE3B8DRK*AhMbFsmdP0sj2CXMZ7dyTfLJSr1zWBlaHTqJPLvhgzLSiaEg$$'] How can I remove the [' '] ?
It's dynamic content so it will constantly change. Also, just replace the value with the new data I just edited my code with.
I understand that, but I just want the string of characters, with out the [' at the beginning and the '] at the end
Worked perfect:) Thanks!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.