4

I'm trying to extract the value of a JavaScript variable from HTML source code using BeautifulSoup.

For example I have:

<script>
[other code]
var my = 'hello';
var name = 'hi';
var is = 'halo';
[other code]
</script>

I want to retrieve the value of the variable "my" in Python.

How can I achieve that?

4 Answers

5

The simplest approach is to use a regular expression pattern to both locate the element via BeautifulSoup and extract the desired substring:

import re

from bs4 import BeautifulSoup

data = """
<script>
[other code]
var my = 'hello';
var name = 'hi';
var is = 'halo';
[other code]
</script>
"""

soup = BeautifulSoup(data, "html.parser")

pattern = re.compile(r"var my = '(.*?)';$", re.MULTILINE | re.DOTALL)
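# `string=` is the current name of the older `text=` argument (bs4 >= 4.4.0)
script = soup.find("script", string=pattern)

# .string returns the script body as a single string
print(pattern.search(script.string).group(1))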

Prints hello.
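If the variable name is not fixed, the same idea can be wrapped in a small helper. This is only a sketch: `get_js_string_var` is a hypothetical name, it assumes you already have the script body as a string (for example from the `script` tag found above), and it only handles simple single- or double-quoted literals without escaped quotes.

```python
import re


def get_js_string_var(script_text, name):
    """Return the string literal assigned to `var <name>`, or None if absent."""
    # re.escape guards against names containing regex metacharacters such as `$`
    pattern = re.compile(
        r"\bvar\s+" + re.escape(name) + r"\s*=\s*(['\"])(.*?)\1\s*;"
    )
    match = pattern.search(script_text)
    return match.group(2) if match else None


script_text = """
[other code]
var my = 'hello';
var name = 'hi';
var is = 'halo';
[other code]
"""

print(get_js_string_var(script_text, "my"))
print(get_js_string_var(script_text, "missing"))
```

The backreference `\1` makes the closing quote match the opening one, and returning None for a missing variable avoids the AttributeError you would get from calling `.group()` on a failed search.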


3

Another approach is to use a JavaScript parser: locate the variable declaration node, check that its identifier matches the desired name, and extract the initializer. An example using the slimit parser:

from bs4 import BeautifulSoup
from slimit import ast
from slimit.parser import Parser
from slimit.visitors import nodevisitor


data = """
<script>
var my = 'hello';
var name = 'hi';
var is = 'halo';
</script>
"""

soup = BeautifulSoup(data, "html.parser")

script = soup.find("script", text=lambda text: text and "var my" in text)

# parse js
parser = Parser()
tree = parser.parse(script.text)
for node in nodevisitor.visit(tree):
    if isinstance(node, ast.VarDecl) and node.identifier.value == 'my':
        print(node.initializer.value)

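Prints 'hello'. slimit keeps the surrounding quotes in the string node's value, so strip them if you need the bare text.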

1 Comment

This solution does not work when the variable is an array, e.g. var my = ['hello', 'world'].
0

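The pattern from the regex answer, pattern = re.compile(r"var my = '(.*?)';$", re.MULTILINE | re.DOTALL), can match the wrong text. With re.DOTALL, .*? is allowed to cross line breaks, so if anything follows the statement on the same line, the match keeps going until it reaches a later line that ends in ';. When re.MULTILINE and re.DOTALL are set together, remove the end-of-line anchor $.

Tested with Python 3.6.4.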
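A minimal sketch of the difference, using only the `re` module. The input is a hypothetical variant of the question's script in which a comment follows the statement; the names `anchored` and `unanchored` are just for illustration.

```python
import re

# Hypothetical variant of the sample: a trailing comment follows the statement
script_text = """var my = 'hello'; // greeting
var name = 'hi';
"""

# Original pattern: requires '; to sit at the end of a line
anchored = re.compile(r"var my = '(.*?)';$", re.MULTILINE | re.DOTALL)
# Same pattern without the anchor and without the flags
unanchored = re.compile(r"var my = '(.*?)';")

anchored_value = anchored.search(script_text).group(1)
unanchored_value = unanchored.search(script_text).group(1)

print(repr(anchored_value))    # spills over into the next line
print(repr(unanchored_value))  # just the literal
```

On the exact input from the question, both patterns return the same value; the anchored one only goes wrong once the statement is no longer the last thing on its line.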


0

Building on @alecxe's answer, here is a more complex case: an array of dictionaries, i.e. an array of flat JSON objects:

from bs4 import BeautifulSoup
from slimit import ast
from slimit.parser import Parser
from slimit.visitors import nodevisitor


data = """
<script>
var my = [{'dic1key1':1}, {'dic2key1':1}];
var name = 'hi';
var is = 'halo';
</script>
"""

soup = BeautifulSoup(data, "html.parser")

script = soup.find("script", text=lambda text: text and "var my" in text)

# parse js
parser = Parser()
tree = parser.parse(script.text)
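array_items = []
for node in nodevisitor.visit(tree):
    if isinstance(node, ast.VarDecl) and node.identifier.value == 'my':
        for item in node.initializer.items:
            # each "key: value" pair of an object literal is an ast.Assign node;
            # strip only the quote characters so unquoted values such as 1 are not truncated
            parsed_dict = {
                getattr(n.left, 'value', '').strip("'\""): getattr(n.right, 'value', '').strip("'\"")
                for n in nodevisitor.visit(item)
                if isinstance(n, ast.Assign)
            }
            # append inside the loop so every array item is collected
            array_items.append(parsed_dict)
print(array_items)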
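For simple cases like this one, a lighter alternative (not part of the slimit approach above) is to grab the array literal with a regex and hand it to `ast.literal_eval` from Python's standard library. Note that this is Python's own `ast` module, not `slimit.ast`. The sketch assumes the JavaScript literal also happens to be a valid Python literal, so it will fail on unquoted keys or on `true`, `false` and `null`.

```python
import ast
import re

script_text = """
var my = [{'dic1key1':1}, {'dic2key1':1}];
var name = 'hi';
var is = 'halo';
"""

# Capture everything from the opening [ to the ] that precedes the semicolon
match = re.search(r"\bvar\s+my\s*=\s*(\[.*?\])\s*;", script_text, re.DOTALL)

# literal_eval only evaluates literals, so it does not execute arbitrary code
array_items = ast.literal_eval(match.group(1))
print(array_items)
```

Unlike the parser-based loop, this keeps the numeric values as Python integers, and the same pattern also covers a flat array such as `['hello', 'world']`.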
