
I am a new coder trying to extract the following data from a script tag in HTML using BS4.

<script>
document.obj_data = {
    "earnings_announcements_earnings_table"   : 
             [  [ "11/22/22", "9/2022", "-$0.02", "-$0.06", 
"<div class=\"right neg negative neg_icon showinline down\">-0.04</div>", 
"<div class=\"right neg negative neg_icon showinline down\">-200.00%</div>", 
"--" ] ,  [ "8/30/22", "6/2022", "-$0.05", "-$0.04", 
"<div class=\"right pos positive pos_icon showinline up\">+0.01</div>", 
"<div class=\"right pos positive pos_icon showinline up\">+20.00%</div>", "Before Open" ]  ]
                
                ,
    "earnings_announcements_sales_table"      : 
             [  [ "11/22/22", "9/2022", "$1,096.70", "$1,091.78", 
"<div class=\"right neg negative neg_icon showinline down\">-4.92</div>",

... 

So far I've used the following code to get this specific script:

import requests
from bs4 import BeautifulSoup

x = requests.get(base_url, headers=params).text
soup = BeautifulSoup(x, 'html.parser')
data = soup.find_all('script')
txt = data[25]  # the script tag that contains document.obj_data

However, I can't figure out or find a solution that outputs the data in a clean format such as JSON. I can get this information using Selenium, but I would like to avoid it because it is heavy and slow. Please help, thank you!

EDIT: Others have suggested a solution that uses regex, and I've tried to adjust the code to fit my problem, but the output is an empty list: []

output = [json.loads(m.group(1)) for m in re.finditer(r'document.obj_data.+ = ({.*})', x.text)]
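
One likely reason the pattern above finds nothing: `.+ = ` needs at least one extra character between `obj_data` and ` = `, and without `re.DOTALL` the `.` never crosses a line break, so `{.*}` cannot span a multi-line object. A minimal sketch of an alternative, assuming the object literal is valid JSON (double-quoted keys, no trailing commas), is to use a regex only to locate the assignment and let `json.JSONDecoder.raw_decode` read the object from that position. The `html` string is a shortened stand-in for the page source, and `extract_obj_data` is a hypothetical helper name.

```python
import json
import re

# Shortened stand-in for the page source (normally requests.get(...).text).
html = r'''<script>
document.obj_data = {
    "earnings_announcements_earnings_table": [
        ["11/22/22", "9/2022", "-$0.02", "-$0.06",
         "<div class=\"right neg\">-0.04</div>", "--"]
    ]
};
</script>'''


def extract_obj_data(text):
    """Return the object assigned to document.obj_data (hypothetical helper)."""
    # Find the assignment; \s* tolerates any spacing or line breaks around "=".
    match = re.search(r"document\.obj_data\s*=\s*", text)
    if match is None:
        raise ValueError("document.obj_data assignment not found")
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (the trailing ";" and "</script>").
    obj, _end = json.JSONDecoder().raw_decode(text, match.end())
    return obj


data = extract_obj_data(html)
print(json.dumps(data, indent=4))
```

Because `raw_decode` stops at the end of the first complete value, there is no need to guess where the closing brace is. If the real object is not strict JSON, this raises `json.JSONDecodeError` and a JavaScript-aware parser would be needed instead.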
  • Well, you need a JavaScript parser for that, don't you? Commented Jan 22, 2023 at 16:15
  • If you fetch the page using some method that understands JavaScript, then that table should appear normally. Don't use requests for pages that rely on JavaScript. Commented Jan 22, 2023 at 16:16
  • Does this answer your question? Scraping JavaScript var from website using Beautiful Soup in Python Commented Jan 22, 2023 at 16:19
  • @HedgeHog I've tried that solution to no avail. Commented Jan 22, 2023 at 16:31

1 Answer

This is how I usually do it:

import json
...

txt_json= json.loads(txt.string)
print(json.dumps(txt_json, indent=4))
...

7 Comments

I get the following JSONDecodeError: Expecting value: line 2 column 1 (char 1)
Did you replace soup.find_all with soup.find? With find_all you would have to iterate over your results.
There are multiple scripts in the soup, so I picked out the specific script I want to extract and stored it in txt.
I see. I've edited my code. You can add this beneath your txt = data[25].
I'm still receiving the same error.
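
The JSONDecodeError is probably expected here: `txt.string` holds a JavaScript statement (`document.obj_data = {...}`), not a bare JSON document, so `json.loads` fails on the first non-whitespace character. A rough sketch of one workaround, assuming the right-hand side of the assignment is valid JSON, is to split off everything up to the first `=` and drop a trailing `;`, if present, before parsing. Here `script_text` is a shortened stand-in for `txt.string`, and `parse_obj_data` and `strip_tags` are hypothetical helper names. The tag stripping is a simple regex that only suits small fragments like the `<div>` cells shown in the question.

```python
import json
import re

# Shortened stand-in for txt.string (the text inside the <script> tag).
script_text = r'''
document.obj_data = {
    "earnings_announcements_earnings_table": [
        ["11/22/22", "9/2022", "-$0.02", "-$0.06",
         "<div class=\"right neg negative neg_icon showinline down\">-0.04</div>",
         "<div class=\"right neg negative neg_icon showinline down\">-200.00%</div>",
         "--"]
    ]
};
'''


def parse_obj_data(text):
    """Parse the right-hand side of `document.obj_data = ...` as JSON."""
    # partition splits on the FIRST "=", which is the assignment itself;
    # later "=" characters (e.g. in class=\"...\") stay in the payload.
    _lhs, _eq, rhs = text.partition("=")
    payload = rhs.strip().rstrip(";").strip()
    return json.loads(payload)


def strip_tags(cell):
    """Remove simple HTML tags from a cell, leaving only its text."""
    return re.sub(r"<[^>]+>", "", cell)


tables = parse_obj_data(script_text)
cleaned = {
    name: [[strip_tags(cell) for cell in row] for row in rows]
    for name, rows in tables.items()
}
print(json.dumps(cleaned, indent=4))
```

With the real page, `parse_obj_data(txt.string)` would take the place of `json.loads(txt.string)` in the answer's snippet, provided the script contains only that one assignment.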