How to extract data from <script> using Beautiful Soup / not using Selenium?

Question

I am a new coder trying to extract the following data from a script tag in HTML using BS4.

<script>
document.obj_data = {
    "earnings_announcements_earnings_table"   : 
             [  [ "11/22/22", "9/2022", "-$0.02", "-$0.06", 
"<div class=\"right neg negative neg_icon showinline down\">-0.04</div>", 
"<div class=\"right neg negative neg_icon showinline down\">-200.00%</div>", 
"--" ] ,  [ "8/30/22", "6/2022", "-$0.05", "-$0.04", 
"<div class=\"right pos positive pos_icon showinline up\">+0.01</div>", 
"<div class=\"right pos positive pos_icon showinline up\">+20.00%</div>", "Before Open" ]  ]
                
                ,
    "earnings_announcements_sales_table"      : 
             [  [ "11/22/22", "9/2022", "$1,096.70", "$1,091.78", 
"<div class=\"right neg negative neg_icon showinline down\">-4.92</div>",

...

So far I've used the following code to get this specific script:

import requests
from bs4 import BeautifulSoup

x = requests.get(base_url, headers=params).text
soup = BeautifulSoup(x, 'html.parser')
data = soup.find_all('script')
txt = data[25]  # the <script> tag that holds document.obj_data

However, I can't figure out or find a way to output the data in a clean format such as JSON. I can get this information using Selenium, but I would like to avoid it because it is heavy and slow. Please help, thank you!

EDIT: Others have suggested a solution that uses regex, and I've tried to adjust the code to fit my problem, but the output is an empty list: []

output = [json.loads(m.group(1)) for m in re.finditer(r'document.obj_data.+ = ({.*})', x.text)]
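
A possible explanation for the empty list, judging only from the excerpt above: `.+ = ` needs at least one extra character between `obj_data` and ` = `, and `.` does not cross line breaks unless `re.DOTALL` is set, so `{.*}` cannot span a multi-line object. The sketch below shows an adjusted pattern. It assumes the regex is applied to the text of the one script tag (not the whole page) and that the object's closing brace is the last `}` in that script; `script_text` is a hypothetical stand-in, not the real page content.

```python
import json
import re

# Hypothetical stand-in for the text of the one <script> tag of interest.
script_text = """
document.obj_data = {
    "earnings_announcements_earnings_table": [
        ["11/22/22", "9/2022", "-$0.02", "-$0.06"]
    ]
};
"""

# The pattern from the question, unchanged: it finds nothing in this sample.
original = re.findall(r'document.obj_data.+ = ({.*})', script_text)

# Adjusted: escape the dot, allow flexible whitespace around "=",
# and let "." match newlines via re.DOTALL. The greedy ".*" runs to the
# last "}" in the text, which is why this should only see one script.
pattern = re.compile(r'document\.obj_data\s*=\s*(\{.*\})', re.DOTALL)
match = pattern.search(script_text)
obj_data = json.loads(match.group(1)) if match else None

print(original)
print(obj_data)
```

If the script contains more JavaScript with braces after the object, the greedy match would capture too much and `json.loads` would fail, so this is fragile by nature.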

If you fetch the page using some method that understands JavaScript, then that table should appear normally. Don't use requests for pages that rely on JavaScript.
– John Gordon, Commented Jan 22, 2023 at 16:16
Does this answer your question? Scraping JavaScript var from website using Beautiful Soup in Python
– HedgeHog, Commented Jan 22, 2023 at 16:19

tetris programming · Accepted Answer · 2023-01-22 16:29:03Z

This is how I usually do it:

import json
...

txt_json= json.loads(txt.string)
print(json.dumps(txt_json, indent=4))
...
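
Note that `json.loads` expects the whole string to be JSON, while the script text in the question starts with `document.obj_data = `, so the call above will probably raise a `JSONDecodeError` on that input. A minimal sketch of one way around this, assuming the object literal itself is valid JSON (as the excerpt suggests): find the first `{` after the marker and let `json.JSONDecoder().raw_decode` parse just that value. The helper name `extract_obj_data` and the `sample` string are made up for illustration.

```python
import json

MARKER = "document.obj_data"


def extract_obj_data(script_text):
    """Parse the object literal assigned to document.obj_data.

    Assumes the literal is valid JSON; raises ValueError if the marker
    or an opening brace is missing.
    """
    marker_pos = script_text.index(MARKER)
    start = script_text.index("{", marker_pos)
    # raw_decode parses a single JSON value beginning at `start` and
    # ignores whatever follows it (a trailing ";" or more JavaScript).
    obj, _end = json.JSONDecoder().raw_decode(script_text, start)
    return obj


# Hypothetical, shortened stand-in for the script's text.
sample = """
document.obj_data = {
    "earnings_announcements_earnings_table": [
        ["11/22/22", "9/2022", "-$0.02", "-$0.06", "--"]
    ]
};
var other = 1;
"""

data = extract_obj_data(sample)
print(json.dumps(data, indent=4))
```

With the question's variables this would be called as `extract_obj_data(txt.string)`. Selecting the tag by checking whether `MARKER` occurs in each script's `.string` may also be less brittle than the fixed index `data[25]`.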

edited Jan 22, 2023 at 16:29

answered Jan 22, 2023 at 16:17


7 Comments

maxipaddy Over a year ago

I get the following JSONDecodeError: Expecting value: line 2 column 1 (char 1)

tetris programming Over a year ago

Did you replace soup.find_all with soup.find? With find_all you would have to iterate over your results.

maxipaddy Over a year ago

There are multiple scripts in the soup, so I selected the specific script I want to extract as txt.

tetris programming Over a year ago

I see. I've edited my code. You can add this beneath your txt = data[25].

maxipaddy Over a year ago

I'm still receiving the same error.
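
On the "nice format" part of the question: the table rows shown in the question carry small HTML fragments (`<div class=...>-0.04</div>`) inside some cells. The sketch below is one way to flatten such rows into plain records once they have been decoded. The two rows are copied from the question's excerpt; the column names are placeholders, since the excerpt does not show the real headers, and the tag-stripping regex is only meant for simple fragments like these (`BeautifulSoup(cell, 'html.parser').get_text()` would be an alternative).

```python
import json
import re

# Two rows copied from the question's excerpt, as they look after decoding.
rows = [
    ["11/22/22", "9/2022", "-$0.02", "-$0.06",
     '<div class="right neg negative neg_icon showinline down">-0.04</div>',
     '<div class="right neg negative neg_icon showinline down">-200.00%</div>',
     "--"],
    ["8/30/22", "6/2022", "-$0.05", "-$0.04",
     '<div class="right pos positive pos_icon showinline up">+0.01</div>',
     '<div class="right pos positive pos_icon showinline up">+20.00%</div>',
     "Before Open"],
]

# Placeholder column names (col_1 .. col_7); the real headers are unknown.
COLUMNS = [f"col_{i}" for i in range(1, 8)]

TAG_RE = re.compile(r"<[^>]+>")


def clean_cell(cell):
    """Drop simple HTML tags and surrounding whitespace from one cell."""
    return TAG_RE.sub("", cell).strip()


records = [dict(zip(COLUMNS, (clean_cell(c) for c in row))) for row in rows]
print(json.dumps(records, indent=4))
```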
