I am a new coder trying to extract the following data from a script tag in HTML using BS4.
<script>
document.obj_data = {
"earnings_announcements_earnings_table" :
[ [ "11/22/22", "9/2022", "-$0.02", "-$0.06",
"<div class=\"right neg negative neg_icon showinline down\">-0.04</div>",
"<div class=\"right neg negative neg_icon showinline down\">-200.00%</div>",
"--" ] , [ "8/30/22", "6/2022", "-$0.05", "-$0.04",
"<div class=\"right pos positive pos_icon showinline up\">+0.01</div>",
"<div class=\"right pos positive pos_icon showinline up\">+20.00%</div>", "Before Open" ] ]
,
"earnings_announcements_sales_table" :
[ [ "11/22/22", "9/2022", "$1,096.70", "$1,091.78",
"<div class=\"right neg negative neg_icon showinline down\">-4.92</div>",
...
So far I've used the following code to get this specific script:
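import requests
from bs4 import BeautifulSoup

x = requests.get(base_url, headers=params).text  # raw HTML of the page
soup = BeautifulSoup(x, 'html.parser')
data = soup.find_all('script')
txt = data[25]  # the script tag that holds document.obj_data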
However, I can't work out or find a solution that outputs the data in a clean format such as JSON. I can get this information using Selenium, but I would like to avoid it because it is heavy and slow. Please help, thank you!
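One way to get from the page text to a Python dict without Selenium might be `json.JSONDecoder.raw_decode`, which parses a single JSON value starting at a given index and ignores whatever follows it. The sketch below is a minimal example under two assumptions: the object assigned to `document.obj_data` is valid JSON (it looks that way in the snippet above), and the literal closes with `};`, which the truncated snippet does not show. `sample_page` and `extract_obj_data` are hypothetical names used only for illustration.

```python
import json

# Stand-in for the downloaded page text (the real value would be `x`).
# The closing "};" is an assumption: the snippet in the question is truncated.
sample_page = r"""
<script>
document.obj_data = {
"earnings_announcements_earnings_table" :
[ [ "11/22/22", "9/2022", "-$0.02", "-$0.06",
"<div class=\"right neg negative neg_icon showinline down\">-0.04</div>",
"<div class=\"right neg negative neg_icon showinline down\">-200.00%</div>",
"--" ] ]
};
</script>
"""


def extract_obj_data(text, marker="document.obj_data"):
    """Parse the object literal assigned to `marker`, assuming it is valid JSON."""
    start = text.index(marker)      # raises ValueError if the marker is absent
    brace = text.index("{", start)  # first "{" after the marker
    # raw_decode parses one JSON value starting at `brace`; trailing text is ignored
    obj, _end = json.JSONDecoder().raw_decode(text, brace)
    return obj


obj_data = extract_obj_data(sample_page)
rows = obj_data["earnings_announcements_earnings_table"]
print(rows[0][:4])
```

On the real page the same function could presumably be called as `extract_obj_data(x)`, or on the string content of the selected script tag. The HTML fragments inside the rows stay as plain strings and would need a second parsing pass if only the numbers are wanted.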
EDIT:
Others have suggested a solution that uses regex, and I've tried to adjust the code to fit my problem, but the output is an empty list: []
output = [json.loads(m.group(1)) for m in re.finditer(r'document.obj_data.+ = ({.*})', x)]
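A likely reason for the empty list, judging only from the snippet shown above: `.+ = ` requires at least one extra character plus ` = ` after `obj_data`, which `document.obj_data = {` does not have, and `.` does not match newlines by default, so `{.*}` cannot span a multi-line object. The sketch below compares the original pattern with an adjusted one on a shortened stand-in string. The `"table"` key and the trailing `};` are assumptions for illustration, and the greedy `.*` only behaves if the text passed in ends with that object (for example, a single script's content rather than the whole page).

```python
import json
import re

# Shortened stand-in for the script text; the trailing "};" is an assumption.
sample = 'document.obj_data = {\n"table" : [ [ "8/30/22", "6/2022" ] ]\n};'

# Original pattern: ".+ = " needs at least one character plus " = " after
# "obj_data", and "." does not cross newlines, so nothing matches.
original = re.findall(r'document.obj_data.+ = ({.*})', sample)

# Adjusted pattern: escaped dots, optional whitespace around "=", and
# re.DOTALL so ".*" can span the line breaks inside the object.
adjusted = re.search(r'document\.obj_data\s*=\s*(\{.*\})', sample, re.DOTALL)
data = json.loads(adjusted.group(1))

print(original)
print(data)
```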