Parsing data scraped from a JavaScript-rendered webpage with Python

Question

I am trying to use .find on a soup object, but when I search for the class I found on the webpage, it returns None.

from bs4 import BeautifulSoup
from requests_html import HTMLSession

s = HTMLSession()
url = "https://cryptoli.st/lists/fixed-supply"


def get_data(url):
    r = s.get(url)
    # r.text is the raw HTML sent by the server, before any JavaScript runs
    return BeautifulSoup(r.text, 'html.parser')


def get_next_page(soup):
    page = soup.find('div', {'class': 'dataTables_paginate paging_simple_numbers'})
    return page


soup = get_data(url)
print(get_next_page(soup))

The "page" variable is None even though I copied the class from the browser's element inspector. I suspect it has something to do with the website being rendered with JavaScript, but I can't figure out why. If I remove {'class': 'dataTables_paginate paging_simple_numbers'} and just search for 'div', it works and returns the first div tag, so I don't know what else to try.
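Separately from the JavaScript issue, the way the class is written in find() matters. A minimal sketch using made-up markup that carries the question's two classes in the opposite order: a multi-class string is compared against the attribute's exact value, while a single class name or a CSS selector passed to select_one() matches regardless of order. This does not make JavaScript-added elements visible; it only rules out a class-spelling mismatch.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same two classes as the question, opposite order.
html = '<div class="paging_simple_numbers dataTables_paginate">pages</div>'
soup = BeautifulSoup(html, "html.parser")

# A multi-class string is compared with the exact attribute value,
# so the different class order makes find() return None.
exact = soup.find("div", {"class": "dataTables_paginate paging_simple_numbers"})

# A single class name matches any div that has that class.
single = soup.find("div", class_="dataTables_paginate")

# A CSS selector requires both classes, in any order.
css = soup.select_one("div.dataTables_paginate.paging_simple_numbers")

print(exact)        # None
print(single.text)  # pages
print(css.text)     # pages
```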

Modern pages may use JavaScript to add elements, and BeautifulSoup can't run JavaScript. You may need Selenium to control a real web browser, which can run JavaScript. You can turn off JavaScript in your browser and reload the page to see whether the page depends on it.
– furas, Commented May 20, 2021 at 23:43
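Since the question already uses requests_html, its r.html.render() method is another way to run the page's JavaScript in a headless browser before parsing. A sketch, assuming requests_html and its pyppeteer/Chromium backend work on your machine (render() downloads Chromium on first use); fetch_rendered_html and find_pagination are hypothetical helper names, and the sleep/timeout values are guesses, not tuned for this site.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, sleep=2, timeout=20):
    """Return the page's HTML after its JavaScript has run (sketch)."""
    # Imported here so find_pagination() works without requests_html.
    from requests_html import HTMLSession

    session = HTMLSession()
    try:
        r = session.get(url)
        # Load the page in headless Chromium, run its JavaScript, and
        # replace r.html's content with the rendered document.
        r.html.render(sleep=sleep, timeout=timeout)
        return r.html.html
    finally:
        # Also closes the headless browser if one was started.
        session.close()


def find_pagination(html):
    """Same lookup as the question's get_next_page(), on an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("div", {"class": "dataTables_paginate paging_simple_numbers"})
```

Usage would be print(find_pagination(fetch_rendered_html("https://cryptoli.st/lists/fixed-supply"))). The only change from the question's approach is that parsing happens on the rendered HTML instead of r.text.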
This page may have all the data inside the HTML, but in a <script> as cl.coinmainlist.dataraw = [ ...], and you would need some tool to convert it into something useful; that may be the problem.
– furas, Commented May 20, 2021 at 23:53
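If the data really is embedded as a JavaScript array in a <script> tag, one way to convert it is to locate the assignment and decode the array with the standard json module. A minimal sketch, assuming the array is valid JSON (double-quoted keys and strings, no trailing commas, comments, or function calls); if it is not, json raises a JSONDecodeError and a JavaScript-aware parser would be needed instead. extract_js_array is a hypothetical helper, and the variable name comes from the comment above, not from an inspection of the page.

```python
import json
import re


def extract_js_array(html, var_name):
    """Return the array assigned to var_name in inline script text.

    Assumes the source looks like 'var_name = [...]' and that the array
    is valid JSON. Returns None if the assignment is not found.
    """
    # Match "<var_name> =" only when it is followed by an opening bracket.
    match = re.search(re.escape(var_name) + r"\s*=\s*(?=\[)", html)
    if match is None:
        return None
    # raw_decode parses one JSON value from the start of the string and
    # ignores what follows it (e.g. the trailing ";" and "</script>").
    value, _end = json.JSONDecoder().raw_decode(html[match.end():])
    return value
```

With the question's session, this would be called as extract_js_array(s.get(url).text, "cl.coinmainlist.dataraw"). No browser is needed, because the array is already in the raw HTML.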

DisappointedByUnaccountableMod · Accepted Answer · 2021-05-21 06:05:09Z


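So you want to scrape dynamic page content. You can use Beautiful Soup together with Selenium WebDriver: Selenium drives a real browser that runs the page's JavaScript, and Beautiful Soup then parses the resulting HTML. This answer is based on the explanation here: https://www.geeksforgeeks.org/scrape-content-from-dynamic-websites/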

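import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

url = "https://cryptoli.st/lists/fixed-supply"

# Selenium 4 expects the chromedriver path wrapped in a Service object
# (since Selenium 4.6, webdriver.Chrome() can also locate a driver itself)
driver = webdriver.Chrome(service=Service('./chromedriver'))
try:
    driver.get(url)

    # crude wait to give the page's JavaScript time to run
    time.sleep(5)

    # page_source is the page as the browser currently holds it,
    # i.e. including the elements JavaScript has added
    html = driver.page_source
finally:
    # always close the browser, even if loading the page fails
    driver.quit()

# Now we can simply apply bs4 to the html variable
soup = BeautifulSoup(html, "html.parser")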
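A fixed time.sleep(5) may be longer than needed or too short on a slow connection. Selenium's explicit waits instead poll until a condition holds. A sketch, assuming Selenium 4.6 or later (so webdriver.Chrome() can find a matching chromedriver via Selenium Manager) and a local Chrome install; fetch_rendered_soup and PAGINATION_SELECTOR are hypothetical names, and waiting for the pagination div assumes that element appears once the table has been built.

```python
from bs4 import BeautifulSoup

# The question's pagination div: both classes required, in any order.
PAGINATION_SELECTOR = "div.dataTables_paginate.paging_simple_numbers"


def fetch_rendered_soup(url, css_selector=PAGINATION_SELECTOR, timeout=10):
    """Load url in headless Chrome, wait for css_selector, return a soup.

    Raises selenium's TimeoutException if the element does not appear
    within timeout seconds.
    """
    # Imported here so the module can be imported without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # no visible browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Poll until the JavaScript-added element exists in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return BeautifulSoup(driver.page_source, "html.parser")
    finally:
        driver.quit()
```

Usage: soup = fetch_rendered_soup("https://cryptoli.st/lists/fixed-supply"), then soup.select_one(PAGINATION_SELECTOR) should return the div instead of None.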

edited May 21, 2021 at 6:05 by DisappointedByUnaccountableMod

answered May 21, 2021 at 0:20 by Maitreyee Das
