I am trying to use .find on a BeautifulSoup object, but when I search for a class that I can see on the webpage, it returns None.

from bs4 import BeautifulSoup
import time
import pandas as pd
import pickle
import html5lib
from requests_html import HTMLSession

s = HTMLSession()
url = "https://cryptoli.st/lists/fixed-supply"


def get_data(url):
    # download the page and parse the HTML exactly as the server sent it
    r = s.get(url)
    return BeautifulSoup(r.text, 'html.parser')


def get_next_page(soup):
    # look for the pagination container seen in the browser's inspector
    page = soup.find('div', {'class': 'dataTables_paginate paging_simple_numbers'})
    return page


soup = get_data(url)
print(get_next_page(soup))

The "page" variable is None even though I copied the class from the browser's element inspector. I suspect it has something to do with the website being rendered with JavaScript, but I can't figure out why. If I take away the {'class': 'dataTables_paginate paging_simple_numbers'} filter and just search for 'div', it works and returns the first div tag, so I don't know what else to try.
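One way to test the JavaScript suspicion is to check whether the class appears at all in the HTML the server sends, before any script runs. The sketch below uses only the standard library; `ClassCollector`, `classes_in_html`, and the `STATIC_HTML` sample are hypothetical names and made-up data for illustration, not part of the original script.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every CSS class name that appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # a class attribute may hold several space-separated names
                self.classes.update(value.split())


def classes_in_html(html):
    """Return the set of class names present in the given HTML string."""
    collector = ClassCollector()
    collector.feed(html)
    return collector.classes


# Made-up stand-in for a server response: the container is empty because
# a script is expected to fill it in later.
STATIC_HTML = '<div id="app" class="container"><script src="app.js"></script></div>'

found = classes_in_html(STATIC_HTML)
print("dataTables_paginate" in found)  # False: the class is not in the raw HTML
```

Passing `r.text` from `get_data` instead of the sample would show whether `dataTables_paginate` is in the downloaded HTML. If it is missing there but visible in the inspector, the element is most likely added by JavaScript after the page loads.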

  • Modern pages may use JavaScript to add elements, and BeautifulSoup can't run JavaScript. You may need Selenium to control a real web browser, which can run JavaScript. To check whether the page depends on JavaScript, turn JavaScript off in your browser and reload the page. Commented May 20, 2021 at 23:43
  • This page may have all the data inside the HTML, but in a <script> tag as cl.coinmainlist.dataraw = [ ...], so you would need some tool to convert it into something useful, and that can be the problem. Commented May 20, 2021 at 23:53
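The idea in the second comment can be sketched with the standard library alone, assuming the array assigned in the script is valid JSON (this has not been verified for the real page). `extract_js_array` is a hypothetical helper, and `SAMPLE_HTML` is made-up data that only mimics the `cl.coinmainlist.dataraw = [...]` shape the comment mentions.

```python
import json
import re


def extract_js_array(html, var_name):
    """Return the JSON value assigned to var_name in html, or None if absent."""
    match = re.search(re.escape(var_name) + r"\s*=\s*", html)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (for example the trailing semicolon).
    value, _end = json.JSONDecoder().raw_decode(html, match.end())
    return value


# Made-up sample; the real page's fields and values are not known here.
SAMPLE_HTML = """
<html><body>
<script>
cl.coinmainlist.dataraw = [{"name": "Bitcoin", "symbol": "BTC"}, {"name": "Litecoin", "symbol": "LTC"}];
</script>
</body></html>
"""

rows = extract_js_array(SAMPLE_HTML, "cl.coinmainlist.dataraw")
print(rows[0]["name"])  # Bitcoin
```

If the embedded value is a JavaScript literal that is not strict JSON (single quotes, unquoted keys, trailing commas), `raw_decode` raises `json.JSONDecodeError`, and a JavaScript-aware parser would be needed instead.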

1 Answer

So you want to scrape dynamic page content. You can use Beautiful Soup together with Selenium WebDriver. This answer is based on the explanation here: https://www.geeksforgeeks.org/scrape-content-from-dynamic-websites/

import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

url = "https://cryptoli.st/lists/fixed-supply"

# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service('./chromedriver'))
driver.get(url)

# crude wait to give the page's JavaScript time to finish
time.sleep(5)

# page_source returns the DOM as it stands after the scripts have run,
# so the JavaScript-generated elements are now part of the HTML
html = driver.page_source
driver.quit()

# Now we can simply apply bs4 to the html variable
soup = BeautifulSoup(html, "html.parser")
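A fixed `time.sleep(5)` can be too short on a slow connection and wasteful on a fast one. A possible alternative is an explicit wait that returns as soon as the target element exists. The sketch below assumes Selenium 4.6 or later, so that `webdriver.Chrome()` can locate the driver binary on its own; `fetch_rendered_html` is a hypothetical helper name, not something from the answer.

```python
def fetch_rendered_html(url, css_selector, timeout=10):
    """Load url in Chrome and return the page source once css_selector exists."""
    # Imported here so the sketch can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Poll until the element is present in the DOM; raises
        # selenium.common.exceptions.TimeoutException after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

With the class from the question, a call might look like `html = fetch_rendered_html(url, "div.dataTables_paginate")`, and the result can be passed to `BeautifulSoup(html, "html.parser")` as before.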