Scrape html only after data loads with delay using Python Requests?

Question

I am trying to learn data scraping using python and have been using the Requests and BeautifulSoup4 libraries. It works well for normal html websites. But when I tried to get some data out of websites where the data loads after some delay, I found that I get an empty value. An example would be

from bs4 import BeautifulSoup
from operator import itemgetter
from selenium import webdriver
url = "https://www.example.com/;1"
browser = webdriver.PhantomJS()
browser.get(url)
html = browser.page_source
soup = BeautifulSoup(html, 'lxml')
a = soup.find('span', 'buy')
print(a)

I am trying to grab the from here: (value)

I have already referred a similar topic and tried executing my code on similar lines as the solution provided here. But somehow it doesnt seem to work. I am a novice here so need help getting this work. How to scrape html table only after data loads using Python Requests?

The table (content) is probably generated by JavaScript and thus can't be "seen". I am using python3.6 / PhantomJS / Selenium as proposed by a lot of answers here.

You can use some of this selenium-python.readthedocs.io/waits.html or just add time.sleep(n) — amarynets
– amarynets, Commented Oct 4, 2017 at 20:43
can you please check the url? It seems the ; there is a typo and might be causing the error in your scraper — jabargas
– jabargas, Commented Oct 4, 2017 at 20:45
@AndMar time.sleep doesnt seem to work in this case. Please suggest where exactly you propose for me to add? — fazal
– fazal, Commented Oct 4, 2017 at 20:57
@jabargas same code works if i just change soup.find('span', 'buy') to soup.find('span', 'btc') which is just static content instead of dynamic content that gets loaded in a few seconds after the page loads. So i doubt there is any issue with url. — fazal
– fazal, Commented Oct 4, 2017 at 20:59
Try browser.implicitly_wait(n) where n is an integer for the amount of seconds. — Luke
– Luke, Commented Oct 4, 2017 at 21:00

songxunzhao · Accepted Answer · 2017-10-07 16:25:15Z

3

You have to run headless browser to run delayed scraping. Please use selenium. Here is sample code. Code is using chrome browser as driver

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
browser = webdriver.Chrome(<chromedriver path here>)
browser.set_window_size(1120, 550)
browser.get(link)
element = WebDriverWait(browser, 3).until(
   EC.presence_of_element_located((By.ID, "blabla"))
)
data = element.get_attribute('data-blabla')
print(data)
browser.quit()

answered Oct 7, 2017 at 16:25

songxunzhao

3,1691 gold badge14 silver badges15 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

fazal Over a year ago

it works like a charm using chrome webdriver. But it actually opens up the browser window. Instead is there something similar with headless browser? Maybe if you have similar code for phantomjs or so that doesnt open a physical browser but works under the hood sort of a console windows or so? Thanks again. Once I get your response i will mark this post as answered.

songxunzhao Over a year ago

Please replace webdriver.chorme(<chromedirver path>) to webdriver.PhantomJS(<phantomjs driver path>). All other process is same.

fazal · Accepted Answer · 2017-10-20 09:51:21Z

0

You can access desired values by requesting it directly from API and analyze JSON response.

import requests
import json

res = request.get('https://api.example.com/api/')
d = json.loads(res.text)

print(d['market'])

edited Oct 20, 2017 at 9:51

fazal

251 silver badge7 bronze badges

answered Oct 5, 2017 at 9:22

Roman Mindlin

8521 gold badge8 silver badges13 bronze badges

1 Comment

fazal Over a year ago

Thanks for the response. Although an api will do for this site. The original idea was to still understand how to get the value scraped on such site where there is a slight delay in data loading. That is the key ask from this post.

Collectives™ on Stack Overflow

Scrape html only after data loads with delay using Python Requests?

2 Answers 2

2 Comments

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related