I am trying to request the HTML of a website that is generated with JavaScript so I can scrape the information with BeautifulSoup. The problem is that when I request the HTML, what I receive is the page before it has been rendered. The following is the code I am running:
from bs4 import BeautifulSoup
from requests_html import HTMLSession

url = 'https://www.edwarddan.com/projects'

this_session = HTMLSession()
response = this_session.get(url)
# Ask requests-html to execute the page's JavaScript
response.html.render()

print(response.text)
print("---------------------------------------------------------------")

# Parse the HTML and collect every <div>
soup = BeautifulSoup(response.text, "html.parser")
names = soup.find_all("div")
As a result, I get the following HTML:
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="robots" content="noindex" />
<script src="/cdn-cgi/apps/head/cTVctQJ-rr0oH623j2V4Pf03v-o.js"></script>
<link rel="icon" href="/favicon.ico" />
<meta name="viewport" content="width=device-width,initial-scale=1" />
<meta name="theme-color" content="#000000" />
<meta name="description" content="Edward Dan -- Last updated 1/21/2021" />
<link rel="apple-touch-icon" href="/logo192.png" />
<link rel="manifest" href="/manifest.json" />
<title>Edward Dan</title>
<link href="/static/css/main.d231a676.chunk.css" rel="stylesheet">
</head>
<body><noscript>You need to enable JavaScript to run this app.</noscript>
<div id="root"></div>
<script>
! function(e) {
function r(r) {
for (var n, p, l = r[0], a = r[1], f = r[2], c = 0, s = []; c < l.length; c++) p = l[c], Object.prototype.hasOwnProperty.call(o, p) && o[p] && s.push(o[p][0]), o[p] = 0;
for (n in a) Object.prototype.hasOwnProperty.call(a, n) && (e[n] = a[n]);
for (i && i(r); s.length;) s.shift()();
return u.push.apply(u, f || []), t()
}
function t() {
for (var e, r = 0; r < u.length; r++) {
for (var t = u[r], n = !0, l = 1; l < t.length; l++) {
var a = t[l];
0 !== o[a] && (n = !1)
}
n && (u.splice(r--, 1), e = p(p.s = t[0]))
}
return e
}
var n = {},
o = {
1: 0
},
u = [];
function p(r) {
if (n[r]) return n[r].exports;
var t = n[r] = {
i: r,
l: !1,
exports: {}
};
return e[r].call(t.exports, t, t.exports, p), t.l = !0, t.exports
}
p.m = e, p.c = n, p.d = function(e, r, t) {
p.o(e, r) || Object.defineProperty(e, r, {
enumerable: !0,
get: t
})
}, p.r = function(e) {
"undefined" != typeof Symbol && Symbol.toStringTag && Object.defineProperty(e, Symbol.toStringTag, {
value: "Module"
}), Object.defineProperty(e, "__esModule", {
value: !0
})
}, p.t = function(e, r) {
if (1 & r && (e = p(e)), 8 & r) return e;
if (4 & r && "object" == typeof e && e && e.__esModule) return e;
var t = Object.create(null);
if (p.r(t), Object.defineProperty(t, "default", {
enumerable: !0,
value: e
}), 2 & r && "string" != typeof e)
for (var n in e) p.d(t, n, function(r) {
return e[r]
}.bind(null, n));
return t
}, p.n = function(e) {
var r = e && e.__esModule ? function() {
return e.default
} : function() {
return e
};
return p.d(r, "a", r), r
}, p.o = function(e, r) {
return Object.prototype.hasOwnProperty.call(e, r)
}, p.p = "/";
var l = this["webpackJsonpmy-app"] = this["webpackJsonpmy-app"] || [],
a = l.push.bind(l);
l.push = r, l = l.slice();
for (var f = 0; f < l.length; f++) r(l[f]);
var i = a;
t()
}([])
</script>
<script src="/static/js/2.c42d857a.chunk.js"></script>
<script src="/static/js/main.60cdc4af.chunk.js"></script>
</body>
</html>
This HTML contains none of the elements that are visible once the website has finished rendering; the <div id="root"> is empty. Is there a way to wait for the website to finish running its JavaScript and creating all of its elements before fetching the HTML?
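A quick way to confirm that what came back is only the unrendered shell is to check whether the React mount point (the <div id="root"> in the output above) has anything inside it. The sketch below uses only Python's standard-library html.parser; the helper name root_is_empty and the default id "root" are assumptions based on this particular page's markup, not part of any library.

```python
from html.parser import HTMLParser


class _MountPointChecker(HTMLParser):
    """Records whether the element with id=root_id contains any tags or text."""

    def __init__(self, root_id: str) -> None:
        super().__init__()
        self.root_id = root_id
        self.found = False        # the mount point exists in the document
        self.inside = 0           # nesting depth while inside the mount point
        self.has_content = False  # any child tag or non-blank text seen inside it

    def handle_starttag(self, tag, attrs):
        if self.inside:
            # Any tag nested in the mount point means something was rendered.
            self.has_content = True
            self.inside += 1
        elif dict(attrs).get("id") == self.root_id:
            self.found = True
            self.inside = 1

    def handle_endtag(self, tag):
        if self.inside:
            self.inside -= 1

    def handle_data(self, data):
        if self.inside and data.strip():
            self.has_content = True


def root_is_empty(html: str, root_id: str = "root") -> bool:
    """Return True if the element with id=root_id exists but is still empty."""
    checker = _MountPointChecker(root_id)
    checker.feed(html)
    checker.close()
    if not checker.found:
        raise ValueError(f"no element with id={root_id!r} in the document")
    return not checker.has_content
```

Applied to the HTML shown above, root_is_empty would return True, because <div id="root"></div> has no children; once the JavaScript has actually run, it should return False.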
I am using HTMLSession because my research suggested that it can render pages that are built with JavaScript. Specifically, the line response.html.render() should render the page before I fetch the data; however, it doesn't seem to be working, as nothing has been rendered.
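In requests-html, render() does not change response.text, which remains the body exactly as the server sent it. The rendered DOM is stored back on response.html, so the markup to hand to BeautifulSoup is response.html.html. render() also accepts sleep (seconds to pause after the page loads) and timeout. The following is a minimal sketch, assuming requests-html is installed and can launch the headless Chromium it downloads on first use; fetch_rendered_html is a hypothetical helper name, and the 2-second sleep and 20-second timeout are arbitrary guesses, not values tested against this site.

```python
def fetch_rendered_html(url: str, sleep: float = 2.0, timeout: float = 20.0) -> str:
    """Fetch url, let its JavaScript run, and return the rendered HTML.

    Sketch only: assumes requests-html is installed and can launch the
    headless Chromium it downloads (via pyppeteer) on first use.
    """
    # Imported inside the function so this file can be loaded even where
    # requests-html is not installed.
    from requests_html import HTMLSession

    session = HTMLSession()
    try:
        response = session.get(url)
        # Execute the page's JavaScript in headless Chromium. render() stores the
        # resulting DOM back on response.html; response.text is left untouched.
        #   sleep:   seconds to pause after the page loads, giving scripts time to run
        #   timeout: seconds before the render attempt is abandoned
        response.html.render(sleep=sleep, timeout=timeout)
        # The rendered markup, not the raw server body in response.text.
        return response.html.html
    finally:
        # Closes the headless browser (if one was started) and the HTTP session.
        session.close()
```

With this helper, BeautifulSoup(fetch_rendered_html(url), "html.parser") would take the place of BeautifulSoup(response.text, "html.parser"). If the elements still do not appear, a longer sleep is one thing to try.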
I have also tried using Selenium in combination with PhantomJS; however, Selenium seems to be a bit outdated, and I would prefer not to use it.
Does anyone know whether there is a way to wait for the JavaScript to finish running with HTMLSession? If not, is there another library I can use that provides this functionality?