
In the past, when I've used BeautifulSoup and lxml to parse webpages, it's been pretty easy because links all looked like this: <a href="www.website.com">Website</a>. However, I've encountered some webpages where links appear in the browser but not in the page source.

For example, on this Edmunds.com page, the Past Long-Term Road Tests section looks like this in the browser:

1991 Acura NSX
2011 Acura TSX Sport Wagon
...


However, the source code for the Past Long-Term Road Tests section of the page looks like this:

<script type="text/javascript">
PAGESETUP.addControl(function() {
function linksObj(){
var elink = "|acura|nsx|1991|long-term-road-test|"; //generates edmunds.com/acura/nsx/1991/long-term-road-test/
this.link0 = {anchor:elink,label:"1991 Acura NSX"};
var elink = "|acura|tsx-sport-wagon|2011|long-term-road-test|"; //generates edmunds.com/acura/tsx-sport-wagon/2011/long-term-road-test/
this.link1 = {anchor:elink,label:"2011 Acura TSX Sport Wagon"};
...
}
var links_obj = new linksObj();
var links_container = document.getElementById('links_list_offpage2');
var more_link = "";
var more_link_text = "";
var elinks = new EDMUNDS.linksList(links_obj, links_container,more_link, more_link_text);
}, 'low');
</script>

In the browser, the JavaScript line var elink = "|acura|nsx|1991|long-term-road-test|"; gets expanded into the link edmunds.com/acura/nsx/1991/long-term-road-test.
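The expansion itself looks mechanical: judging from the comments in the script, each `|` becomes a `/` and the result is prefixed with the site's domain. A minimal Python sketch of that mapping, assuming the pipe-to-slash rule and the `edmunds.com` prefix inferred from those comments (the real `EDMUNDS.linksList` code isn't shown, so it may do more, such as adding a scheme). The function name `elink_to_url` is hypothetical:

```python
def elink_to_url(elink: str, domain: str = "edmunds.com") -> str:
    """Expand a pipe-delimited elink string into a URL.

    Assumption: the pipes simply stand in for slashes, as the comments in
    the page's script suggest; the real EDMUNDS.linksList may differ.
    """
    return domain + elink.replace("|", "/")


print(elink_to_url("|acura|nsx|1991|long-term-road-test|"))
```

With the leading and trailing pipes, this yields `edmunds.com/acura/nsx/1991/long-term-road-test/`, matching the trailing slash shown in the script's comment.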


Tools like BeautifulSoup and lxml don't find the links that are generated by JavaScript, because they only see the static page source and never run its scripts. How can I parse these links?

  • Copy the EDMUNDS.linksList function, I guess. (Commented Feb 15, 2013 at 5:56)
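Taking that comment's suggestion literally, you may not need to run the script at all: the `elink` strings and their labels are plain string literals, so a regular expression over the `<script>` text can recover them. A standard-library sketch, assuming every entry follows the `var elink = "..."` / `{anchor:elink,label:"..."}` pattern shown in the question and that pipes map to slashes as the script's comments imply. The names `LINK_RE` and `extract_links` are hypothetical:

```python
import re

# Each entry is a `var elink = "..."` assignment followed by a
# `this.linkN = {anchor:elink,label:"..."}` line; capture both literals.
# DOTALL lets the non-greedy `.*?` skip over newlines and // comments.
LINK_RE = re.compile(
    r'var\s+elink\s*=\s*"([^"]*)";.*?label\s*:\s*"([^"]*)"',
    re.DOTALL,
)


def extract_links(script_text, domain="edmunds.com"):
    """Return (label, url) pairs found in the page's inline script.

    Assumption: pipes in elink map to slashes, as the script comments imply.
    """
    return [
        (label, domain + elink.replace("|", "/"))
        for elink, label in LINK_RE.findall(script_text)
    ]


script = '''
var elink = "|acura|nsx|1991|long-term-road-test|";
this.link0 = {anchor:elink,label:"1991 Acura NSX"};
var elink = "|acura|tsx-sport-wagon|2011|long-term-road-test|";
this.link1 = {anchor:elink,label:"2011 Acura TSX Sport Wagon"};
'''

for label, url in extract_links(script):
    print(label, url)
```

With BeautifulSoup, the input could be the `.string` of each tag returned by `soup.find_all("script")`. This approach is brittle: it breaks as soon as the site changes how it writes these literals, which is the trade-off for not needing a JavaScript engine.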

1 Answer


Use a headless browser such as ghost.py to run the page's JavaScript, and you should have no problem scraping the JS-altered DOM.
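Once a headless browser has executed the script, the links should be ordinary `<a href>` elements in the rendered HTML, and normal parsing applies. How you obtain that HTML depends on the tool, so that step is left out here. The sketch below uses the standard library's `html.parser` to collect the `href`s inside the `links_list_offpage2` container named in the script. It assumes `EDMUNDS.linksList` renders plain `<a>` tags inside that container and that the serialized DOM closes its tags; neither is shown in the question. The class and function names and the sample markup are hypothetical:

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ContainerLinkParser(HTMLParser):
    """Collect href values of <a> tags inside the element with a given id."""

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.depth = 0  # nesting depth inside the container; 0 = outside
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        if self.depth:
            self.depth += 1
        elif attrs.get("id") == self.container_id:
            self.depth = 1
        if self.depth and tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1


def links_in_container(html, container_id="links_list_offpage2"):
    parser = ContainerLinkParser(container_id)
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Hypothetical rendered markup; the real output of linksList may differ.
rendered = (
    '<div id="links_list_offpage2"><ul>'
    '<li><a href="/acura/nsx/1991/long-term-road-test/">1991 Acura NSX</a></li>'
    '</ul></div><a href="/elsewhere/">Other</a>'
)
print(links_in_container(rendered))
```

Only the link inside the container is collected; the `/elsewhere/` anchor after it is ignored.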
