
I have a number of web pages, fetched with curl, that I am attempting to parse information from. Each of the pages uses jQuery to transform its content once the document has loaded in the browser (via a document.ready handler), mostly setting the classes/ids of divs. The information is much easier to parse once those JavaScript functions have run.

What are my options (preferably from the command line) for executing the JavaScript on these pages and dumping the transformed HTML?

  • getfirebug.com/commandline? Is this what you are looking for? Commented May 20, 2012 at 8:41
  • +1 sounds interesting :) I thought about node.js for a while, but that won't work for you =/ Commented May 20, 2012 at 8:44

1 Answer


To scrape dynamic web pages, don't use static download tools like curl.

Instead, use a headless web browser that you can control from your programming language. The most popular tool for this is Selenium:

http://code.google.com/p/selenium/

With Selenium you can export the modified DOM tree out of the browser as HTML.

An example use case:

https://stackoverflow.com/a/10053589/315168
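
For reference, here is a minimal sketch of that approach using Selenium's Java bindings with ChromeDriver, mirroring what the comment below describes (load each page, then dump the page source). The URL is a placeholder and it assumes a chromedriver binary is available on your PATH; treat it as an illustration rather than the exact code used.

    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.chrome.ChromeDriver;
    import org.openqa.selenium.chrome.ChromeOptions;

    public class DumpRenderedHtml {
        public static void main(String[] args) {
            ChromeOptions options = new ChromeOptions();
            options.addArguments("--headless=new");       // run Chrome without a visible window
            WebDriver driver = new ChromeDriver(options); // assumes chromedriver is on your PATH

            try {
                // Placeholder URL; replace with the page you want to scrape
                driver.get("https://example.com/page-with-jquery");

                // getPageSource() returns the DOM as it stands after the page's
                // scripts (including document.ready handlers) have executed,
                // i.e. the transformed HTML rather than the raw server response
                String renderedHtml = driver.getPageSource();
                System.out.println(renderedHtml);
            } finally {
                driver.quit();
            }
        }
    }

Note that driver.get() blocks until the page's load event fires, so document.ready handlers will normally have executed by the time the source is read; content fetched asynchronously after load may still require an explicit wait.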


1 Comment

Thanks Mikko, I ended up using Selenium with the Java & Chrome bindings to load each page and subsequently dump the page source - it worked a treat!
