How can one extract data from a rendered web page whose JavaScript updates the data over time? Is it possible to write a user script that can access variables from the page's JavaScript? Please suggest a possible way to achieve this.
Where are you trying to do this, client side or server side? If on the client side and you are using a browser, Greasemonkey should do. – vsr, Nov 19, 2009 at 18:59
I'm trying to do this on the client side; I want to extract data. – kanna, Nov 20, 2009 at 11:23
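For the client-side route mentioned in the comments, a minimal Greasemonkey-style sketch might look like the following. It assumes the page keeps its data in a global variable; the name `liveData`, the `@include` URL, and the one-second interval are all hypothetical placeholders. Only true globals (top-level `var` declarations or properties set on `window`) are reachable this way; variables hidden inside closures are not.

```javascript
// ==UserScript==
// @name     Read page variable (sketch)
// @include  http://example.com/*
// @grant    unsafeWindow
// ==/UserScript==

// Resolve a dotted path such as "app.state.price" against the page's global object.
// Returns undefined if any step along the path is missing.
function readPageVariable(pageWindow, path) {
  return path.split('.').reduce(function (obj, key) {
    return obj == null ? undefined : obj[key];
  }, pageWindow);
}

// Poll the variable and call onChange only when its serialized value changes.
function watchPageVariable(pageWindow, path, onChange, intervalMs) {
  var last;
  return setInterval(function () {
    var current = JSON.stringify(readPageVariable(pageWindow, path));
    if (current !== last) {
      last = current;
      onChange(current);
    }
  }, intervalMs);
}

// unsafeWindow is provided by the userscript manager; the guard keeps the
// helpers usable outside of one (for example, when trying them in Node).
if (typeof unsafeWindow !== 'undefined') {
  // 'liveData' is a hypothetical name for the page's own global variable.
  watchPageVariable(unsafeWindow, 'liveData', function (value) {
    console.log('liveData is now', value);
  }, 1000);
}
```

Polling is used here because the page's script gives no notification when it changes a plain variable; if the data also ends up in the DOM, reading it from the document is usually simpler.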
2 Answers
According to Turing's halting problem, you can't: in general there is no way to know what a program will produce without running it.
That's a consequence of JavaScript being a Turing-complete language. The only way is to execute the JavaScript and let it render the page.
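As a rough illustration of "execute the JavaScript", here is a sketch using Node's built-in `vm` module to run a script and then read the variables it maintains. The `pageScript` string, `price`, and `tick` are made-up stand-ins for a real page's code. A real page would also need a DOM, which `vm` does not provide, so in practice this role is played by a browser or a browser engine under program control.

```javascript
const vm = require('vm');

// Hypothetical page script: it keeps a global variable up to date.
const pageScript = `
  var price = 100;
  function tick() { price += 5; }
`;

// Run the script in a fresh context and return that context's global object.
// Top-level `var` and function declarations become properties of it.
function runPageScript(source) {
  const sandbox = {};
  vm.createContext(sandbox);
  vm.runInContext(source, sandbox);
  return sandbox;
}

const page = runPageScript(pageScript);
const before = page.price; // value right after the script has run
page.tick();               // let the "page" update its own data
const after = page.price;  // value after the update

console.log(before, after);
```

Note that `vm` is not a security boundary, so it should not be used this way on untrusted scripts.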
3 Comments
Bhargav Nanekalva
Isn't it possible to use cURL and get the rendered page?
Javier
cURL does only the first part of the job: the HTTP fetch to get the HTML/CSS/JS code. After that, a browser parses and renders the HTML and executes the JavaScript code. The HTML rendering part isn't needed (since the question is about picking out data, not screenshots), but the JavaScript keeps on updating the data, so to get those updates you have to execute the JavaScript code.
Javier
Note that what Zenon's answer suggests is precisely to execute the JavaScript, in effect writing a small browser under your program's control.
It depends on your programming language.
In C# you could use a WebBrowser control, then use the WebBrowser.Document property to get an HtmlDocument object holding the current markup. To invoke a JavaScript function in the document, call HtmlDocument.InvokeScript. The ObjectForScripting property of the WebBrowser control works in the other direction: it exposes a C# object to the page's script through window.external.