0

How can one extract data from a rendered web page in which JavaScript updates the data over time? Is it possible to write a user script that can access variables from the web page's JavaScript? Please suggest a possible way to achieve this.

2
  • Where are you trying to do this, client side or server side? If it is on the client side and you are using a browser, Greasemonkey should do. Commented Nov 19, 2009 at 18:59
  • I'm trying to do this on the client side; I want to extract the data. Commented Nov 20, 2009 at 11:23

2 Answers

2

According to Turing's halting problem theorem, you can't, at least not without running the script.

JavaScript is a Turing-complete language, so in general there is no way to work out what a script will produce other than executing it. The only way is to execute the JavaScript and let it render the page.
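On the client side the browser already executes the page's JavaScript, so a user script (for example one run by Greasemonkey, as suggested in the comments on the question) can simply read the live value. Below is a minimal sketch, not a definitive implementation. It assumes the page keeps its data in a global variable; the name `liveData`, the `@match` URL, and the helper `watchVariable` are all hypothetical placeholders. Whether page globals are reached through `unsafeWindow` or plain `window` depends on the userscript manager and its settings, so the sketch tries both.

```javascript
// ==UserScript==
// @name   Watch a page variable (sketch)
// @match  https://example.com/*
// ==/UserScript==

// Returns a function that, each time it is called, reads root[name] and
// reports the value to onChange if it differs from the last value seen.
// Note: this compares with !==, so an object mutated in place is not detected.
function watchVariable(root, name, onChange) {
  let last;
  return function check() {
    const current = root[name];
    if (current !== last) {
      last = current;
      onChange(current);
    }
  };
}

// Only start polling when running inside a browser.
if (typeof window !== 'undefined') {
  // Assumption: Greasemonkey exposes the page's own window as unsafeWindow;
  // fall back to window when the script already runs in page scope.
  const pageWindow = typeof unsafeWindow !== 'undefined' ? unsafeWindow : window;
  // 'liveData' is a hypothetical name for the page's variable.
  const check = watchVariable(pageWindow, 'liveData', function (value) {
    console.log('liveData is now', value);
  });
  setInterval(check, 1000); // poll once per second
}
```

Polling is used here because the user script has no way of knowing when the page's code will change the variable; the one-second interval is an arbitrary choice.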


3 Comments

Isn't it possible to use cURL and get the rendered page?
cURL does only the first part of the job: the HTTP fetch that gets the HTML/CSS/JS code. After that, a browser parses and renders the HTML and executes the JavaScript code. The HTML rendering part isn't needed (since the question is about picking out data, not taking screenshots), but the JavaScript keeps on updating the data, so to get those updates you have to execute the JavaScript code.
Note that what Zenon's answer suggests is precisely to execute the JavaScript, in effect writing a small browser under your program's control.
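To illustrate the point about executing the script yourself, here is a minimal sketch using Node.js's built-in `vm` module. It assumes the fetched script is self-contained and never touches the DOM, which is rarely true of real pages; the `pageScript` text, its `price` variable, its `tick` function, and the helper `runPageScript` are all made up for the example.

```javascript
const vm = require('node:vm');

// Hypothetical page script. In practice this text would come from the
// HTTP fetch (the part cURL can do).
const pageScript = `
  var price = 100;
  function tick() { price += 5; }
`;

// Runs the script inside a fresh sandbox and returns the sandbox object.
// Top-level `var` and function declarations become properties of it.
function runPageScript(source) {
  const sandbox = {};
  vm.createContext(sandbox);
  vm.runInContext(source, sandbox);
  return sandbox;
}

const page = runPageScript(pageScript);
console.log(page.price); // 100, the value after the initial load

page.tick();             // simulate one update made by the page's code
console.log(page.price); // 105
```

A script that uses `document`, timers, or network calls would need those objects supplied in the sandbox as well, which is why this approach quickly grows into the "small browser" described above.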
2

It depends on your programming language.

In C# you could use a WebBrowser control, then use the WebBrowser.Document property to get an HtmlDocument object holding the current markup. To invoke a JavaScript function in the document, call HtmlDocument.InvokeScript; the ObjectForScripting property of the browser control goes the other way, exposing a C# object to the page's script through window.external.

