The web page uses JavaScript to put content on the page, so when I fetch the plain HTML there is no data in it. I need something that will not only read the HTML but also execute the JavaScript, apply it to the DOM, and only then return the result as HTML text.

The task is exactly the same as in this question, but I'm looking for a solution for .NET.

2 Comments

  • If you want JavaScript-generated content, then you need a browser engine to actually "run" the page, and you can then examine the resulting DOM. Commented Aug 8, 2013 at 4:37
  • @jfriend00 well, that's the obvious solution, but I'm looking for something like lobobrowser.org/cobra/java-html-parser.jsp (a solution for a similar task in Java). Commented Aug 8, 2013 at 4:41

1 Answer

I'll be surprised if you find anything like that for .NET. I would use PhantomJS to open the page and interact with the DOM. It's a highly scriptable headless WebKit browser and will do exactly what you want with ease. See "How to print html source to console with phantomjs" for an example.

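var page = require('webpage').create();
// Load the page; the callback fires once the initial load finishes.
page.open('http://google.com', function () {
    // page.content holds the current DOM serialized as HTML.
    console.log(page.content);
    phantom.exit();
});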

You'll have to install PhantomJS and then launch a separate process to run your script, but PhantomJS will probably do a much better job than anything you can find written for .NET.
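If the page fills itself in asynchronously, reading page.content straight away in the open callback may still give you the initial markup. Here is a minimal sketch that waits before dumping the HTML. The helper name dumpAfterDelay and the 2000 ms delay are my own assumptions, not anything PhantomJS prescribes, and a fixed delay is only a rough heuristic.

```javascript
// Hypothetical helper: read page.content only after a delay, giving
// asynchronous scripts a chance to update the DOM first.
function dumpAfterDelay(page, delayMs, done) {
    setTimeout(function () {
        done(page.content);
    }, delayMs);
}

// This part only runs inside PhantomJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    page.open('http://google.com', function (status) {
        if (status !== 'success') {
            console.log('Failed to load page');
            phantom.exit(1);
            return;
        }
        // 2000 ms is an assumed value; tune it for the site in question.
        dumpAfterDelay(page, 2000, function (html) {
            console.log(html);
            phantom.exit();
        });
    });
}
```

From .NET you would still run this as a separate phantomjs process and read its standard output.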

3 Comments

I'll look into it a little deeper, but so far I can't get what I want with it. The code in the example still returns the plain HTML. I guess the problem with my particular site is more complex than I thought. The data is downloaded in AJAX requests, and PhantomJS doesn't catch that, of course. I think it's easier to just look at the AJAX calls and use them directly. Anyway, thanks; PhantomJS is a neat tool, and maybe it will be useful to me some other time.
PhantomJS executes all the AJAX calls and has mechanisms for examining those requests (see onResourceRequested). But if all the data you're interested in is in an AJAX request, it would be better to do as you said and just make that HTTP request yourself from your program.
I guess I'll accept this as the answer, since it's relevant to the question and nobody proposed a better solution anyway.
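
For anyone who wants to find out which AJAX endpoints a page calls, the onResourceRequested hook mentioned in the comments can be used roughly like this. It is only a sketch: the helper name attachRequestLogger and the 2000 ms wait are assumptions made for illustration.

```javascript
// Hypothetical helper: record the URL of every request the page issues.
function attachRequestLogger(page, urls) {
    page.onResourceRequested = function (requestData) {
        urls.push(requestData.url);
    };
}

// This part only runs inside PhantomJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    var urls = [];
    attachRequestLogger(page, urls);
    page.open('http://google.com', function () {
        // Give AJAX calls a moment to fire before listing them
        // (2000 ms is an assumed value).
        setTimeout(function () {
            console.log(urls.join('\n'));
            phantom.exit();
        }, 2000);
    });
}
```

Once you know the URL of the request that carries the data, you can call it directly from your own program.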
