2

I have been tasked with finding a solution to quite a novel issue. I have a variety of HttpClient calls that I have to make in order to authenticate against a 3rd party vendor. However, part of this process involves dynamically generated values being created in JavaScript and passed to a form, which is then posted to the 3rd party. As I'm using the HttpClient class, I obviously cannot generate/run the JavaScript and thus the process comes to a halt right here (the posting of these values creates an important authentication cookie for an intermediate step).

So, I'd like to be able to take this simple HTML, which contains a form and some JavaScript, and have my C# code evaluate it and then retrieve the values that the JavaScript has assigned to the form. I'd then use these values and continue with the workflow processes.

I could take a clunky route and use the WebBrowser control. However, as this is being used in a non-visual environment, I'd like to be able to pass the HTML string into some sort of emulator and receive the parsed HTML back as a return value. Below is an example of the simple HTML that I'd be dealing with:

<html>
<head>
    <script type="text/javascript">
        function testLoad() {
            document.forms[0].elements[0].value = "some guid id plus the date:" + new Date().getDate() + 'some random js value';
            document.forms[0].elements[1].value = decodeURIComponent(document.forms[0].elements[1].value);
            document.forms[0].elements[2].value = decodeURIComponent(document.forms[0].elements[2].value);
            // optionally submit, or just get the returned form values and post from HttpClient
            document.forms[0].submit();
        }</script>
    <noscript>Please enable JavaScript to view the page content.</noscript>
</head>
<body onload="testLoad()">
    <form method="POST" action="/">
        <input type="hidden" name="test_id" value="idstuff" />
        <input type="hidden" name="test_123" value="encoded value" />
        <input type="hidden" name="test_another" value="1.01" />
    </form>
</body>
</html>

Once the html has been returned from the emulated process, I'd then use HtmlAgilityPack to grab the form values that have been populated by the javascript function (testLoad()) and progress to the next steps.
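
To illustrate that last step, here is a minimal sketch of pulling the populated values out of the returned HTML with HtmlAgilityPack and turning them into a POST body for HttpClient. The class and method names (FormValueReader, ReadInputs, ToPostBody) are made up for illustration, and it assumes the emulator hands back the evaluated HTML as a string.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class FormValueReader
{
    // Collects name/value pairs from every named <input> in the evaluated HTML.
    public static Dictionary<string, string> ReadInputs(string evaluatedHtml)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(evaluatedHtml);

        var values = new Dictionary<string, string>();

        // Query the inputs directly instead of via //form//input:
        // HtmlAgilityPack's default handling of <form> can leave the
        // inputs outside the form node.
        var inputs = doc.DocumentNode.SelectNodes("//input[@name]");
        if (inputs == null)
        {
            return values; // SelectNodes returns null when nothing matches
        }

        foreach (var input in inputs)
        {
            var name = input.GetAttributeValue("name", "");
            var raw = input.GetAttributeValue("value", "");
            values[name] = HtmlEntity.DeEntitize(raw); // decode &amp; and friends
        }

        return values;
    }

    // Wraps the values as application/x-www-form-urlencoded content.
    public static FormUrlEncodedContent ToPostBody(Dictionary<string, string> values)
    {
        return new FormUrlEncodedContent(values);
    }
}
```

The result of ToPostBody could then be passed to HttpClient.PostAsync against the form's action URL, using the same HttpClient (and cookie container) as the rest of the workflow.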

Am I aiming too high here, or has this bridge been crossed a few times? I've looked at http://wiki.awesomium.com, csExWB, Jint and a few others, but none seem to take the really simple approach that I'm hoping for here. Think of my quest as being able to use the initial HTML as a parameter and have the emulator return the patched HTML.

Hope the above is clear: I want to evaluate the HTML/JS from a server-side process and then move on to the next process within my C# workflow!

[edit] - this looks VERY promising: http://www.tomdupont.net/2013/08/phantomjs-headless-browser-for-net-webdriver.html. I've taken the tips here and am using PhantomJs with Selenium... so far, so good!!

[oh and just to point out, this is not for any sinister use, the 3rd party in question just doesn't yet have a b2b api in place to permit the interop that we require between us]

6
  • How complex is the Javascript? Is that an accurate example you've provided? Because you could just rewrite that in C# and use a simple HTTP post using the values gained via the HtmlAgilityPack Commented May 22, 2014 at 13:34
  • hi there, unfortunately, this is of course a simplified version of the javascript. the real javascript runs some meaty validations, ^'s as well as calling some core js functions against looped data Commented May 22, 2014 at 13:37
  • In that case, IronJS Commented May 22, 2014 at 13:38
  • i did see that earlier when looking, however, it appears to only evaluate ecma script and not the full web stack (i.e. html plus script in single unit)...thanks tho Commented May 22, 2014 at 13:41
  • it doesn't fit the requirement mainly because I will be baking this into our b2b api (which I will replace once the 3rd party has changed their platform). our side of the api will live on an azure server and thus the process needs to be encapsulated (preferably) inside a non visual process that has a low startup and memory footprint. that said, don't think i haven't tried with old mr wb!! ;) Commented May 22, 2014 at 13:50

3 Answers

3

AngleSharp also contains a short demo (project) that connects Jint (a JavaScript interpreter, completely written in .NET) to it. Both are PCL projects and they work together without problems. That should provide everything that is usually used in JavaScript / the DOM.

A very simple example looks like:

static void SimpleScriptingSample()
{
    //We require a custom configuration
    var config = new Configuration();

    //Including a script engine
    config.Register(new JavaScriptEngine());

    //And enabling scripting
    config.IsScripting = true;

    //This is our sample source, we will set the title and write on the document
    var source = @"<!doctype html>
        <html>
        <head><title>Sample</title></head>
        <body>
        <script>
        document.title = 'Simple manipulation...';
        document.write('<span class=greeting>Hello World!</span>');
        </script>
        </body>";
    var document = DocumentBuilder.Html(source, config);

    //Modified HTML will be output
    Console.WriteLine(document.DocumentElement.OuterHtml);
}

This will print the (serialized) DOM, which already contains the modifications (such as a new title and the inserted span element).
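
As the comments on this answer point out, the configuration API has changed since the sample was written, and BrowsingContext with OpenAsync is the suggested entry point. A rough sketch of the same sample in that style follows. It assumes the AngleSharp and AngleSharp.Js NuGet packages and their WithJs() configuration extension; the ScriptingSample class and EvaluateAsync method are illustrative names only. Check it against the samples repository for the version you actually install.

```csharp
using System.Threading.Tasks;
using AngleSharp;
using AngleSharp.Js; // assumption: the AngleSharp.Js package is referenced

// Illustrative wrapper, not part of AngleSharp.
public static class ScriptingSample
{
    // Parses the HTML string, lets inline scripts run, and returns the serialized DOM.
    public static async Task<string> EvaluateAsync(string source)
    {
        // Start from the default configuration and add JavaScript support
        var config = Configuration.Default.WithJs();
        var context = BrowsingContext.New(config);

        // Feed the HTML string in as a virtual response instead of requesting a URL
        var document = await context.OpenAsync(res => res.Content(source));

        // Inline scripts are executed while the document is being parsed
        return document.DocumentElement.OuterHtml;
    }
}
```

Calling EvaluateAsync with the sample source above should return markup that already reflects the changes made by the script, which could then be handed to whatever parser the rest of the project uses.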

6 Comments

thanks for this florian, i'll take a look over that as it looks like a nice simple implementation - plus i imagine it can mix 'n match with htmlagilitypack, post documentbuilder
Thanks for your great work @Florian. But it seems your whole documentation on AngleSharp is outdated. Can you please update the documentation? For example, I couldn't figure out how to register the JavaScript engine. config.Register() takes a requester as its argument. There is no code example...
It is true, a lot of things changed just recently. I'll update the documentation as soon as I can (hopefully still this week). In the meantime I can recommend the samples at github.com/AngleSharp/AngleSharp.Samples -- they work with the current version and also showcase JavaScript integration.
Hi @GavinWilliams use BrowsingContext. It has methods like OpenAsync. Ideally, you can just stream a source. If you have an HTML string then use the response builder pattern like context.OpenAsync(res => res.Content(myHtmlString))
@Florian Rappl Thank you for your help on here and also on GitHub recently. You've been a great help with usage of your library!
1

There's PhantomJS, which can be scripted via JavaScript and run as an external process from C#.
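
A minimal sketch of the "external process" part, using only System.Diagnostics. It assumes a phantomjs executable on the PATH and a PhantomJS script of your own (for example one that loads the HTML file, lets the page scripts run and writes the resulting markup to standard output). The PhantomRunner class and both file paths are hypothetical.

```csharp
using System.Diagnostics;

// Hypothetical helper for launching PhantomJS from C#.
public static class PhantomRunner
{
    // Describes how to launch phantomjs with a script and an input file.
    public static ProcessStartInfo BuildStartInfo(string scriptPath, string htmlPath)
    {
        return new ProcessStartInfo
        {
            FileName = "phantomjs", // assumption: resolvable via PATH
            Arguments = "\"" + scriptPath + "\" \"" + htmlPath + "\"",
            RedirectStandardOutput = true, // capture what the script prints
            UseShellExecute = false,       // required for stream redirection
            CreateNoWindow = true          // no console window on a server
        };
    }

    // Runs the script and returns everything it wrote to standard output.
    public static string Run(string scriptPath, string htmlPath)
    {
        using (var process = Process.Start(BuildStartInfo(scriptPath, htmlPath)))
        {
            // Read before waiting so a full output buffer cannot block the child
            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            return output;
        }
    }
}
```

Spawning a process per request has a noticeable startup cost, so for a hosted service it may be worth measuring this against an in-process option before committing to it.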

4 Comments

+1 looks promising. do you have any 1st hand experience with this?? Altho i love link juice, nothing whacks me more than 1st hand use and anecdotal evidence of its efficacy... as said tho - looks like a contender
@jimtollan: no firsthand experience here, sorry
no worries, i will of course try this out. i'll then drop an update. however, a range of other answers/comments will hopefully narrow my quest...
hey, i updated the original post to point to a setup page for .net and phantomjs -works really well!!
1

It sounds like you'll need a headless browser to execute the html/javascript. Take a look here.

I would prefer AngleSharp over HtmlAgilityPack though.

1 Comment

alas, htmlagilitypack is used throughout the project, so no option to change
