C# HTML viewing using Html Agility Pack

Question

I wrote a C# console application that is supposed to display the HTML source of a page.

Instead, the console app prints HtmlAgilityPack.HtmlDocument.

Can anyone explain to me why that is?

class Program
{
    public HtmlDocument read()
    {
        HtmlWeb htmlWeb = new HtmlWeb();
        try
        {
            HtmlAgilityPack.HtmlDocument document = htmlWeb.Load("http://www.yahoo.com");
            return document;
        }
        catch (Exception e)
        {
            Console.WriteLine("Error : " + e.ToString());
            return null;     
        }
    }     

    static void Main(string[] args)
    {
        Program dis = new Program();
        string text = Convert.ToString(dis.read());
        Console.WriteLine(text);
        Console.ReadLine();        
    }
}

I don't know the HtmlDocument object model, but clearly its ToString() is not implemented to return the HTML. You will need to inspect its properties and use the one that contains the source.
– Nate, Commented Jul 3, 2013 at 15:30

Amine Hajyoussef · Accepted Answer · 2013-07-03 15:41:05Z

3

Replace

 return document;

with:

 return document.DocumentNode.InnerHtml;

and change the method's return type from HtmlDocument to string.

Or, if you want to extract only the text (without HTML tags):

 return document.DocumentNode.InnerText;

The whole code would be:

using System;
using HtmlAgilityPack;

class Program
{
    // Returns the page's HTML as a string, or null if the download fails.
    public string read()
    {
        HtmlWeb htmlWeb = new HtmlWeb();
        try
        {
            HtmlAgilityPack.HtmlDocument document = htmlWeb.Load("http://www.yahoo.com");
            // InnerHtml of the root node is the markup of the whole document.
            return document.DocumentNode.InnerHtml;
        }
        catch (Exception e)
        {
            Console.WriteLine("Error : " + e.ToString());
            return null;
        }
    }

    static void Main(string[] args)
    {
        Program dis = new Program();
        string text = dis.read();
        Console.WriteLine(text);
        Console.ReadLine(); // keep the console window open
    }
}
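To see the difference between InnerHtml and InnerText without hitting the network, here is a minimal sketch that parses a string with LoadHtml instead of downloading a page. The sample markup and the class name are made up for illustration, and it assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

static class InnerHtmlVsInnerText
{
    static void Main()
    {
        var document = new HtmlDocument();
        // LoadHtml parses an in-memory string, so no web request is made.
        // The markup below is sample input, not taken from the question.
        document.LoadHtml("<html><body><h1>Title</h1><div>Hello <b>world</b></div></body></html>");

        // Markup of everything under the document's root node.
        Console.WriteLine(document.DocumentNode.InnerHtml);

        // The same content with the tags stripped out.
        Console.WriteLine(document.DocumentNode.InnerText);
    }
}
```

As far as I know, InnerText leaves character entities such as &amp; as they are; HtmlEntity.DeEntitize can decode them if you need plain text.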

edited Jul 3, 2013 at 15:41

answered Jul 3, 2013 at 15:33

Amine Hajyoussef


Tim · Accepted Answer · 2013-07-03 15:34:05Z

2

The default implementation of .ToString(), inherited from System.Object, just returns the fully qualified name of the type, which is what you're seeing. So HtmlDocument from the HtmlAgilityPack evidently doesn't override it.
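The behaviour is easy to reproduce without HtmlAgilityPack at all. In this small sketch, PlainPage and PrintablePage are hypothetical types invented for illustration; only the second one overrides ToString().

```csharp
using System;

namespace Demo
{
    // Hypothetical type that does NOT override ToString().
    class PlainPage
    {
        public string Html = "<p>hi</p>";
    }

    // Hypothetical type that overrides ToString() to return its content.
    class PrintablePage
    {
        public string Html = "<p>hi</p>";

        public override string ToString() => Html;
    }

    static class ToStringDemo
    {
        static void Main()
        {
            // No override: Object.ToString() returns the fully qualified type name.
            Console.WriteLine(new PlainPage());                   // Demo.PlainPage

            // Convert.ToString(object) ends up calling ToString(), so same result.
            Console.WriteLine(Convert.ToString(new PlainPage())); // Demo.PlainPage

            // With an override, the content is printed instead.
            Console.WriteLine(new PrintablePage());               // <p>hi</p>
        }
    }
}
```

This is why Convert.ToString(dis.read()) in the question prints HtmlAgilityPack.HtmlDocument rather than the page source.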

From glancing at the code over on CodePlex, it looks like you need to use the Save function to save the output to an XmlWriter and then use that to get the string. I don't see another way to get at the whole contents of the page directly from that object (though admittedly I just scanned it).
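For what it's worth, the Save route can be sketched with a StringWriter rather than an XmlWriter, since HtmlDocument also has a Save overload that takes a TextWriter. The SaveToString class and ToHtml helper are names made up for this example, the sample markup is invented, and the HtmlAgilityPack NuGet package is assumed to be referenced.

```csharp
using System;
using System.IO;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

static class SaveToString
{
    // Hypothetical helper: writes the parsed document into a string
    // by passing a StringWriter to HtmlDocument.Save(TextWriter).
    public static string ToHtml(HtmlDocument document)
    {
        using (var writer = new StringWriter())
        {
            document.Save(writer);
            return writer.ToString();
        }
    }

    static void Main()
    {
        var document = new HtmlDocument();
        // Sample markup parsed from a string, so no web request is needed.
        document.LoadHtml("<html><body><div>Hello</div></body></html>");
        Console.WriteLine(ToHtml(document));
    }
}
```

This is more ceremony than reading a property off document.DocumentNode, so it is mainly useful when you already have a writer or stream to send the output to.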

Edit: Amine Hajyoussef pointed you in the right direction with document.DocumentNode.InnerHtml, though note that you'll need to change the return type of the function to string as well.

answered Jul 3, 2013 at 15:34

Tim
