How to read line of HTML as string in c#

Question

I am trying to get the page title from the page source of different pages. But let's say some pages have a title like this:

&quot;This is an example,&quot; ABC.

It has some HTML entities in it, like "&quot;". If I read this title into a C# string I get the whole thing verbatim, and when I display it, it shows up exactly as above, which is wrong. Is there any way to decode or otherwise account for HTML entities in C#?

I am also using HtmlAgilityPack, so anything in that will do too.

Francisco · Accepted Answer · 2013-10-22 20:20:21Z

3

You can use WebUtility.HtmlDecode (documented on MSDN) to decode HTML entities:

WebUtility.HtmlDecode("&quot;This is an example,&quot; ABC.");

Just add this using directive:

using System.Net;

The result, written as a C# string literal, will be: "\"This is an example,\" ABC."

You can also use HtmlEntity.DeEntitize from the HTML Agility Pack:

HtmlEntity.DeEntitize(string text)
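
To make the first option concrete, here is a minimal sketch that relies only on System.Net.WebUtility from the base class library, so nothing extra has to be downloaded. The class name TitleDecoder and the second sample string are hypothetical and only serve as illustration.

```csharp
using System;
using System.Net;

public static class TitleDecoder
{
    // Turns HTML character references (&quot;, &amp;, &#39;, ...) into plain characters.
    public static string Decode(string rawTitle)
    {
        return WebUtility.HtmlDecode(rawTitle);
    }

    public static void Main()
    {
        // prints: "This is an example," ABC.
        Console.WriteLine(Decode("&quot;This is an example,&quot; ABC."));

        // prints: Tom & Jerry 'classic'
        Console.WriteLine(Decode("Tom &amp; Jerry &#39;classic&#39;"));
    }
}
```

Because HtmlDecode handles named and numeric references alike, the same call should cover the different characters that turn up across many pages, without per-character replacement code.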

edited Oct 22, 2013 at 20:20

Francisco


answered Sep 29, 2012 at 16:50

cuongle


1 Comment

NoviceMe Over a year ago

So I will have to download WebUtility? Is there anything in HTML Agility Pack that does this kind of thing?

Zelter Ady · Accepted Answer · 2012-09-29 16:43:00Z

0

You don't know what you will find in a page title. Sometimes it is a whole mess. My suggestion is to take the string as it is and process it before showing or saving it.

In this case, the solution is simple: replace &quot; with the corresponding character (").

Each time you read an HTML document to extract some tags, watch out for tags that are never closed. If the author forgets to close the title tag... you'll get the whole page in that string!
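
The warning about unclosed tags can be sketched as follows. This is only an illustration under stated assumptions, not a replacement for a real parser such as HTML Agility Pack: the class TitleExtractor, the method ExtractTitle, and the 200-character cap are all hypothetical, and the plain string search is deliberately naive.

```csharp
using System;
using System.Net;

public static class TitleExtractor
{
    // Hypothetical upper bound on a sane title length; tune it for your data.
    private const int MaxTitleLength = 200;

    // Returns the decoded title text, or null when no <title> tag is found.
    public static string ExtractTitle(string html)
    {
        if (string.IsNullOrEmpty(html)) return null;

        // Naive search: any tag whose name starts with "title" will match.
        int open = html.IndexOf("<title", StringComparison.OrdinalIgnoreCase);
        if (open < 0) return null;

        int start = html.IndexOf('>', open);
        if (start < 0) return null;
        start++; // move past '>'

        int end = html.IndexOf("</title", start, StringComparison.OrdinalIgnoreCase);
        if (end < 0)
        {
            // Unclosed tag: stop at the next tag so the rest of the page is not swallowed.
            end = html.IndexOf('<', start);
            if (end < 0) end = html.Length;
        }

        string raw = html.Substring(start, end - start).Trim();
        if (raw.Length > MaxTitleLength) raw = raw.Substring(0, MaxTitleLength);

        // Decode entities such as &quot; only after the text has been isolated.
        return WebUtility.HtmlDecode(raw);
    }

    public static void Main()
    {
        // prints: "This is an example," ABC.
        Console.WriteLine(ExtractTitle("<title>&quot;This is an example,&quot; ABC.</title>"));

        // prints: Broken & open
        Console.WriteLine(ExtractTitle("<title>Broken &amp; open<body>text"));
    }
}
```

Decoding comes last on purpose: cutting the raw text first avoids splitting on a "<" that was produced by decoding an entity like &lt;.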

answered Sep 29, 2012 at 16:43

Zelter Ady


2 Comments

NoviceMe Over a year ago

I mean, I can do that for this particular string. But if I run 1000 queries I will get a lot of different characters; how will I replace them all? There should be an easy way to convert them, right?

Zelter Ady Over a year ago

The possibilities are limited. Read about special HTML characters at utexas.edu/learn/html/spchar.html and implement some replacement method(s).
