
I am trying to grab the text of a tag that has some child tags. I only want the tag's own text, not the text of its children.

For example:

<p><span>Child Text </span><span class="price">Child Text</span><br />
I need this text</p>

This is what I have tried:

HtmlElement menuElement = browser.Document.GetElementsByTagName("p")[0];   // GetElementsByTagName returns a collection
string mytext = menuElement.InnerHtml;   // also tried InnerText, OuterHtml, OuterText

UPDATE: I think I have to use HtmlAgilityPack, so my question is now how to do this with the HtmlAgilityPack library. I'm new to it.

Thanks

  • Since you need to do this in C#, remove the javascript tag from the question. Commented Apr 28, 2012 at 19:34
  • @CharandeepSingh - You can make a suggested edit to the tags, you know? Commented Apr 28, 2012 at 19:35
  • Essentially you need the direct child node that is a text node. Not sure this is possible with HtmlElement. The HTML Agility Pack may be more flexible in this respect. Commented Apr 28, 2012 at 19:36
  • You should be able to just iterate over the elements contained in the menu element and take the contents of the text nodes, but I'm not booted into Windows at the moment so I can't check. Commented Apr 28, 2012 at 19:37
  • @Oded - I didn't know I have that kind of privilege. Thanks ;) Commented Apr 28, 2012 at 19:37
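
A minimal sketch of the approach described in the comments above (keep only the text nodes that are direct children of the element), using `System.Xml.Linq` instead of `HtmlElement`. This swaps in an XML parser, so it assumes the markup is well-formed XML. That holds for the snippet in the question because `<br />` is self-closed, but typical real-world HTML is not well-formed and `XElement.Parse` throws an `XmlException` on it, which is where an HTML parser such as HtmlAgilityPack is the better fit. The class name `DirectText` is hypothetical.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: returns only the text that sits directly inside the root element,
// ignoring text nested in child elements such as <span>.
// Assumes the input is well-formed XML (e.g. XHTML with self-closed <br />).
public static class DirectText
{
    public static string Extract(string markup)
    {
        XElement root = XElement.Parse(markup);

        // Nodes() yields direct children only (not descendants);
        // OfType<XText>() keeps the text nodes and drops the child elements.
        string text = string.Concat(root.Nodes().OfType<XText>().Select(t => t.Value));

        return text.Trim();
    }
}
```

Called on the question's snippet, `DirectText.Extract(...)` returns `"I need this text"`. If an element has several direct text nodes, they are concatenated without a separator.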

3 Answers

2

There are many approaches to this, from regular expressions to web-scraping libraries. I recommend HtmlAgilityPack, which lets you address exactly what you need with XPath. Add a reference to HtmlAgilityPack and import its namespace; the code below also uses LINQ, which requires .NET 3.5 or later.

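using System;
using System.Linq;
using System.Windows.Forms;
using HtmlAgilityPack;

// HtmlAgilityPack must be referenced for these usings to resolve.

        private void Form1_Load(object sender, EventArgs e)
        {
            var rawData = "<p><span>Child Text </span><span class=\"price\">Child Text</span><br />I need this text</p>";

            // Fully qualified to avoid a clash with System.Windows.Forms.HtmlDocument.
            var html = new HtmlAgilityPack.HtmlDocument();
            html.LoadHtml(rawData);

            // "//p/text()" selects only the text nodes that are direct children of <p>,
            // so the text inside the <span> children is skipped.
            var textNodes = html.DocumentNode.SelectNodes("//p/text()");

            // By default SelectNodes returns null (not an empty collection) when nothing matches.
            if (textNodes == null) return;

            textNodes.ToList().ForEach(x => MessageBox.Show(x.InnerText.Trim()));
        }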

0

It's much, much easier if you can put the "I need this text" inside a span with an id; then you just grab that element's `InnerHtml`. If you can't change the markup, you can grab menuElement's `InnerHtml` and string-match for the content after "<br />", but that's quite fragile.
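
A minimal sketch of the string-matching idea, assuming the input is the `InnerHtml` of the `<p>` element and that the wanted text is everything after its last `<br>` tag. The helper `BreakText.TextAfterLastBreak` is hypothetical. Because the browser may serialize the markup differently from the source (for example `<BR>` instead of `<br />`), the search is case-insensitive and skips to the tag's closing `>`. It still breaks if anything other than plain text follows the `<br>`, which is the fragility mentioned above.

```csharp
using System;

// Hypothetical helper illustrating the string-matching approach; not a real HTML parser.
public static class BreakText
{
    // Returns the trimmed text after the last <br ...> tag in an element's InnerHtml,
    // or null if no complete <br> tag is found.
    public static string TextAfterLastBreak(string innerHtml)
    {
        // Case-insensitive so "<br />", "<br/>" and "<BR>" all match.
        int br = innerHtml.LastIndexOf("<br", StringComparison.OrdinalIgnoreCase);
        if (br < 0) return null;

        // Skip to the end of the tag, whatever form it takes.
        int close = innerHtml.IndexOf('>', br);
        if (close < 0) return null;

        return innerHtml.Substring(close + 1).Trim();
    }
}
```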

1 Comment

Thanks robrich, but I can't make changes to the HTML code. Also, I have many tags that I want to grab in a loop, so matching the string is not an option for me.
0

You can get the text by splitting the DocumentText into parts.

string text = "<p><span>Child Text </span><span class=\"price\">Child Text</span><br />I need this text</p>";
// Split off everything before the target text, leaving "I need this text</p>".
text = text.Split(new[] { "<p><span>Child Text </span><span class=\"price\">Child Text</span><br />" }, StringSplitOptions.None)[1];
// There are many ways to remove the trailing </p>; here is one.
text = text.Split(new[] { "</p>" }, StringSplitOptions.None)[0];
// text now has the value "I need this text"

Hope this helps!
