I have a control that returns a DataTable in which each row contains HTML code as a string. I'm trying to use Regex to extract only the text enclosed within the HTML tags, for example:

{[h]</span></p><p class="MsoNormal" style="text-align: left;"><span style="color: #ff6600; font-weight: bold;"><span style="font-family: arial, helvetica, sans-serif;">What do they mean today?</span></span></p><p style="text-align: left; margin: 0px;"><span style="font-family: arial, helvetica, sans-serif;">[/h]}

I want to extract only the sentence "What do they mean today?", or any sentence that consists of more than one word.

I tried (/w*/s?)* but it seems to match only at the beginning of the string and not throughout the whole string. I'm not very good with regular expressions. Any help would be much appreciated.

  • If you are parsing HTML from a database and there is a lot of it, you might want to use Html Agility Pack. Commented Feb 10, 2015 at 7:23
  • @AvinashRaj The regular expression only matches "p class". I was thinking of maybe trying to capture ">What do they mean?<" or ">(any sentence)<" as that would avoid matching words such as "p class" and "span style". Commented Feb 10, 2015 at 7:44
  • @CoderofCode The string isn't being read from a database. Unfortunately, I do not have the option of changing what is returned from the control; otherwise I would just return the string I need. I'm dealing with someone else's undocumented code. Commented Feb 10, 2015 at 7:46
  • That's not the point. The point is that if you are dealing with lots of HTML, it is better to use what already exists. Why reinvent the wheel? Commented Feb 10, 2015 at 7:47
  • @CoderofCode Oh ok, I get what you mean. I will have a look into the Html Agility Pack. Thanks Commented Feb 10, 2015 at 8:04

1 Answer

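You could use the regex below to grab the string you want. It matches any run of characters other than < and > that is immediately preceded by > and immediately followed by <.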

@"(?<=>)[^<>]+(?=<)"

But regex is not the recommended way to parse HTML.
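
For illustration, here is a minimal sketch of how that pattern might be applied in C#. The class name `TagTextExtractor`, the helper `ExtractSentences`, and the shortened sample string are assumptions made for this example and are not from the original post. The "more than one word" filter is one possible reading of the requirement in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TagTextExtractor
{
    // Pattern from the answer: one or more characters other than '<' and '>',
    // preceded by '>' and followed by '<'.
    private static readonly Regex BetweenTags = new Regex(@"(?<=>)[^<>]+(?=<)");

    // Hypothetical helper: returns the text runs found between tags,
    // keeping only those that contain more than one word.
    public static List<string> ExtractSentences(string html)
    {
        return BetweenTags.Matches(html)
            .Cast<Match>()
            .Select(m => m.Value.Trim())
            // Passing a null separator array splits on whitespace.
            .Where(s => s.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length > 1)
            .ToList();
    }

    public static void Main()
    {
        // Shortened version of the sample row from the question (attributes trimmed).
        string html = "{[h]</span></p><p class=\"MsoNormal\"><span style=\"font-weight: bold;\">"
                    + "What do they mean today?</span></p><p><span>[/h]}";

        foreach (string sentence in ExtractSentences(html))
        {
            Console.WriteLine(sentence); // prints "What do they mean today?"
        }
    }
}
```

The leading `{[h]` and trailing `[/h]}` markers are not returned, because the first is not preceded by `>` and the second is not followed by `<`.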

1 Comment

Thanks, it works. I figured that a better approach would be to use Html Agility Pack instead of matching with Regex, as @CoderofCode said. Thanks again for all your help!
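
For comparison, a rough sketch of the Html Agility Pack approach mentioned in the comments might look like the following. It assumes the HtmlAgilityPack NuGet package is referenced. The class name `HapTextExtractor` and the helper `ExtractSentences` are hypothetical, and the word-count filter is again just one interpretation of the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

public static class HapTextExtractor
{
    // Hypothetical helper: collects the text nodes of an HTML fragment and
    // keeps only those that contain more than one word.
    public static List<string> ExtractSentences(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant of fragments and unbalanced tags

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var textNodes = doc.DocumentNode.SelectNodes("//text()");
        if (textNodes == null)
        {
            return new List<string>();
        }

        return textNodes
            // Decode entities such as &amp; before trimming.
            .Select(node => HtmlEntity.DeEntitize(node.InnerText).Trim())
            .Where(s => s.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length > 1)
            .ToList();
    }
}
```

Unlike the regex, this also sees text that is not surrounded by tags on both sides, so a filter such as the word-count check is still needed to drop wrapper markers like `{[h]`.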
