I have the HTML content below, and I want to extract only the IDs (31673, 31672, 3166, 316) using a regular expression.

<a href="/CaseMgrTesting/Pat/Summary/31673">31673</a>
<a href="/CaseMgrTesting/Pat/Summary/31672">31672</a>
<a href="/CaseMgrTesting/Pat/Summary/3166">3166</a>
<a href="/CaseMgrTesting/Pat/Summary/316">316</a>

I wrote the regular expression below, but unfortunately it only returns 31673 and 31672. I would also like to remove the hard-coded parts, such as href="/CaseMgrTesting/Pat/Summary/ and \d\d\d\d\d. If anybody can give me the correct regular expression, it would be greatly appreciated.

(?<=<a\shref="/CaseMgrTesting/Pat/Summary/\d\d\d\d\d">).*(?=</a>)
  • 1
    Simple: you don't. You would use an HTML parser. Commented Dec 19, 2012 at 18:54
  • 4
    Are you trying to use regex to parse html? If so, you might want to read this: stackoverflow.com/questions/1732348/… Commented Dec 19, 2012 at 18:55
  • 4
    Every time you use regexes for HTML parsing, another web developer feels the sudden urge to weep silently in the corner for seven years straight. Commented Dec 19, 2012 at 18:56
  • 1
    Also, isn't it trying to match exactly five digits? I'm still too rusty on regex to say much about it, but that might explain why you are only getting the five-digit ones. Commented Dec 19, 2012 at 19:05
  • Like @CBredlow said: (?<=<a\shref="/CaseMgrTesting/Pat/Summary/\d+">).*(?=</a>) Commented Dec 19, 2012 at 20:28

3 Answers

1

Your one-stop answer is Html Agility Pack. This nifty must-have allows you to approach HTML by node. Learn it. Live it. Love it.
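
To illustrate the node-based approach, here is a minimal sketch. Note that it does not use Html Agility Pack (a .NET library); it swaps in Python's standard-library html.parser to show the same idea of reading the text of each <a> element instead of pattern-matching the markup. The names AnchorTextCollector, extract_ids and SAMPLE_HTML are made up for this example.

```python
from html.parser import HTMLParser

# Sample markup copied from the question.
SAMPLE_HTML = """<a href="/CaseMgrTesting/Pat/Summary/31673">31673</a>
<a href="/CaseMgrTesting/Pat/Summary/31672">31672</a>
<a href="/CaseMgrTesting/Pat/Summary/3166">3166</a>
<a href="/CaseMgrTesting/Pat/Summary/316">316</a>"""


class AnchorTextCollector(HTMLParser):
    """Hypothetical helper: collects the text inside every <a> element."""

    def __init__(self):
        super().__init__()
        self._in_anchor = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self.texts.append("")  # start a new entry for this link

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        # Text between the links (such as newlines) is ignored.
        if self._in_anchor:
            self.texts[-1] += data


def extract_ids(html):
    """Return the stripped text of each <a> element, in document order."""
    parser = AnchorTextCollector()
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.texts]


print(extract_ids(SAMPLE_HTML))
```

Nothing here depends on the href prefix or on how many digits the ID has, which is the hard-coding the question wanted to avoid.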

1 Comment

Thanks Wimbo! I believe Html Agility Pack is a very good way to extract data from HTML, and I will learn it. Actually, my purpose in asking the question above was to learn regular expressions, which give me a headache but are very powerful. I hated regular expressions for many years, but recently I found some code by my genius ex-manager that uses them: three lines of validation code covering very complex logic.
1
<a .*?>(.*)</a>

Use this regex for this question. It's a simple one; try it. The ID ends up in the first capture group rather than in the whole match.
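
As a rough sketch of how the capture group is read back, here is the same pattern run in Python (an assumption on my part; the question does not say which regex engine is in use). SAMPLE_HTML, ANCHOR_TEXT and ids_from_capture_group are names made up for this example.

```python
import re

# Sample markup copied from the question, one link per line.
SAMPLE_HTML = """<a href="/CaseMgrTesting/Pat/Summary/31673">31673</a>
<a href="/CaseMgrTesting/Pat/Summary/31672">31672</a>
<a href="/CaseMgrTesting/Pat/Summary/3166">3166</a>
<a href="/CaseMgrTesting/Pat/Summary/316">316</a>"""

# The regex from this answer, unchanged.
ANCHOR_TEXT = re.compile(r'<a .*?>(.*)</a>')


def ids_from_capture_group(html):
    """Return group 1 (the link text) of every match, not the whole tag."""
    return [match.group(1) for match in ANCHOR_TEXT.finditer(html)]


print(ids_from_capture_group(SAMPLE_HTML))
```

Because the ID is taken from the group, no lookbehind is needed, so neither the href prefix nor the digit count has to be spelled out. This works on the sample because each link is on its own line and . does not match a newline by default.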

1 Comment

Good answer. I edited your answer since you didn't indent the code, which resulted in some invisible code.
0

Use this (an updated version of the regex from the other answer):

<a .*?>(.*?)</a>

The important piece is the ? after the *. It makes the .* (match everything) non-greedy. Otherwise, when several links sit on the same line, the greedy .* runs on to the last </a> and you get one match at most.
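
The difference is easiest to see when two links share a line. The sketch below uses Python's re module (an assumption; the thread does not name the engine), and ONE_LINE, greedy, lazy and digits_only are names made up for this example. The last pattern is an extra, stricter variant that is not part of the answer above: it accepts only digits as the link text.

```python
import re

# Two of the question's links placed on a single line.
ONE_LINE = ('<a href="/CaseMgrTesting/Pat/Summary/31673">31673</a>'
            '<a href="/CaseMgrTesting/Pat/Summary/316">316</a>')

# Greedy group: (.*) swallows everything up to the LAST </a>.
greedy = re.findall(r'<a .*?>(.*)</a>', ONE_LINE)

# Non-greedy group: (.*?) stops at the FIRST </a> it can reach.
lazy = re.findall(r'<a .*?>(.*?)</a>', ONE_LINE)

# Stricter variant (not from the answer): any attributes, digits-only text.
digits_only = re.findall(r'<a\s[^>]*>(\d+)</a>', ONE_LINE)

print(len(greedy))   # one oversized match spanning both links
print(lazy)
print(digits_only)
```

The greedy version returns a single string that contains the first ID, the markup between the links, and the second ID, while the other two return the IDs separately.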
