regex pattern matching for http

Question

I want to extract URLs from the href attributes of a web page. For that I'm using the regex pattern "(?(http:[/][/]|www.)([a-z]|[A-Z]|[0-9]|[/.]|[~])*)"

To extract the href from the HTML I used this pattern: @"href=\""(?[^\""#]?(?=[\""#]))(?(?#{2}[^#]?#{2})*)(?#[^""]+)?"""

The problem is that URLs are not extracted completely. For a URL like "www.seo-sem.com", the result contains only "www.seo"; everything after the hyphen is truncated. Could you suggest a better regex pattern to extract URLs from href? Thanks in advance.

Don't use regex to parse HTML. Find a simple library like HTMLAgilityPack and use that.
– Stephan, Commented May 10, 2010 at 17:55
Even for basic URI matching the regular expression needed is Ugly (yes, capital U).
– Joey, Commented May 10, 2010 at 17:57
@rebus, well, it's not so much HTML parsing, actually. It doesn't try to do anything with the actual structure of the document. For simply grabbing anything that looks like href='url' regex may just be appropriate enough.
– Joey, Commented May 10, 2010 at 17:58
(http://|https://)?([\w.-]+)?([\w-]+\.[\w-]+) with \2 and \3 backrefs referencing subdomains and domain respectively would help probably, but by no means would it catch all possible domain names out there.
– Davor Lucic, Commented May 10, 2010 at 18:25

Community · Accepted Answer · 2017-05-23 10:32:56Z

4

Use the HTML Agility Pack to parse your HTML. You can query the result using XPath, because it parses the HTML into an XmlDocument-like object.

See this for reasons not to parse HTML with regular expressions.
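
To illustrate the parser-based approach in general terms, here is a minimal sketch in Python. It uses the standard-library html.parser module, not HTML Agility Pack (which is a .NET library), so it only shows the idea of reading href values from parsed tags instead of matching raw text. The names HrefCollector and extract_hrefs and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href value of every <a> start tag the parser sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def extract_hrefs(html):
    """Return all non-empty href values found on <a> tags in html."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Hypothetical sample markup; the hyphenated host is taken from the question.
sample = '<p><a href="http://www.seo-sem.com/page">SEO</a> <a href=\'www.example.com\'>x</a></p>'
print(extract_hrefs(sample))
```

Because the parser hands over the attribute value as a whole, hyphens and other URL characters need no special handling here.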

edited May 23, 2017 at 10:32

CommunityBot

answered May 10, 2010 at 17:57

Oded

1 Comment

jaskirat Over a year ago

I resolved the hyphen issue by editing the regex. Thanks anyway, you all rock, keep it up.
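
The edited regex is not shown in the thread. What can be said from the question is that its character class ([a-z], [A-Z], [0-9], [/.], [~]) does not include a hyphen, so matching stops at the first "-". The following Python sketch reproduces that behaviour with a simplified, hypothetical version of the question's pattern and shows one possible fix; the names ORIGINAL, FIXED, and first_match are illustrative only, and this is not necessarily the change the asker made.

```python
import re

# Simplified form of the question's pattern: the character class has no hyphen.
ORIGINAL = re.compile(r"(?:http://|www\.)[a-zA-Z0-9/.~]*")

# Same pattern with a literal hyphen added at the end of the class,
# where it cannot be mistaken for a range operator.
FIXED = re.compile(r"(?:http://|www\.)[a-zA-Z0-9/.~-]*")


def first_match(pattern, text):
    """Return the first match of pattern in text, or None if there is none."""
    m = pattern.search(text)
    return m.group(0) if m else None


sample = 'href="www.seo-sem.com"'
print(first_match(ORIGINAL, sample))  # www.seo
print(first_match(FIXED, sample))     # www.seo-sem.com
```

Even with the hyphen added, this class still rejects many valid URL characters (for example "?", "=", "&", and "%"), which is part of why the comments and the answer recommend a parser.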
