I want to extract URLs from the href attributes of a web page. For that I am using the regex pattern "(?(http:[/][/]|www.)([a-z]|[A-Z]|[0-9]|[/.]|[~])*)".
To extract the href from the HTML, I used this pattern: @"href=\""(?[^\""#]?(?=[\""#]))(?(?#{2}[^#]?#{2})*)(?#[^""]+)?"""
The problem is that it does not extract the complete URL from the href. For a URL like "www.seo-sem.com", the result I get is only "www.seo"; everything after the hyphen is truncated. Could you please suggest a better regex pattern to extract URLs from href? Thank you.
A simple href='url' regex may be appropriate enough. The \2 and \3 backreferences, referencing the subdomain and domain respectively, would probably help, but by no means would it catch all possible domain names out there.
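The truncation at the hyphen most likely comes from the character class in the first pattern: it lists letters, digits, "/", "." and "~", but no "-", so matching stops at "www.seo". Below is a minimal sketch of the simple href approach with a hyphen added to the class. It is written in Python only for illustration (the @"..." string in the question suggests C#), and it sticks to syntax that .NET regexes also accept. The sample markup and the names HREF_RE, URL_RE, OLD_URL_RE and extract_urls are made up for this example, and the URL pattern is deliberately loose, so it will not cover every valid URL.

```python
import re

# Hypothetical sample markup, invented for this example.
html = '''
<a href="http://www.seo-sem.com/index.html">SEO</a>
<a href='www.example.org/~user/page.htm#top'>Home</a>
<a href="#section">Skip</a>
'''

# Match href="..." or href='...'. Group 1 is the opening quote and
# the backreference \1 forces the closing quote to be the same one.
# Group 2 is the attribute value.
HREF_RE = re.compile(r'''href\s*=\s*(["'])(.*?)\1''', re.IGNORECASE)

# The question's character class, without a hyphen (reproduces the bug).
OLD_URL_RE = re.compile(r'(?:http://|www\.)[A-Za-z0-9/.~]*')

# Same idea with "-" added at the end of the class, where it is literal.
URL_RE = re.compile(r'(?:https?://|www\.)[A-Za-z0-9/.~-]*', re.IGNORECASE)


def extract_urls(text):
    """Return the URL-like prefix of every href value found in text."""
    urls = []
    for match in HREF_RE.finditer(text):
        value = match.group(2)
        found = URL_RE.match(value)  # anchored at the start of the value
        if found:
            urls.append(found.group(0))
    return urls


print(OLD_URL_RE.match('www.seo-sem.com').group(0))  # www.seo
print(URL_RE.match('www.seo-sem.com').group(0))      # www.seo-sem.com
print(extract_urls(html))
```

Splitting the job in two (first pull out the href value, then check whether it looks like a URL) keeps each pattern short. If the pages are not well formed, an HTML parser is usually a more robust way to get at the href values than a regex.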