I have this C# code to pull links from a web page and want to make it smarter, so that in the future I can make small additions to exclude links based on two criteria.

First, I want to exclude certain file extensions found on pages, such as links to PDF or PPT files.

Next, I want to exclude links based on the first part of the URL, such as ftp, images.google..., maps.google..., and mailto.

This is my current code, which needs help:

MatchCollection m1 = Regex.Matches(file, @"(?i)(<A[^>]*href\s*=\s*['""](?!mailto|[^'""]*\.(?:pdf|doc|ppt))[^>]*>.*?</A>)", RegexOptions.Singleline);

1 Answer

Have you considered the Html Agility Pack?
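
Once an HTML parser hands you the raw href values, the two exclusion rules from the question can live in ordinary code instead of inside the regex. The following is only a rough sketch under stated assumptions: the class name `LinkFilter`, the method `ShouldKeep`, and the contents of the three exclusion lists are hypothetical and not from the question. It uses only `System.Uri` and `System.IO.Path` from the base class library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LinkFilter
{
    // Hypothetical exclusion lists; add entries here as new cases come up.
    static readonly HashSet<string> ExcludedExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".pdf", ".doc", ".ppt" };

    static readonly HashSet<string> ExcludedSchemes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "ftp", "mailto" };

    static readonly string[] ExcludedHostPrefixes = { "images.google.", "maps.google." };

    // Returns true when the link should be kept, false when it matches an exclusion rule.
    public static bool ShouldKeep(string href, Uri baseUri)
    {
        if (string.IsNullOrWhiteSpace(href)) return false;

        // Resolve relative links against the page URL; drop anything that cannot be parsed.
        if (!Uri.TryCreate(baseUri, href.Trim(), out Uri uri)) return false;

        // Criterion 2a: scheme (ftp, mailto, ...).
        if (ExcludedSchemes.Contains(uri.Scheme)) return false;

        // Criterion 2b: start of the host name (images.google..., maps.google...).
        if (ExcludedHostPrefixes.Any(p => uri.Host.StartsWith(p, StringComparison.OrdinalIgnoreCase)))
            return false;

        // Criterion 1: file extension. AbsolutePath omits the query string,
        // so a link like "report.pdf?download=1" is still caught.
        string extension = Path.GetExtension(uri.AbsolutePath);
        return !ExcludedExtensions.Contains(extension);
    }
}
```

With Html Agility Pack, the href values would typically come from `doc.DocumentNode.SelectNodes("//a[@href]")` (which returns null when nothing matches) and `node.GetAttributeValue("href", "")`; each value can then be passed to `ShouldKeep` together with the URL of the page it came from. Adding a new exclusion then means adding one string to a list rather than editing the pattern.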
