
How can I find href attributes that include a specific word?

I tried

"href=([?;.:=%-\/\\\'\"]+[a-zA-Z]*[blablabla][?;.:=%-\/\\\'\"]+[a-zA-Z]*$)"

However, it doesn't match anything.

  • man.. just do it yourself.. regex101.com Commented Sep 3, 2015 at 8:54
  • What have you tried? Please post. And as for "includes a specific word inside the string": which string do you mean? The "href" attribute value? This is definitely a job for an HTML parser, not a regex. Commented Sep 3, 2015 at 8:55
  • Something like this: var t = Regex.Match(input, @"href=yourword"); Have you googled it? Commented Sep 3, 2015 at 8:56
  • I entered "href=([?;.:=%-\/\\\'\"]+[a-zA-Z]*[blablabla][?;.:=%-\/\\\'\"]+[a-zA-Z]*$)", but it doesn't match anything. Commented Sep 3, 2015 at 9:06

1 Answer


I strongly advise against using regex in this case. I am sure using an HTML parser greatly facilitates the task.

Here is an example of how it can be done with HtmlAgilityPack. Install it via Solution > Manage NuGet Packages for Solution... and use

using System;
using System.Collections.Generic;

public List<string> HtmlAgilityPackGetHrefIfValueContains(string html, string href_text)
{
    var hrefs = new List<string>();
    HtmlAgilityPack.HtmlDocument hap;
    Uri uriResult;
    // Treat the input as a URL only if it is an absolute http or https address
    if (Uri.TryCreate(html, UriKind.Absolute, out uriResult)
        && (uriResult.Scheme == Uri.UriSchemeHttp || uriResult.Scheme == Uri.UriSchemeHttps))
    { // html is a URL: download and parse the page
        var web = new HtmlAgilityPack.HtmlWeb();
        hap = web.Load(uriResult.AbsoluteUri);
    }
    else
    { // html is a string of markup
        hap = new HtmlAgilityPack.HtmlDocument();
        hap.LoadHtml(html);
    }
    // SelectNodes returns null (not an empty collection) when nothing matches
    var nodes = hap.DocumentNode.SelectNodes("//*[@href]");
    if (nodes != null)
    {
        foreach (var node in nodes)
        {
            foreach (var attribute in node.Attributes)
            {
                if (attribute.Name == "href" && attribute.Value.Contains(href_text))
                {
                    hrefs.Add(attribute.Value);
                }
            }
        }
    }
    return hrefs;
}

Now you can pass either an HTML string or the URL of a web page and get back every href value that contains href_text, taken from any tag that has an href attribute. If you only want the hrefs of a tags, use the //a[@href] XPath instead.
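
As a rough usage sketch, assuming the HtmlAgilityPack NuGet package is referenced: the program below runs a compact, string-only variant of the method with the //a[@href] XPath, so only anchor tags are considered. The names HrefDemo and GetHrefsContaining and the sample markup are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class HrefDemo
{
    // Hypothetical string-only variant of the method above (no URL loading).
    public static List<string> GetHrefsContaining(string html, string hrefText)
    {
        var hrefs = new List<string>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches the XPath.
        var nodes = doc.DocumentNode.SelectNodes("//a[@href]");
        if (nodes == null)
        {
            return hrefs;
        }

        foreach (var node in nodes)
        {
            // Falls back to an empty string if the attribute is missing.
            var value = node.GetAttributeValue("href", string.Empty);
            if (value.Contains(hrefText))
            {
                hrefs.Add(value);
            }
        }
        return hrefs;
    }

    public static void Main()
    {
        // Sample markup invented for this example.
        const string html =
            "<p><a href=\"/docs/blablabla/intro.html\">Intro</a> " +
            "<a href=\"/about.html\">About</a></p>";

        foreach (var href in GetHrefsContaining(html, "blablabla"))
        {
            Console.WriteLine(href);
        }
    }
}
```

With this input, only the first link's href contains the search word, so that is the only value expected in the result. Note that Contains is case-sensitive here, so a search for "BlaBlaBla" would not match.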
