How can I find href attributes that include a specific word?
I tried
"href=([?;.:=%-\/\\\'\"]+[a-zA-Z]*[blablabla][?;.:=%-\/\\\'\"]+[a-zA-Z]*$)"
However, it doesn't match anything.
I strongly advise against using a regex here. An HTML parser makes the task much easier and more reliable.
Here is an example of how it can be done with HtmlAgilityPack. Install it via Solution > Manage NuGet Packages for Solution... and use
// Requires: using System; using System.Collections.Generic;
public List<string> HtmlAgilityPackGetHrefIfValueContains(string html, string href_text)
{
    var hrefs = new List<string>();
    HtmlAgilityPack.HtmlDocument hap;
    Uri uriResult;
    // Treat the input as a URL only if it parses as an absolute http or https URI.
    if (Uri.TryCreate(html, UriKind.Absolute, out uriResult)
        && (uriResult.Scheme == Uri.UriSchemeHttp || uriResult.Scheme == Uri.UriSchemeHttps))
    {   // html is a URL: download and parse the page
        var web = new HtmlAgilityPack.HtmlWeb();
        hap = web.Load(uriResult.AbsoluteUri);
    }
    else
    {   // html is a string of markup
        hap = new HtmlAgilityPack.HtmlDocument();
        hap.LoadHtml(html);
    }
    // Select every element that has an href attribute.
    var nodes = hap.DocumentNode.SelectNodes("//*[@href]");
    if (nodes != null) // SelectNodes returns null when nothing matches
    {
        foreach (var node in nodes)
        {
            foreach (var attribute in node.Attributes)
            {
                if (attribute.Name == "href" && attribute.Value.Contains(href_text))
                {
                    hrefs.Add(attribute.Value);
                }
            }
        }
    }
    return hrefs;
}
Now you can pass either an HTML string or the URL of a web page and get back the href values of all tags whose href contains href_text. If you only want the hrefs of a elements, use the //a[@href] XPath instead.
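As a variation, the filtering can probably be pushed into the XPath itself with the standard XPath 1.0 contains() function, so no attribute loop is needed. This is only a minimal sketch: the class name HrefFilterDemo, the method name GetAnchorHrefsContaining and the sample markup are made up for illustration, and it assumes the search word contains no single quote, because the word is spliced directly into the XPath string.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class HrefFilterDemo
{
    // Hypothetical helper: returns href values of <a> elements whose href contains 'word'.
    // Assumption: 'word' has no single quote, since it is embedded in the XPath literal.
    public static List<string> GetAnchorHrefsContaining(string html, string word)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var hrefs = new List<string>();
        // contains() is case-sensitive and matches anywhere inside the attribute value.
        var nodes = doc.DocumentNode.SelectNodes("//a[contains(@href, '" + word + "')]");
        if (nodes == null)
        {
            return hrefs; // SelectNodes returns null when nothing matches
        }

        foreach (var node in nodes)
        {
            hrefs.Add(node.GetAttributeValue("href", string.Empty));
        }
        return hrefs;
    }

    public static void Main()
    {
        // Made-up sample markup: one matching anchor, one non-matching anchor,
        // and a <link> element that the a-only XPath does not select.
        const string html =
            "<a href='/docs/blablabla/page'>one</a>" +
            "<a href='/other'>two</a>" +
            "<link href='blablabla.css'>";

        foreach (var href in GetAnchorHrefsContaining(html, "blablabla"))
        {
            Console.WriteLine(href);
        }
    }
}
```

Doing the match in XPath keeps the C# side short, but the loop in the original method is the safer choice if href_text can contain quotes or comes from user input.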