I have no idea why this replacement is only applied to the last match found rather than to all of them, as I would expect. Any help appreciated.

Input string:

<a href="http://www.scirra.com" target="_blank" rel="nofollow">http://www.scirra.com</a><br /><br />
<a href="http://www.scirra.com" target="_blank" rel="nofollow">http://www.scirra.com</a><br /><hr>

Regex:

'SEO scirra links
Dim regEx
Set regEx = New RegExp

' BB code urls
With regEx
    .Pattern = "<a href=\""http://www.scirra.com([^\]]+)\"" target=\""_blank\"" rel=\""nofollow\"">"
    .IgnoreCase = True
    .Global = True
    .MultiLine = True
End With
strMessage = regEx.Replace(strMessage, "<a href=""http://www.scirra.com$1"" target=""_blank"" title=""Some value insert here"">")

set regEx = nothing

Output:

<a href="http://www.scirra.com" target="_blank" rel="nofollow">http://www.scirra.com</a><br /><br />
<a href="http://www.scirra.com" target="_blank" title="Some value insert here">http://www.scirra.com</a><br /><hr>

Can anyone shed light on why it's only adding the title to the last found instance? (I've tested with more, always just applies to last one)

1 Answer

It is because of this in your regex:

...a.com-->([^\]]+)<--

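You try to match everything which is not a ], once or more. Since there is no ] at all in your input, the character class greedily swallows everything (yes, even newlines), and then has to backtrack in order to satisfy the rest of your regex, which means it backtracks to the last occurrence of " target="_blank" ..... The whole stretch from the first <a href to the last rel="nofollow"> is therefore one single match, and $1 puts everything in between back unchanged, so only the final tag looks modified.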
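
To see that single oversized match for yourself, here is a minimal sketch. It uses Python's re module purely as a stand-in for VBScript's RegExp; that is an assumption made for convenience, on the basis that both engines treat this pattern the same way. Note that Python writes the backreference as \g<1> where VBScript uses $1. The input is the two-line sample from the question.

```python
import re

# The two-link input string from the question.
html = (
    '<a href="http://www.scirra.com" target="_blank" rel="nofollow">'
    'http://www.scirra.com</a><br /><br />\n'
    '<a href="http://www.scirra.com" target="_blank" rel="nofollow">'
    'http://www.scirra.com</a><br /><hr>'
)

# The question's pattern: [^\]]+ matches any character except "]",
# newlines included, so it runs across both links.
greedy = re.compile(
    r'<a href="http://www.scirra.com([^\]]+)" target="_blank" rel="nofollow">',
    re.IGNORECASE,
)

matches = list(greedy.finditer(html))
print(len(matches))          # prints 1: one match covers both links
print(matches[0].group(1))   # the text swallowed by the capture group

# Same replacement as in the question, in Python's backreference syntax.
result = greedy.sub(
    r'<a href="http://www.scirra.com\g<1>" target="_blank" '
    r'title="Some value insert here">',
    html,
)
print(result)
```

The capture group holds the tail of the first tag, the first link's text, the line break and the start of the second tag, all of which are written back verbatim.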

If you want to replace the rel="nofollow" and allow any path after http://www.scirra.com, you can use this regex instead:

(<a href="http://www\.scirra\.com((/[^/"]+)*/?)" target="_blank" )rel="nofollow">

and replace that with:

$1title="Some value insert here">

Adapting your current code:

Dim regEx
Set regEx = New RegExp

' BB code urls
With regEx
    .Pattern = "(<a href=""http://www\.scirra\.com((/[^""/]+)*/?)"" target=""_blank"" )rel=""nofollow"">"
    .IgnoreCase = True
    .Global = True
    .MultiLine = True
End With
strMessage = regEx.Replace(strMessage, "$1title=""Some value insert here"">")
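
As a quick check of the corrected pattern, the sketch below runs it against the same two-link input. Again Python's re module is only a stand-in for VBScript's RegExp (an assumption of convenience), and \g<1> plays the role of $1. The /forum/topic URL is a made-up example, not taken from the question.

```python
import re

# The two-link input string from the question.
html = (
    '<a href="http://www.scirra.com" target="_blank" rel="nofollow">'
    'http://www.scirra.com</a><br /><br />\n'
    '<a href="http://www.scirra.com" target="_blank" rel="nofollow">'
    'http://www.scirra.com</a><br /><hr>'
)

# The path part can no longer run past the closing quote of href,
# because [^/"]+ excludes the double quote.
fixed = re.compile(
    r'(<a href="http://www\.scirra\.com((/[^/"]+)*/?)" target="_blank" )'
    r'rel="nofollow">',
    re.IGNORECASE,
)
replacement = r'\g<1>title="Some value insert here">'

# subn returns the new string and the number of replacements made.
result, count = fixed.subn(replacement, html)
print(count)    # prints 2: each link is matched on its own
print(result)

# Hypothetical internal link with a path, to show the path is preserved.
with_path = fixed.sub(
    replacement,
    '<a href="http://www.scirra.com/forum/topic" target="_blank" rel="nofollow">',
)
print(with_path)
```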

Note, however, that this is quite restrictive about which links it replaces. For instance, is it possible for the target attribute to have a different value, or for the tag to carry more attributes?
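
If the attributes can vary, one possible direction is to match the whole opening tag first and then edit it, rather than spelling out every attribute in one pattern. The following is only a rough sketch under several assumptions: it is written in Python rather than VBScript, the names TAG, REL, rewrite_tag and strip_nofollow are invented for illustration, attribute values are assumed to be double-quoted and free of > characters, and the tag is assumed not to have a title already. Regexes on HTML are fragile, so treat it as a starting point rather than a robust solution.

```python
import re

# Any opening <a ...> tag whose href is http://www.scirra.com, optionally
# followed by a path, query or fragment. Requiring /, ? or # right after
# the host keeps look-alike hosts such as www.scirra.com.example out.
TAG = re.compile(
    r'<a\b[^>]*\bhref="http://www\.scirra\.com(?:[/?#][^"]*)?"[^>]*>',
    re.IGNORECASE,
)
# The attribute to drop, together with the whitespace in front of it.
REL = re.compile(r'\s+rel="nofollow"', re.IGNORECASE)


def rewrite_tag(match):
    """Remove rel="nofollow" from one matched tag and append a title."""
    tag = REL.sub('', match.group(0))
    # tag ends with ">", so insert the title just before it.
    return tag[:-1] + ' title="Some value insert here">'


def strip_nofollow(html):
    """Rewrite internal links only; every other tag is left untouched."""
    return TAG.sub(rewrite_tag, html)


print(strip_nofollow(
    '<a class="x" href="http://www.scirra.com/forum" rel="nofollow" target="_self">f</a>'
))
print(strip_nofollow('<a href="http://example.com" rel="nofollow">e</a>'))
```

Splitting the work into "find the tag" and "edit the tag" keeps each pattern short, at the cost of needing a callback for the replacement.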

9 Comments

Doh, thanks! It's meant to match all URLs starting with scirra.com and strip the nofollow off. I'm still struggling to get it to work: http://www.scirra.com(.*) doesn't match either of them. What do I need?
"stripping the nofollow off"? What do you mean?
It's a modification I'm making to the forum: I'm stripping the nofollow attribute off posted links that are internal to the site, as well as adding a title attribute.
Oh, I see... OK, hold on, I'll cook up that regex
See edited answer. However, see also the last paragraph: your regex may need more work. I am curious, however, to hear how you came to use ([^\]]+) at all?