0

I have a custom markup parsing function that has been working very well for many years. I recently discovered a bug that I hadn't noticed before and I haven't been able to fix it. If anyone can help me with this that'd be awesome. So I have a custom built forum and text based MMORPG and every input is sanitized and parsed for bbcode like markup. It'll also parse out URL's and make them into legit links that go to an exit page with a disclaimer that you're leaving the site... So the issue that I'm having is that when I user posts multiple URL's in a text box (let's say \n delimited) it'll only convert every other URL into a link. Here's the parser for URL's:

$markup = preg_replace("/(^|[^=\"\/])\b((\w+:\/\/|www\.)[^\s<]+)" . "((\W+|\b)([\s<]|$))/ei", '"$1<a href=\"out.php?".shortURL("$2")."\" target=\"_blank\">".shortURL("$2")."</a>$4"', $markup);

As you can see it calls a PHP function, but that's not the issue here. Then entire text block is passed into this preg_replace at the same time rather than line by line or any other means.

  1. If there's a simpler way of writing this preg_replace, please let me know
  2. If you can figure out why this is only parsing every other URL, that's my ultimate goal here

Example INPUT:

http://skylnk.co/tRRTnb
http://skylnk.co/hkIJBT
http://skylnk.co/vUMGQo 
http://skylnk.co/USOLfW 
http://skylnk.co/BPlaJl 
http://skylnk.co/tqcPbL
http://skylnk.co/jJTjRs
http://skylnk.co/itmhJs
http://skylnk.co/llUBAR
http://skylnk.co/XDJZxD

Example OUTPUT:

<a href="out.php?http://skylnk.co/tRRTnb" target="_blank">http://skylnk.co/tRRTnb</a>
<br>http://skylnk.co/hkIJBT
<br><a href="out.php?http://skylnk.co/vUMGQo" target="_blank">http://skylnk.co/vUMGQo</a> 
<br>http://skylnk.co/USOLfW 
<br><a href="out.php?http://skylnk.co/BPlaJl" target="_blank">http://skylnk.co/BPlaJl</a> 
<br>http://skylnk.co/tqcPbL
<br><a href="out.php?http://skylnk.co/jJTjRs" target="_blank">http://skylnk.co/jJTjRs</a>
<br>http://skylnk.co/itmhJs
<br><a href="out.php?http://skylnk.co/llUBAR" target="_blank">http://skylnk.co/llUBAR</a>
<br>http://skylnk.co/XDJZxD
<br>
1
  • Can you give a failing test case? Commented May 7, 2013 at 5:55

1 Answer 1

1

e flag in preg_replace is deprecated. You can use preg_replace_callback to access the same functionality.

i flag is useless here, since \w already matches both upper case and lower case, and there is no backreference in your pattern.

I set m flag, which makes the ^ and $ matches the beginning and the end of a line, rather than the beginning and the end of the entire string. This should fix your weird problem of matching every other line.

I also make some of the groups non-capturing (?:pattern) - since the bigger capturing groups have captured the text already.

The code below is not tested. I only tested the regex on regex tester.

preg_replace_callback(
    "/(^|[^=\"\/])\b((?:\w+:\/\/|www\.)[^\s<]+)((?:\W+|\b)(?:[\s<]|$))/m",
    function ($m) {
        return "$m[1]<a href=\"out.php?".shortURL($m[2])."\" target=\"_blank\">".shortURL($m[2])."</a>$m[3]";
    },
    $markup
);
Sign up to request clarification or add additional context in comments.

1 Comment

Wow that's awesome. So many things to learn with Regex. Thanks for the prompt responses and sorry for my delays.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.