1

EDIT: I'm not parsing HTML like the 5 billion other questions that have been posted. This is raw, unformatted text that I want to convert into HTML.

I'm working on post-processing. I need to convert URLs with image endings (jpe?g|png|gif) into image tags, and all other URLs into href links. My image replacement works, but I'm stuck keeping the link replacement from overwriting the image tags it produced.

I need help with the expression: how do I get it to look for URLs that are not already inside the tags from the image replacement, or for URLs that do not end in .jpe?g, .png, or .gif?

public function smartConvertPost($post) {

    /**
     * Match image based urls
     */
    $pattern = '!http://([a-z0-9\-\.\/\_]+\.(?:jpe?g|png|gif))!Ui';
    $replace='<p><img src="http://$1"></p>';
    $postImages = preg_replace($pattern,$replace,$post);

    /**
     * Match url based
     */
    $pattern='/http://([a-z0-9\-\.\/\_]+(?:\S|$))/i';
    $replace='<a href="$1">$1</a>';
    $postUrl = preg_replace($pattern,$replace, $postImages);

    return $postUrl;
}

Please note I am not talking about matching tags or HTML. I want to match a plain string like the one below and convert it to HTML.

If this was an example post with a URL to a page like http://www.some-website.com/some-page/anything.html and I also put a URL to an image http://www.some-website.com/someimage.jpg you would need to regex the two to be a hyperlink and an image.

Thanks,

  • I'm 90% positive the links to the right under "Related" have this answered 4, 5, or maybe 6 times over. Please take a moment to browse the questions listed after you type yours before posting. Commented Mar 21, 2011 at 14:33
  • @Bras 4,5, or maybe 6 ? I'd say many more. Commented Mar 21, 2011 at 14:34
  • @ClementHerreman: More or less just referencing within that haystack. ;-) If you did a google site:stackoverflow.com url to img anchor, you'd be guaranteed hundreds/thousands. ;-) Commented Mar 21, 2011 at 14:35
  • No, because they are related to HTML tags. We are talking about strings like this post, except it would look more like this: Commented Mar 21, 2011 at 14:40
  • Example with a link-to-someplace.com/anything and then an image link like so link-to-some-images.com/imagename.jpg. You need to regex that and make the URLs href links and the image URLs image tags. Commented Mar 21, 2011 at 14:42

3 Answers

3

Brad Christie's preg_replace_callback() recommendation is a good one. Here is one possible implementation:

function smartConvertPost($post)
{ // Disclaimer: This "URL plucking" regex is far from ideal.
    $pattern = '!http://[a-z0-9\-._~\!$&\'()*+,;=:/?#[\]@%]+!i';
    $replace='_handle_URL_callback';
    return preg_replace_callback($pattern,$replace, $post);
}

function _handle_URL_callback($matches)
{ // preg_replace_callback() is passed one parameter: $matches.
    if (preg_match('/\.(?:jpe?g|png|gif)(?:$|[?#])/i', $matches[0]))
    { // This is an image if path ends in .GIF, .PNG, .JPG or .JPEG.
        return '<p><img src="'. $matches[0] .'"></p>';
    } // Otherwise handle as NOT an image.
    return '<a href="'. $matches[0] .'">'. $matches[0] .'</a>';
}

Note that the regex used to pluck out a URL is not ideal. To do it right is tricky. See the following resources:

Edit: Added ability to recognize image URLs having a query or fragment.
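
As a rough sketch of how the callback approach above could be hardened, the variant below trims trailing sentence punctuation from each match and HTML-escapes the URL before emitting it. The function name linkifyPost, the https support, and the punctuation list are assumptions for illustration, not part of the answer; the URL pattern is still the same simplified one and carries the same caveats.

```php
<?php
// Hypothetical helper (name and behaviour are illustrative assumptions).
function linkifyPost(string $post): string
{
    // Same simplified "URL plucking" pattern as above, extended to https.
    $pattern = '!https?://[a-z0-9\-._~\!$&\'()*+,;=:/?#[\]@%]+!i';

    return preg_replace_callback($pattern, function (array $m): string {
        // Split off trailing sentence punctuation so it stays plain text.
        $url  = rtrim($m[0], '.,;:!?)');
        $tail = substr($m[0], strlen($url));

        // Escape &, quotes, etc. so the URL is safe inside an attribute.
        $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

        // Image if the path ends in .gif, .png, .jpg or .jpeg (any case),
        // optionally followed by a query or fragment.
        if (preg_match('/\.(?:jpe?g|png|gif)(?:$|[?#])/i', $url)) {
            return '<p><img src="' . $safe . '"></p>' . $tail;
        }
        return '<a href="' . $safe . '">' . $safe . '</a>' . $tail;
    }, $post);
}
```

Because every URL is visited exactly once, the image and link replacements cannot overwrite each other, which is the problem in the question.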


1

Since it's the 215247th post on that kind of topic, let's say it again: HTML is too complicated to parse with regex. Use a parser. See this: Regular expression for parsing links from a webpage?

PS: no offense =).

Edit:

I personally often use Symfony, and there's a really great parser for what you need: http://fabien.potencier.org/article/42/parsing-xml-documents-with-css-selectors

You can get all images using a simple CSS expression on your HTML. Give it a try.

2 Comments

None taken, but that's not what I asked. I'm not parsing webpages. I'm parsing raw strings with no tags that I want to convert into tags.
@Levi: This has been done before; that's the basic premise of the responses. It's been done in JavaScript, PHP, HTML, Perl, you name it. Basically, pass a string to the regex engine (I'd recommend preg_replace_callback) and have it return all URL matches (I'll let you decide the pattern). Then, in the callback, decide if it's "image-worthy" or "anchor-worthy" and return the newly formatted result.
0

What about using a marker?


public function smartConvertPost($post) {
    $marker = '<MYMARKER>'; // Define the marker here
    // Single-quoted strings do not interpolate variables, so the marker
    // is concatenated in; preg_quote() escapes it for use in a pattern.
    $quotedMarker = preg_quote($marker, '!');

    /**
     * Match image based urls
     */
    $pattern = '!http://([a-z0-9\-\.\/\_]+\.(?:jpe?g|png|gif))!Ui';
    $replace = '<p><img src="' . $marker . 'http://$1"></p>'; // Use it here...
    $postImages = preg_replace($pattern, $replace, $post);

    /**
     * Match url based: skip any URL directly preceded by the marker
     */
    $pattern = '!(?<!' . $quotedMarker . ')http://([a-z0-9\-\.\/\_]+)!i'; // ...and here
    $replace = '<a href="http://$1">http://$1</a>';
    $postUrl = preg_replace($pattern, $replace, $postImages);

    /**
     * Remove all markers
     */
    $postUrl = str_replace($marker, '', $postUrl);

    return $postUrl;
}

Try to choose a marker that has no chance of appearing in the post. HTH
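
One way to pick such a marker, sketched under the assumption that PHP 7 or later is available (for random_bytes), is to generate a random candidate and retry until it does not occur in the post. The helper name makeUnusedMarker and the marker format are hypothetical.

```php
<?php
// Hypothetical helper: returns a marker string that does not occur in $post.
function makeUnusedMarker(string $post): string
{
    do {
        // 16 random hex characters make an accidental collision very unlikely;
        // the loop makes it impossible for the returned marker to be in $post.
        $marker = '<marker-' . bin2hex(random_bytes(8)) . '>';
    } while (strpos($post, $marker) !== false);

    return $marker;
}
```

The result always has the same length, so it can still be used inside a lookbehind, provided it is passed through preg_quote() before being placed in a pattern.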
