Is there a pre-existing function or class for URL normalization in PHP?

Specifically, one that follows the semantics-preserving normalization rules laid out in the Wikipedia article on URL normalization (or whatever 'standard' I should be following):

  • Converting the scheme and host to lower case
  • Capitalizing letters in escape sequences
  • Adding trailing / (to directories, not files)
  • Removing the default port
  • Removing dot-segments

Right now, I'm thinking that I'll just use parse_url(), and apply the rules individually, but I'd prefer to avoid reinventing the wheel.

  • @yc : stackoverflow.com/search?q=php+seo+url Commented Nov 14, 2010 at 2:16
  • @ajreal no, not <link rel="canonical"...>. I just mean normalizing a URL before, for example, requesting data about it from an API, particularly one that requires the URL to be hashed; if you don't use a normalized URL, you'll get inaccurate or no results. Commented Nov 14, 2010 at 2:36
  • @yc : what is the difference between http://stackoverflow.com and http://stackoverflow.com// ? Can you provide more examples of the URLs you are trying to avoid? Commented Nov 14, 2010 at 4:33
  • Huge difference! The former hashes (md5) as 57f4dad48e7a4f7cd171c654226feb5a, the latter hashes as 8b34e6ecb6898f39350c1264d6d7aa6c. As far as I'm concerned, they're different URLs, even though a server will resolve the difference. There's a standard, as linked to, that seeks to create normalized URLs. I'm not inventing a concept here; there's a whole wiki article dedicated to the phenomenon. Commented Nov 14, 2010 at 15:10

1 Answer

The PEAR Net_URL2 library looks like it'll do at least part of what you want. It removes dot-segments, fixes capitalization, and gets rid of the default port:

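require_once 'Net/URL2.php'; // PEAR package Net_URL2 must be installed
$url = new Net_URL2('HTTP://example.com:80/a/../b/c');
// For this input: lowercases the scheme, drops port 80, resolves the ".." segment
print $url->getNormalizedURL();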

emits:

http://example.com/b/c

I doubt there's a general-purpose mechanism for adding trailing slashes to directories, because that requires a way to map URLs to directories, which is hard to do generically. Still, this gets you most of the way there.
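If installing a PEAR package isn't an option, the parse_url() approach from the question can be sketched by hand. The following is only a rough illustration, not how Net_URL2 does it: the function names `remove_dot_segments`, `uppercase_escapes`, and `normalize_url` are hypothetical, it assumes PHP 7.0 or later, it only knows the default ports for http and https, and it assumes absolute paths. It treats an empty path as `/` but makes no attempt to add trailing slashes to directories.

```php
<?php
// Hypothetical helpers: a hand-rolled sketch of the rules from the question,
// not part of PHP itself or of Net_URL2.

// Remove "." and ".." segments from an absolute path (cf. RFC 3986, 5.2.4).
function remove_dot_segments(string $path): string
{
    $segments = explode('/', $path);
    $output = [];
    foreach ($segments as $segment) {
        if ($segment === '..') {
            // Step up one level, but never above the root (the leading '').
            if (count($output) > 1) {
                array_pop($output);
            }
        } elseif ($segment !== '.') {
            $output[] = $segment;
        }
    }
    // A trailing "." or ".." refers to a directory, so keep the final slash.
    $last = end($segments);
    if ($last === '.' || $last === '..') {
        $output[] = '';
    }
    return implode('/', $output);
}

// Upper-case the hex digits of percent-escapes, e.g. %2f becomes %2F.
function uppercase_escapes(string $text): string
{
    return preg_replace_callback(
        '/%[0-9a-f]{2}/i',
        static function (array $m): string {
            return strtoupper($m[0]);
        },
        $text
    );
}

function normalize_url(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url; // Leave anything we cannot parse untouched.
    }

    // Assumption: only these two schemes have their default port removed.
    $defaultPorts = ['http' => 80, 'https' => 443];

    $scheme = strtolower($parts['scheme']);
    $result = $scheme . '://';
    if (isset($parts['user'])) {
        $result .= $parts['user'];
        $result .= isset($parts['pass']) ? ':' . $parts['pass'] : '';
        $result .= '@';
    }
    $result .= strtolower($parts['host']);
    if (isset($parts['port']) && ($defaultPorts[$scheme] ?? null) !== $parts['port']) {
        $result .= ':' . $parts['port'];
    }

    $path = remove_dot_segments($parts['path'] ?? '');
    $result .= ($path === '') ? '/' : $path; // An empty path is equivalent to "/".

    if (isset($parts['query'])) {
        $result .= '?' . $parts['query'];
    }
    if (isset($parts['fragment'])) {
        $result .= '#' . $parts['fragment'];
    }
    return uppercase_escapes($result);
}

print normalize_url('HTTP://example.com:80/a/../b/c') . "\n"; // http://example.com/b/c
```

Building the URL back up from the parse_url() pieces keeps each rule in one obvious place, at the cost of having to handle every component (user info, query, fragment) explicitly.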
