2
  • www.example.com
  • foo.example.com
  • foo.example.co.uk
  • foo.bar.example.com
  • foo.bar.example.co.uk

I've got these URLs here and always want to end up with two variables:

$domainName = "example"
$domainNameSuffix = ".com" OR ".co.uk"

If someone could get me from $url being one of the URLs all the way down to $newUrl being close to "example.co.uk", it would be a blessing.

Note that the URLs are going to be completely "random"; we might end up with "foo.bar.example2.com.au" too, so ... you know... ugh. (Asking for the impossible?)

Cheers,

2
  • The title is a bit misleading here. You are parsing domain names, not URLs from what it looks like. Basically, this comes down to looking for a database of TLDs and their associated secondary levels for country codes like uk and au. There's no way to solve this problem without such information. Commented Mar 15, 2011 at 23:51
  • So here is a duplicate: stackoverflow.com/questions/4963202/domain-regex-split - you may want to look at RobertPitts' solution as an alternative. As said, it can only be done on a best-guess basis. You can't even get reliable results with TLD probing à la `dig +all co.uk`. Commented Mar 15, 2011 at 23:54

5 Answers

3

We had a few questions like this before, but I can't find a good one right now either. The crux is, this cannot be done reliably. You would need a long list of special TLDs (like .uk and .au) which have their own .com/.net level.

But as a general approach and simple solution, you could use:

preg_match('#([\w-]+)\.(\w+(\.(au|uk))?)\.?$#i', $domain, $m);
list(, $domain, $suffix) = $m;
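
As a rough illustration of how this pattern behaves, here is a minimal sketch that wraps it in a function and guards against a failed match. The function name splitDomainByRegex is hypothetical and not part of the answer above; the pattern itself is unchanged.

```php
<?php
// Hypothetical wrapper around the pattern above; returns null when nothing matches.
function splitDomainByRegex($host)
{
    if (!preg_match('#([\w-]+)\.(\w+(\.(au|uk))?)\.?$#i', $host, $m)) {
        return null;
    }
    // $m[1] is the label before the suffix, $m[2] the suffix without a leading dot.
    return array($m[1], $m[2]);
}

foreach (array('www.example.com', 'foo.bar.example.co.uk', 'www.nic.uk') as $host) {
    var_dump(splitDomainByRegex($host));
}
```

The first two hosts split into example plus com and example plus co.uk. The third shows the limitation of guessing: www.nic.uk comes back as www plus nic.uk, because any label followed by .uk is treated as part of the suffix.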

9 Comments

Yeh, it surprised me that there wasn't much to be found about this issue - as a relative noob to php (javascript, css & html are my weapons of choice) it seemed rather elementary. .edit: thanks for the reply. Not enough credit yet for an upvote though. 'scuse me.
It will mess up on something like nic.uk. You might actually have to maintain the complete list of valid secondary level domains for something like uk.
This is nice and easy, so +1. I'm probably missing something, but do you need that last optional . (the \.?)?
@myself, I suppose one could argue that www is the domain and nic.uk is the TLD. Really depends on the context on how correct it is.
@konforce I would even ignore that as special case, or blacklist it (?!nic), but an explicit list (\w+|co.uk|net.uk|com.au|org.au) would indeed be most reliable.
2

The "domainNameSuffix" is commonly called a top-level domain (TLD for short), although multi-part suffixes such as .co.uk are more precisely called public suffixes, and there is no easy way to extract it.

Every country has its own TLD, and some countries have opted to further subdivide their TLD. And since the number of subdomains (my.own.subdomain.example.com) is also variable, there is no easy "one-regexp-fits-all".

As mentioned, you need a list. Fortunately for you there are lists publicly available: http://publicsuffix.org/
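
To make the list-based idea concrete, here is a minimal sketch of a longest-suffix lookup. It assumes the suffixes are already loaded into a plain array; the five entries below are a hand-picked sample, not the real list, and the function name splitHost is hypothetical. The published list also contains wildcard and exception rules, which this sketch ignores.

```php
<?php
// Hand-picked sample for illustration only; a real list would be loaded from
// the Public Suffix List and is far longer.
$publicSuffixes = array('com', 'uk', 'co.uk', 'au', 'com.au');

// Hypothetical helper: returns array(domainName, suffix) or null.
function splitHost($host, array $suffixes)
{
    $labels = explode('.', strtolower(rtrim($host, '.')));
    // Drop one leading label at a time, so the longest candidate suffix is tried first.
    for ($i = 1; $i < count($labels); $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (in_array($candidate, $suffixes, true)) {
            // The label directly before the suffix is the domain name.
            return array($labels[$i - 1], '.' . $candidate);
        }
    }
    return null; // single label, or no known suffix
}

var_dump(splitHost('foo.bar.example.co.uk', $publicSuffixes));
```

Because the longest candidate is checked first, foo.bar.example.co.uk resolves to example and .co.uk rather than stopping at .uk, and any number of leading subdomains is discarded.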

1 Comment

Flagged this as the best answer, since it solved my problem most completely. Cheers.
2

I believe you will need to maintain a list of extensions for the most accurate results.

$possibleExtensions = array(
    '.com',
    '.co.uk',
    '.com.au'
);

// parse_url() needs a protocol.
$str = 'http://' . $str;

// Use parse_url() to take into account any paths
// or fragments that may end up being there.
$host = parse_url($str, PHP_URL_HOST);

foreach($possibleExtensions as $ext) {

    if (preg_match('/' . preg_quote($ext, '/') . '\Z/', $host)) {
       $domainNameSuffix = $ext;
       // Strip the extension from the host; $str may still carry a path or fragment.
       $domainName = substr($host, 0, -strlen($ext));
       var_dump($domainName, $domainNameSuffix);
       break;

    }

}

If you never have any paths or extra stuff, you can of course skip the parse_url() call and the http:// prefix handling.

It worked for all your tests.
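
Note that stripping the extension still leaves any subdomains in place (for example "foo.bar.example"). A minimal sketch of reducing that to the last label follows; the helper name lastLabel and the sample URL are assumptions made for illustration, not part of the answer above.

```php
<?php
// Hypothetical helper: keep only the label directly before the suffix.
function lastLabel($name)
{
    $labels = explode('.', $name);
    return end($labels);
}

// parse_url() drops the path and query, leaving just the host.
$host = parse_url('http://foo.bar.example.co.uk/some/path?x=1', PHP_URL_HOST);
$suffix = '.co.uk';

// Strip the suffix, then discard everything before the final label.
$domainName = lastLabel(substr($host, 0, -strlen($suffix)));
var_dump($domainName);
```

Here $host is foo.bar.example.co.uk and $domainName ends up as example.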

3 Comments

This does not return a key for TLD.
@vicTROLLA parse_url() is the start of what you may want to use, however, especially if they include paths, params and/or fragment.
I ended up using a lot of your concept in my solution (posted as well) - thanks.
0

There isn't a builtin function for this.

A quick Google search led me to http://www.wallpaperama.com/forums/php-function-remove-domain-name-get-tld-splitter-split-t5824.html

This leads me to believe you need to maintain a list of valid TLDs to split URLs on.

1 Comment

Instead of maintaining the TLDs yourself, why not use a pre-maintained list: mxr.mozilla.org/mozilla-central/source/netwerk/dns/…
0

Alright chaps, here's how I solved it, for now. Implementation of more domain names will be done as well, at some point in the future. Don't know what technique I'll use, yet.

# Setting options, single and dual part domain extensions
$v2_onePart = array(
                "com"
                );
$v2_twoPart = array(
                "co.uk",
                "com.au"
                );

$v2_url         = $_SERVER['SERVER_NAME'];      # "example.com"     OR  "example.com.au"
$v2_bits        = explode(".", $v2_url);        # "example", "com"  OR  "example", "com", "au"
$v2_bits        = array_reverse($v2_bits);      # "com", "example"  OR  "au", "com", "example"      (Reversing to eliminate foo.bar.example.com.au problems.)

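# Check the two-part extensions first so that "com.au" is not mistaken for a one-part match.
# The count() guards avoid undefined-index notices on short host names.
if (count($v2_bits) >= 3 && in_array($v2_bits[1] . "." . $v2_bits[0], $v2_twoPart)) {
    $v2_class   = $v2_bits[2] . " " . $v2_bits[1] . "_" . $v2_bits[0];  # "example com_au"
} elseif (count($v2_bits) >= 2 && in_array($v2_bits[0], $v2_onePart)) {
    $v2_class   = $v2_bits[1] . " " . $v2_bits[0];  # "example com"
}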

1 Comment

What the hell was I thinking.
