PHP function to make slug (URL string)

Question

I want to have a function to create slugs from Unicode strings, e.g. gen_slug('Andrés Cortez') should return andres-cortez. How should I do that?

I copied your code here: writecodeonline.com/php and it outputs andres. Are you sure your input is exactly "andrés"? — nico
– nico, Commented Jun 2, 2010 at 5:51
in plain php it works. sorry, i forgot to mention that the function is being executed from an ajax function server side. Maybe the problem is happening because of a charset feature? — Andres SK
– Andres SK, Commented Jun 2, 2010 at 5:55
here is a good working solution for cyrilic: stackoverflow.com/questions/7461406/… — d.raev
– d.raev, Commented Jan 5, 2015 at 8:19
Instead of building your own solution, you can use an existing library like github.com/cocur/slugify or github.com/ausi/slug-generator — ausi
– ausi, Commented Oct 30, 2017 at 22:06

Maerlyn · Accepted Answer · 2021-05-07 04:23:38Z

570

Instead of a lengthy replace, try this one:

public static function slugify($text, string $divider = '-')
{
  // replace non letter or digits by divider
  $text = preg_replace('~[^\pL\d]+~u', $divider, $text);

  // transliterate
  $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);

  // remove unwanted characters
  $text = preg_replace('~[^-\w]+~', '', $text);

  // trim
  $text = trim($text, $divider);

  // remove duplicate divider
  $text = preg_replace('~-+~', $divider, $text);

  // lowercase
  $text = strtolower($text);

  if (empty($text)) {
    return 'n-a';
  }

  return $text;
}

This was based off the one in Symfony's Jobeet tutorial.

edited May 7, 2021 at 4:23

answered Jun 2, 2010 at 7:57

Maerlyn

34.2k18 gold badges99 silver badges89 bronze badges

Sign up to request clarification or add additional context in comments.

22 Comments

Maerlyn Over a year ago

No, the first preg_replace wipes everything that's not a character or a digit. Note the ^ just after the opening bracket - it reverses the match.

Kendall Hopkins Over a year ago

iconv will not convert correctly if $text contains characters that don't have ascii equivalent. For example iconv('utf-8', 'us-ascii//TRANSLIT', "EFI收购Cretaprint") will return "EFI" and leak a warning.

groovenectar Over a year ago

@Maerlyn and andufo, I think there is an extra "\" though in the first regex, correct? Should be '~[^\pL\d]+~u' ?

rybo111 Over a year ago

$text = trim($text, '-'); should be at the end, otherwise Foo 收 becomes foo-. Also, Foo 收 Bar becomes foo--bar (the repeated - seems redundant).

Romain Bruckert Over a year ago

No no no no this does not work. The first expression DOES replace all non letter characters. Should not be accepted.

|

Chuck Le Butt · Accepted Answer · 2022-06-26 18:23:38Z

73

Update

Since this answer is getting some attention, I'm adding some explanation.

The solution provided will essentially replace everything except A-Z, a-z, 0-9, & - (hyphen) with - (hyphen). So, it won't work properly with other unicode characters (which are valid characters for a URL slug/string). A common scenario is when the input string contains non-English characters.

Only use this solution if you're confident that the input string won't have unicode characters which you might want to be a part of output/slug.

Eg. "नारी शक्ति" will become "----------" (all hyphens) instead of "नारी-शक्ति" (valid URL slug).

Answer

$slug = strtolower(trim(preg_replace('/[^A-Za-z0-9-]+/', '-', $string)));

edited Jun 26, 2022 at 18:23

Chuck Le Butt

49k62 gold badges213 silver badges299 bronze badges

answered Dec 12, 2015 at 20:18

TheKalpit

1,5161 gold badge14 silver badges27 bronze badges

9 Comments

Romain Bruckert Over a year ago

Actually it does. This thread is sooooo strange... the accepted answer does not work, all the other kinda of do...

AGB Over a year ago

It provides an answer to the question asked in the title which is what led me here and to this answer, which was perfect for my needs.

Jacob David C. Cunningham Over a year ago

Just adding to this, not 100% tested, I found if you initially replaced all spaces with a dash, then used this function to remove any other characters to replace them with an empty value eg. ''

Julien Over a year ago

strtolower(trim(preg_replace('/[^A-Za-z0-9-]+/', '-', "Étienne"))) returns "-tienne" instead of "etienne", so it does not works with accented characters.

user719662 Over a year ago

slugs generated this way are in no way SEO-friendly or user-friendly. Also, they generate a lot of collisions in many languages, many more than proper transliteration would cause.

|

Casimir et Hippolyte · Accepted Answer · 2022-05-10 19:23:55Z

59

If you have intl extension installed, you can use Transliterator::transliterate function to create a slug easily.

$string = 'Namnet på bildtävlingen';

$rules = <<<'RULES'
    :: Any-Latin;
    :: NFD;
    :: [:Nonspacing Mark:] Remove;
    :: NFC;
    :: [^-[:^Punctuation:]] Remove;
    :: Lower();
    [:^L:] { [-] > ;
    [-] } [:^L:] > ;
    [-[:Separator:]]+ > '-';
RULES;

$slug = \Transliterator::createFromRules($rules)
    ->transliterate( $string );

echo $slug; // namnet-pa-bildtavlingen

demo

Note that this solution works whatever the alphabet and is highly flexible.

edited May 10, 2022 at 19:23

Casimir et Hippolyte

90k5 gold badges102 silver badges131 bronze badges

answered Nov 11, 2012 at 14:19

hdogan

9436 silver badges6 bronze badges

6 Comments

Jessedc Over a year ago

For those arriving at this post years later the intl extension is bundled with PHP since 5.3.0. php.net/manual/en/intl.requirements.php

undefinedman Over a year ago

this however removes dash (-), e.g. $string = "My-string hello" returns "mystring-hello"

Georgy Ivanov Over a year ago

To learn about available rules, explore this answer: stackoverflow.com/a/17015063/6850820

Casimir et Hippolyte Over a year ago

The only good answer for this question.

Casimir et Hippolyte Over a year ago

@undefinedman: this can easily be solved replacing[:Punctuation:] with [^-[:^Punctuation:]]. A more waterproof solution: 3v4l.org/FAJAW

|

Imran Omar Bukhsh · Accepted Answer · 2013-07-29 01:53:12Z

29

Note: I have taken this from wordpress and it works!!

Use it like this:

echo sanitize('testing this link');

Code

//taken from wordpress
function utf8_uri_encode( $utf8_string, $length = 0 ) {
    $unicode = '';
    $values = array();
    $num_octets = 1;
    $unicode_length = 0;

    $string_length = strlen( $utf8_string );
    for ($i = 0; $i < $string_length; $i++ ) {

        $value = ord( $utf8_string[ $i ] );

        if ( $value < 128 ) {
            if ( $length && ( $unicode_length >= $length ) )
                break;
            $unicode .= chr($value);
            $unicode_length++;
        } else {
            if ( count( $values ) == 0 ) $num_octets = ( $value < 224 ) ? 2 : 3;

            $values[] = $value;

            if ( $length && ( $unicode_length + ($num_octets * 3) ) > $length )
                break;
            if ( count( $values ) == $num_octets ) {
                if ($num_octets == 3) {
                    $unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]) . '%' . dechex($values[2]);
                    $unicode_length += 9;
                } else {
                    $unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]);
                    $unicode_length += 6;
                }

                $values = array();
                $num_octets = 1;
            }
        }
    }

    return $unicode;
}

//taken from wordpress
function seems_utf8($str) {
    $length = strlen($str);
    for ($i=0; $i < $length; $i++) {
        $c = ord($str[$i]);
        if ($c < 0x80) $n = 0; # 0bbbbbbb
        elseif (($c & 0xE0) == 0xC0) $n=1; # 110bbbbb
        elseif (($c & 0xF0) == 0xE0) $n=2; # 1110bbbb
        elseif (($c & 0xF8) == 0xF0) $n=3; # 11110bbb
        elseif (($c & 0xFC) == 0xF8) $n=4; # 111110bb
        elseif (($c & 0xFE) == 0xFC) $n=5; # 1111110b
        else return false; # Does not match any model
        for ($j=0; $j<$n; $j++) { # n bytes matching 10bbbbbb follow ?
            if ((++$i == $length) || ((ord($str[$i]) & 0xC0) != 0x80))
                return false;
        }
    }
    return true;
}

//function sanitize_title_with_dashes taken from wordpress
function sanitize($title) {
    $title = strip_tags($title);
    // Preserve escaped octets.
    $title = preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $title);
    // Remove percent signs that are not part of an octet.
    $title = str_replace('%', '', $title);
    // Restore octets.
    $title = preg_replace('|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title);

    if (seems_utf8($title)) {
        if (function_exists('mb_strtolower')) {
            $title = mb_strtolower($title, 'UTF-8');
        }
        $title = utf8_uri_encode($title, 200);
    }

    $title = strtolower($title);
    $title = preg_replace('/&.+?;/', '', $title); // kill entities
    $title = str_replace('.', '-', $title);
    $title = preg_replace('/[^%a-z0-9 _-]/', '', $title);
    $title = preg_replace('/\s+/', '-', $title);
    $title = preg_replace('|-+|', '-', $title);
    $title = trim($title, '-');

    return $title;
}

edited Jul 29, 2013 at 1:53

answered Apr 28, 2012 at 3:30

Imran Omar Bukhsh

8,08113 gold badges64 silver badges82 bronze badges

3 Comments

Simon E. Over a year ago

In WordPress 3.9.1 (which I'm using) I have to call sanitize_title_with_dashes($string, null, 'save') (note the extra parameters), otherwise you get some messy character codes like telstra%e2%80%99s-%e2%80%98all-roles-flex%e2%80%99. Not very pretty. :-(

iDev247 Over a year ago

See the related answer on this page from @czerasz. Links to latest: formatting.php also functions.php

rybo111 Over a year ago

sanitize is a strange, forgettable function name to generate a slug.

Vazgen Manukyan · Accepted Answer · 2019-08-21 13:39:13Z

16

It is always a good idea to use existing solutions that are being supported by a lot of high-level developers. The most popular one is https://github.com/cocur/slugify. First of all, it supports more than one language, and it is being updated.

If you do not want to use the whole package, you can copy the part that you need.

edited Aug 21, 2019 at 13:39

answered Apr 25, 2016 at 20:52

Vazgen Manukyan

1,5201 gold badge13 silver badges19 bronze badges

2 Comments

Sagive Over a year ago

its overkill... we just need a simple, lowercase, no spaces / weird characters function.

poring91 Over a year ago

I would rather use the reusable library and saves a hundred or thousand lines of coffee pasta and let others do the maintenance. PHP libs are always works just fine. This is the era of LEGO Oriented Programmer yo.

Baptiste Gaillard · Accepted Answer · 2012-11-27 16:21:10Z

12

Here is an other one, for example " Title with strange characters ééé A X Z" becomes "title-with-strange-characters-eee-a-x-z".

/**
 * Function used to create a slug associated to an "ugly" string.
 *
 * @param string $string the string to transform.
 *
 * @return string the resulting slug.
 */
public static function createSlug($string) {

    $table = array(
            'Š'=>'S', 'š'=>'s', 'Đ'=>'Dj', 'đ'=>'dj', 'Ž'=>'Z', 'ž'=>'z', 'Č'=>'C', 'č'=>'c', 'Ć'=>'C', 'ć'=>'c',
            'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
            'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O',
            'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss',
            'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e',
            'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o',
            'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'ý'=>'y', 'þ'=>'b',
            'ÿ'=>'y', 'Ŕ'=>'R', 'ŕ'=>'r', '/' => '-', ' ' => '-'
    );

    // -- Remove duplicated spaces
    $stripped = preg_replace(array('/\s{2,}/', '/[\t\n]/'), ' ', $string);

    // -- Returns the slug
    return strtolower(strtr($string, $table));


}

answered Nov 27, 2012 at 16:21

Baptiste Gaillard

3925 silver badges17 bronze badges

1 Comment

s3c Over a year ago

Renamed $string and $stripped to $text and added $text = preg_replace('~[^\pL\d]+~u', '-', $text); before return. Now it works perfectly. Thank you.

Nady Shalaby · Accepted Answer · 2016-06-28 02:31:17Z

public static function slugify ($text) {

    $replace = [
        '&lt;' => '', '&gt;' => '', '&#039;' => '', '&amp;' => '',
        '&quot;' => '', 'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä'=> 'Ae',
        '&Auml;' => 'A', 'Å' => 'A', 'Ā' => 'A', 'Ą' => 'A', 'Ă' => 'A', 'Æ' => 'Ae',
        'Ç' => 'C', 'Ć' => 'C', 'Č' => 'C', 'Ĉ' => 'C', 'Ċ' => 'C', 'Ď' => 'D', 'Đ' => 'D',
        'Ð' => 'D', 'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'Ē' => 'E',
        'Ę' => 'E', 'Ě' => 'E', 'Ĕ' => 'E', 'Ė' => 'E', 'Ĝ' => 'G', 'Ğ' => 'G',
        'Ġ' => 'G', 'Ģ' => 'G', 'Ĥ' => 'H', 'Ħ' => 'H', 'Ì' => 'I', 'Í' => 'I',
        'Î' => 'I', 'Ï' => 'I', 'Ī' => 'I', 'Ĩ' => 'I', 'Ĭ' => 'I', 'Į' => 'I',
        'İ' => 'I', 'Ĳ' => 'IJ', 'Ĵ' => 'J', 'Ķ' => 'K', 'Ł' => 'K', 'Ľ' => 'K',
        'Ĺ' => 'K', 'Ļ' => 'K', 'Ŀ' => 'K', 'Ñ' => 'N', 'Ń' => 'N', 'Ň' => 'N',
        'Ņ' => 'N', 'Ŋ' => 'N', 'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O',
        'Ö' => 'Oe', '&Ouml;' => 'Oe', 'Ø' => 'O', 'Ō' => 'O', 'Ő' => 'O', 'Ŏ' => 'O',
        'Œ' => 'OE', 'Ŕ' => 'R', 'Ř' => 'R', 'Ŗ' => 'R', 'Ś' => 'S', 'Š' => 'S',
        'Ş' => 'S', 'Ŝ' => 'S', 'Ș' => 'S', 'Ť' => 'T', 'Ţ' => 'T', 'Ŧ' => 'T',
        'Ț' => 'T', 'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'Ue', 'Ū' => 'U',
        '&Uuml;' => 'Ue', 'Ů' => 'U', 'Ű' => 'U', 'Ŭ' => 'U', 'Ũ' => 'U', 'Ų' => 'U',
        'Ŵ' => 'W', 'Ý' => 'Y', 'Ŷ' => 'Y', 'Ÿ' => 'Y', 'Ź' => 'Z', 'Ž' => 'Z',
        'Ż' => 'Z', 'Þ' => 'T', 'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a',
        'ä' => 'ae', '&auml;' => 'ae', 'å' => 'a', 'ā' => 'a', 'ą' => 'a', 'ă' => 'a',
        'æ' => 'ae', 'ç' => 'c', 'ć' => 'c', 'č' => 'c', 'ĉ' => 'c', 'ċ' => 'c',
        'ď' => 'd', 'đ' => 'd', 'ð' => 'd', 'è' => 'e', 'é' => 'e', 'ê' => 'e',
        'ë' => 'e', 'ē' => 'e', 'ę' => 'e', 'ě' => 'e', 'ĕ' => 'e', 'ė' => 'e',
        'ƒ' => 'f', 'ĝ' => 'g', 'ğ' => 'g', 'ġ' => 'g', 'ģ' => 'g', 'ĥ' => 'h',
        'ħ' => 'h', 'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ī' => 'i',
        'ĩ' => 'i', 'ĭ' => 'i', 'į' => 'i', 'ı' => 'i', 'ĳ' => 'ij', 'ĵ' => 'j',
        'ķ' => 'k', 'ĸ' => 'k', 'ł' => 'l', 'ľ' => 'l', 'ĺ' => 'l', 'ļ' => 'l',
        'ŀ' => 'l', 'ñ' => 'n', 'ń' => 'n', 'ň' => 'n', 'ņ' => 'n', 'ŉ' => 'n',
        'ŋ' => 'n', 'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'oe',
        '&ouml;' => 'oe', 'ø' => 'o', 'ō' => 'o', 'ő' => 'o', 'ŏ' => 'o', 'œ' => 'oe',
        'ŕ' => 'r', 'ř' => 'r', 'ŗ' => 'r', 'š' => 's', 'ù' => 'u', 'ú' => 'u',
        'û' => 'u', 'ü' => 'ue', 'ū' => 'u', '&uuml;' => 'ue', 'ů' => 'u', 'ű' => 'u',
        'ŭ' => 'u', 'ũ' => 'u', 'ų' => 'u', 'ŵ' => 'w', 'ý' => 'y', 'ÿ' => 'y',
        'ŷ' => 'y', 'ž' => 'z', 'ż' => 'z', 'ź' => 'z', 'þ' => 't', 'ß' => 'ss',
        'ſ' => 'ss', 'ый' => 'iy', 'А' => 'A', 'Б' => 'B', 'В' => 'V', 'Г' => 'G',
        'Д' => 'D', 'Е' => 'E', 'Ё' => 'YO', 'Ж' => 'ZH', 'З' => 'Z', 'И' => 'I',
        'Й' => 'Y', 'К' => 'K', 'Л' => 'L', 'М' => 'M', 'Н' => 'N', 'О' => 'O',
        'П' => 'P', 'Р' => 'R', 'С' => 'S', 'Т' => 'T', 'У' => 'U', 'Ф' => 'F',
        'Х' => 'H', 'Ц' => 'C', 'Ч' => 'CH', 'Ш' => 'SH', 'Щ' => 'SCH', 'Ъ' => '',
        'Ы' => 'Y', 'Ь' => '', 'Э' => 'E', 'Ю' => 'YU', 'Я' => 'YA', 'а' => 'a',
        'б' => 'b', 'в' => 'v', 'г' => 'g', 'д' => 'd', 'е' => 'e', 'ё' => 'yo',
        'ж' => 'zh', 'з' => 'z', 'и' => 'i', 'й' => 'y', 'к' => 'k', 'л' => 'l',
        'м' => 'm', 'н' => 'n', 'о' => 'o', 'п' => 'p', 'р' => 'r', 'с' => 's',
        'т' => 't', 'у' => 'u', 'ф' => 'f', 'х' => 'h', 'ц' => 'c', 'ч' => 'ch',
        'ш' => 'sh', 'щ' => 'sch', 'ъ' => '', 'ы' => 'y', 'ь' => '', 'э' => 'e',
        'ю' => 'yu', 'я' => 'ya'
    ];

    // make a human readable string
    $text = strtr($text, $replace);

    // replace non letter or digits by -
    $text = preg_replace('~[^\\pL\d.]+~u', '-', $text);

    // trim
    $text = trim($text, '-');

    // remove unwanted characters
    $text = preg_replace('~[^-\w.]+~', '', $text);

    $text = strtolower($text);

    return $text;
}

Hi Nady, welcome to SO. Code-only answers are discouraged here as they don't teach others how to code. Could you edit your post to explain what your code sample does and how it answers the question? Thanks.
The code is self explanatory through comments. I have no extra information to expose here. So you can know what i am trying to do just by reading doc blocks

s3c · Accepted Answer · 2021-06-04 14:04:35Z

There are already many answers here, so I almost don't want to add another one, but none of the functions did everything I needed.

The best basis for me was function number 3 where their speed was compared. I added/fixed some replacements so

' is just deleted,
. is replaced by -,
α is replaced by a,
ẞ is replaced by b,
Ł (and similar) is replaced by L instead of K, and
€ and $ signs are replaced with eur and usd respectively (add more on necessity).

Optionally you can add '&' => '-and-', but SEO advises against usage of conjunctions (#8), so I left it out for my use-case. (this function doesn't strip existing ands and ors from the string though)

I also added a line of code to fix double dash in this weird string I came up with, as well as an optional parameter to limit slug's length.

Code

<?php
function slugify($text, $length = null)
{
    $replacements = [
        '<' => '', '>' => '', '-' => ' ', '&' => '', '"' => '', 'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä' => 'Ae', 'Ä' => 'A', 'Å' => 'A', 'Ā' => 'A', 'Ą' => 'A', 'Ă' => 'A', 'Æ' => 'Ae', 'Ç' => 'C', "'" => '', 'Ć' => 'C', 'Č' => 'C', 'Ĉ' => 'C', 'Ċ' => 'C', 'Ď' => 'D', 'Đ' => 'D', 'Ð' => 'D', 'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'Ē' => 'E', 'Ę' => 'E', 'Ě' => 'E', 'Ĕ' => 'E', 'Ė' => 'E', 'Ĝ' => 'G', 'Ğ' => 'G', 'Ġ' => 'G', 'Ģ' => 'G', 'Ĥ' => 'H', 'Ħ' => 'H', 'Ì' => 'I', 'Í' => 'I', 'Î' => 'I', 'Ï' => 'I', 'Ī' => 'I', 'Ĩ' => 'I', 'Ĭ' => 'I', 'Į' => 'I', 'İ' => 'I', 'Ĳ' => 'IJ', 'Ĵ' => 'J', 'Ķ' => 'K', 'Ł' => 'L', 'Ľ' => 'L', 'Ĺ' => 'L', 'Ļ' => 'L', 'Ŀ' => 'L', 'Ñ' => 'N', 'Ń' => 'N', 'Ň' => 'N', 'Ņ' => 'N', 'Ŋ' => 'N', 'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O', 'Ö' => 'Oe', 'Ö' => 'Oe', 'Ø' => 'O', 'Ō' => 'O', 'Ő' => 'O', 'Ŏ' => 'O', 'Œ' => 'OE', 'Ŕ' => 'R', 'Ř' => 'R', 'Ŗ' => 'R', 'Ś' => 'S', 'Š' => 'S', 'Ş' => 'S', 'Ŝ' => 'S', 'Ș' => 'S', 'Ť' => 'T', 'Ţ' => 'T', 'Ŧ' => 'T', 'Ț' => 'T', 'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'Ue', 'Ū' => 'U', 'Ü' => 'Ue', 'Ů' => 'U', 'Ű' => 'U', 'Ŭ' => 'U', 'Ũ' => 'U', 'Ų' => 'U', 'Ŵ' => 'W', 'Ý' => 'Y', 'Ŷ' => 'Y', 'Ÿ' => 'Y', 'Ź' => 'Z', 'Ž' => 'Z', 'Ż' => 'Z', 'Þ' => 'T', 'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'ae', 'ä' => 'ae', 'å' => 'a', 'ā' => 'a', 'ą' => 'a', 'ă' => 'a', 'æ' => 'ae', 'ç' => 'c', 'ć' => 'c', 'č' => 'c', 'ĉ' => 'c', 'ċ' => 'c', 'ď' => 'd', 'đ' => 'd', 'ð' => 'd', 'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e', 'ē' => 'e', 'ę' => 'e', 'ě' => 'e', 'ĕ' => 'e', 'ė' => 'e', 'ƒ' => 'f', 'ĝ' => 'g', 'ğ' => 'g', 'ġ' => 'g', 'ģ' => 'g', 'ĥ' => 'h', 'ħ' => 'h', 'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ī' => 'i', 'ĩ' => 'i', 'ĭ' => 'i', 'į' => 'i', 'ı' => 'i', 'ĳ' => 'ij', 'ĵ' => 'j', 'ķ' => 'k', 'ĸ' => 'k', 'ł' => 'l', 'ľ' => 'l', 'ĺ' => 'l', 'ļ' => 'l', 'ŀ' => 'l', 'ñ' => 'n', 'ń' => 'n', 'ň' => 'n', 'ņ' => 'n', 'ŉ' => 'n', 'ŋ' => 'n', 'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'oe', 'ö' => 'oe', 'ø' => 'o', 'ō' => 'o', 'ő' => 'o', 'ŏ' => 'o', 'œ' => 'oe', 'ŕ' => 'r', 'ř' => 'r', 'ŗ' => 'r', 'š' => 's', 'ś' => 's', 'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'ue', 'ū' => 'u', 'ü' => 'ue', 'ů' => 'u', 'ű' => 'u', 'ŭ' => 'u', 'ũ' => 'u', 'ų' => 'u', 'ŵ' => 'w', 'ý' => 'y', 'ÿ' => 'y', 'ŷ' => 'y', 'ž' => 'z', 'ż' => 'z', 'ź' => 'z', 'þ' => 't', 'α' => 'a', 'ß' => 'ss', 'ẞ' => 'b', 'ſ' => 'ss', 'ый' => 'iy', 'А' => 'A', 'Б' => 'B', 'В' => 'V', 'Г' => 'G', 'Д' => 'D', 'Е' => 'E', 'Ё' => 'YO', 'Ж' => 'ZH', 'З' => 'Z', 'И' => 'I', 'Й' => 'Y', 'К' => 'K', 'Л' => 'L', 'М' => 'M', 'Н' => 'N', 'О' => 'O', 'П' => 'P', 'Р' => 'R', 'С' => 'S', 'Т' => 'T', 'У' => 'U', 'Ф' => 'F', 'Х' => 'H', 'Ц' => 'C', 'Ч' => 'CH', 'Ш' => 'SH', 'Щ' => 'SCH', 'Ъ' => '', 'Ы' => 'Y', 'Ь' => '', 'Э' => 'E', 'Ю' => 'YU', 'Я' => 'YA', 'а' => 'a', 'б' => 'b', 'в' => 'v', 'г' => 'g', 'д' => 'd', 'е' => 'e', 'ё' => 'yo', 'ж' => 'zh', 'з' => 'z', 'и' => 'i', 'й' => 'y', 'к' => 'k', 'л' => 'l', 'м' => 'm', 'н' => 'n', 'о' => 'o', 'п' => 'p', 'р' => 'r', 'с' => 's', 'т' => 't', 'у' => 'u', 'ф' => 'f', 'х' => 'h', 'ц' => 'c', 'ч' => 'ch', 'ш' => 'sh', 'щ' => 'sch', 'ъ' => '', 'ы' => 'y', 'ь' => '', 'э' => 'e', 'ю' => 'yu', 'я' => 'ya', '.' => '-', '€' => '-eur-', '$' => '-usd-'
    ];
    // Replace non-ascii characters
    $text = strtr($text, $replacements);
    // Replace non letter or digits with "-"
    $text = preg_replace('~[^\pL\d.]+~u', '-', $text);
    // Replace unwanted characters with "-"
    $text = preg_replace('~[^-\w.]+~', '-', $text);
    // Trim "-"
    $text = trim($text, '-');
    // Remove duplicate "-"
    $text = preg_replace('~-+~', '-', $text);
    // Convert to lowercase
    $text = strtolower($text);
    // Limit length
    if (isset($length) && $length < strlen($text))
        $text = rtrim(substr($text, 0, $length), '-');

    return $text;
}
$text = "--- You can't misuse me! Or can-ya? ČĆŽŠĐ÷×ß¤_.,:;-!\"#$%&/()=?*~ˇ^˘°˛`˙´˝¨¸¸¨Łł€\|@{}[] ¿ Àñdréß l'affreux ğarçon & nøël en forêt ! Andrés Cortez EFI收购Cretaprint Étienne";
echo "text\n$text\n\nslug\n".slugify($text);

Output

text
--- You can't misuse me! Or can-ya? ČĆŽŠĐ÷×ß¤_.,:;-!"#$%&/()=?*~ˇ^˘°˛`˙´˝¨¸¸¨Łł€\|@{}[] ¿ Àñdréß l'affreux ğarçon & nøël en forêt ! Andrés Cortez EFI收购Cretaprint Étienne

slug
you-cant-misuse-me-or-can-ya-cczsd-ss-usd-ll-eur-andress-laffreux-garcon-noel-en-foret-andres-cortez-efi-cretaprint-etienne

Note

It also works for OP's case for converting 'Andrés Cortez' to 'andres-cortez' and all other examples I have found in this thread, except this character which is beyond me: 𐐷.

I'll be thrilled to know about bugs you found (hopefully accompanied with suggestions).

czerasz · Accepted Answer · 2014-10-09 19:27:39Z

An updated version of @Imran Omar Bukhsh code (from the latest Wordpress (4.0) branch):

<?php

// Add methods to slugify taken from Wordpress:
// - https://github.com/WordPress/WordPress/blob/master/wp-includes/formatting.php 
// - https://github.com/WordPress/WordPress/blob/master/wp-includes/functions.php

/**
 * Set the mbstring internal encoding to a binary safe encoding when func_overload
 * is enabled.
 *
 * When mbstring.func_overload is in use for multi-byte encodings, the results from
 * strlen() and similar functions respect the utf8 characters, causing binary data
 * to return incorrect lengths.
 *
 * This function overrides the mbstring encoding to a binary-safe encoding, and
 * resets it to the users expected encoding afterwards through the
 * `reset_mbstring_encoding` function.
 *
 * It is safe to recursively call this function, however each
 * `mbstring_binary_safe_encoding()` call must be followed up with an equal number
 * of `reset_mbstring_encoding()` calls.
 *
 * @since 3.7.0
 *
 * @see reset_mbstring_encoding()
 *
 * @param bool $reset Optional. Whether to reset the encoding back to a previously-set encoding.
 *                    Default false.
 */
function mbstring_binary_safe_encoding( $reset = false ) {
  static $encodings = array();
  static $overloaded = null;

  if ( is_null( $overloaded ) )
    $overloaded = function_exists( 'mb_internal_encoding' ) && ( ini_get( 'mbstring.func_overload' ) & 2 );

  if ( false === $overloaded )
    return;

  if ( ! $reset ) {
    $encoding = mb_internal_encoding();
    array_push( $encodings, $encoding );
    mb_internal_encoding( 'ISO-8859-1' );
  }

  if ( $reset && $encodings ) {
    $encoding = array_pop( $encodings );
    mb_internal_encoding( $encoding );
  }
}

/**
 * Reset the mbstring internal encoding to a users previously set encoding.
 *
 * @see mbstring_binary_safe_encoding()
 *
 * @since 3.7.0
 */
function reset_mbstring_encoding() {
  mbstring_binary_safe_encoding( true );
}


/**
 * Checks to see if a string is utf8 encoded.
 *
 * NOTE: This function checks for 5-Byte sequences, UTF8
 *       has Bytes Sequences with a maximum length of 4.
 *
 * @author bmorel at ssi dot fr (modified)
 * @since 1.2.1
 *
 * @param string $str The string to be checked
 * @return bool True if $str fits a UTF-8 model, false otherwise.
 */
function seems_utf8($str) {
  mbstring_binary_safe_encoding();
  $length = strlen($str);
  reset_mbstring_encoding();
  for ($i=0; $i < $length; $i++) {
    $c = ord($str[$i]);
    if ($c < 0x80) $n = 0; # 0bbbbbbb
    elseif (($c & 0xE0) == 0xC0) $n=1; # 110bbbbb
    elseif (($c & 0xF0) == 0xE0) $n=2; # 1110bbbb
    elseif (($c & 0xF8) == 0xF0) $n=3; # 11110bbb
    elseif (($c & 0xFC) == 0xF8) $n=4; # 111110bb
    elseif (($c & 0xFE) == 0xFC) $n=5; # 1111110b
    else return false; # Does not match any model
    for ($j=0; $j<$n; $j++) { # n bytes matching 10bbbbbb follow ?
      if ((++$i == $length) || ((ord($str[$i]) & 0xC0) != 0x80))
        return false;
    }
  }
  return true;
}


/**
 * Encode the Unicode values to be used in the URI.
 *
 * @since 1.5.0
 *
 * @param string $utf8_string
 * @param int $length Max length of the string
 * @return string String with Unicode encoded for URI.
 */
function utf8_uri_encode( $utf8_string, $length = 0 ) {
  $unicode = '';
  $values = array();
  $num_octets = 1;
  $unicode_length = 0;

  mbstring_binary_safe_encoding();
  $string_length = strlen( $utf8_string );
  reset_mbstring_encoding();

  for ($i = 0; $i < $string_length; $i++ ) {

    $value = ord( $utf8_string[ $i ] );

    if ( $value < 128 ) {
      if ( $length && ( $unicode_length >= $length ) )
        break;
      $unicode .= chr($value);
      $unicode_length++;
    } else {
      if ( count( $values ) == 0 ) $num_octets = ( $value < 224 ) ? 2 : 3;

      $values[] = $value;

      if ( $length && ( $unicode_length + ($num_octets * 3) ) > $length )
        break;
      if ( count( $values ) == $num_octets ) {
        if ($num_octets == 3) {
          $unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]) . '%' . dechex($values[2]);
          $unicode_length += 9;
        } else {
          $unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]);
          $unicode_length += 6;
        }

        $values = array();
        $num_octets = 1;
      }
    }
  }

  return $unicode;
}


/**
 * Sanitizes a title, replacing whitespace and a few other characters with dashes.
 *
 * Limits the output to alphanumeric characters, underscore (_) and dash (-).
 * Whitespace becomes a dash.
 *
 * @since 1.2.0
 *
 * @param string $title The title to be sanitized.
 * @param string $raw_title Optional. Not used.
 * @param string $context Optional. The operation for which the string is sanitized.
 * @return string The sanitized title.
 */
function sanitize_title_with_dashes( $title, $raw_title = '', $context = 'display' ) {
  $title = strip_tags($title);
  // Preserve escaped octets.
  $title = preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $title);
  // Remove percent signs that are not part of an octet.
  $title = str_replace('%', '', $title);
  // Restore octets.
  $title = preg_replace('|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title);

  if (seems_utf8($title)) {
    if (function_exists('mb_strtolower')) {
      $title = mb_strtolower($title, 'UTF-8');
    }
    $title = utf8_uri_encode($title, 200);
  }

  $title = strtolower($title);
  $title = preg_replace('/&.+?;/', '', $title); // kill entities
  $title = str_replace('.', '-', $title);

  if ( 'save' == $context ) {
    // Convert nbsp, ndash and mdash to hyphens
    $title = str_replace( array( '%c2%a0', '%e2%80%93', '%e2%80%94' ), '-', $title );

    // Strip these characters entirely
    $title = str_replace( array(
      // iexcl and iquest
      '%c2%a1', '%c2%bf',
      // angle quotes
      '%c2%ab', '%c2%bb', '%e2%80%b9', '%e2%80%ba',
      // curly quotes
      '%e2%80%98', '%e2%80%99', '%e2%80%9c', '%e2%80%9d',
      '%e2%80%9a', '%e2%80%9b', '%e2%80%9e', '%e2%80%9f',
      // copy, reg, deg, hellip and trade
      '%c2%a9', '%c2%ae', '%c2%b0', '%e2%80%a6', '%e2%84%a2',
      // acute accents
      '%c2%b4', '%cb%8a', '%cc%81', '%cd%81',
      // grave accent, macron, caron
      '%cc%80', '%cc%84', '%cc%8c',
    ), '', $title );

    // Convert times to x
    $title = str_replace( '%c3%97', 'x', $title );
  }

  $title = preg_replace('/[^%a-z0-9 _-]/', '', $title);
  $title = preg_replace('/\s+/', '-', $title);
  $title = preg_replace('|-+|', '-', $title);
  $title = trim($title, '-');

  return $title;
}

$title = '#PFW Alexander McQueen Spring/Summer 2015';
echo "title -> slug: \n". $title ." -> ". sanitize_title_with_dashes($title);
echo "\n\n";
$title = '«GQ»: Elyas M\'Barek gehört zu Männern des Jahres';
echo "title -> slug: \n". $title ." -> ". sanitize_title_with_dashes($title);

View online example.

Second example fails for me, returns %c2%abgq%c2%bb-elyas-mbarek-geh%c3%b6rt-zu-m%c3%a4nnern-des-jahres

Entendu · Accepted Answer · 2010-06-02 06:39:57Z

7

Don't use preg_replace for this. There's a php function built just for the task: strtr() http://php.net/manual/en/function.strtr.php

Taken from the comments in the above link (and I tested it myself; it works:

function normalize ($string) {
    $table = array(
        'Š'=>'S', 'š'=>'s', 'Đ'=>'Dj', 'đ'=>'dj', 'Ž'=>'Z', 'ž'=>'z', 'Č'=>'C', 'č'=>'c', 'Ć'=>'C', 'ć'=>'c',
        'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
        'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O',
        'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss',
        'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e',
        'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o',
        'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'ý'=>'y', 'þ'=>'b',
        'ÿ'=>'y', 'Ŕ'=>'R', 'ŕ'=>'r',
    );

    return strtr($string, $table);
}

answered Jun 2, 2010 at 6:39

Entendu

1,24510 silver badges12 bronze badges

3 Comments

Andres SK Over a year ago

the translation is ok, the problem is that im using the function through an ajax call, and it doesn't work in there =(

Silvio Delgado Over a year ago

@andufo You must set charset in the ajax request.

mickmackusa Over a year ago

This is only a partial answer to how to slugify a string.

Community · Accepted Answer · 2020-06-20 09:12:55Z

I didn't know which one to use so I made a quick bench on phptester.net

<?php

// First test
// https://stackoverflow.com/a/42740874/10232729
function slugify(STRING $string, STRING $separator = '-'){
    
    $accents_regex = '~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i';
    $special_cases = [ '&' => 'and', "'" => ''];
    $string = mb_strtolower( trim( $string ), 'UTF-8' );
    $string = str_replace( array_keys($special_cases), array_values( $special_cases), $string );
    $string = preg_replace( $accents_regex, '$1', htmlentities( $string, ENT_QUOTES, 'UTF-8' ) );
    $string = preg_replace('/[^a-z0-9]/u', $separator, $string);
    
    return preg_replace('/['.$separator.']+/u', $separator, $string);
}

// Second test
// https://stackoverflow.com/a/13331948/10232729
function slug(STRING $string, STRING $separator = '-'){
    
    $string = transliterator_transliterate('Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC; [:Punctuation:] Remove; Lower();', $string);
        
    return str_replace(' ', $separator, $string);;
}

// Third test - My choice
// https://stackoverflow.com/a/38066136/10232729
function slugbis($text){

    $replace = [
        '<' => '', '>' => '', '-' => ' ', '&' => '',
        '"' => '', 'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä'=> 'Ae',
        'Ä' => 'A', 'Å' => 'A', 'Ā' => 'A', 'Ą' => 'A', 'Ă' => 'A', 'Æ' => 'Ae',
        'Ç' => 'C', 'Ć' => 'C', 'Č' => 'C', 'Ĉ' => 'C', 'Ċ' => 'C', 'Ď' => 'D', 'Đ' => 'D',
        'Ð' => 'D', 'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'Ē' => 'E',
        'Ę' => 'E', 'Ě' => 'E', 'Ĕ' => 'E', 'Ė' => 'E', 'Ĝ' => 'G', 'Ğ' => 'G',
        'Ġ' => 'G', 'Ģ' => 'G', 'Ĥ' => 'H', 'Ħ' => 'H', 'Ì' => 'I', 'Í' => 'I',
        'Î' => 'I', 'Ï' => 'I', 'Ī' => 'I', 'Ĩ' => 'I', 'Ĭ' => 'I', 'Į' => 'I',
        'İ' => 'I', 'Ĳ' => 'IJ', 'Ĵ' => 'J', 'Ķ' => 'K', 'Ł' => 'K', 'Ľ' => 'K',
        'Ĺ' => 'K', 'Ļ' => 'K', 'Ŀ' => 'K', 'Ñ' => 'N', 'Ń' => 'N', 'Ň' => 'N',
        'Ņ' => 'N', 'Ŋ' => 'N', 'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O',
        'Ö' => 'Oe', 'Ö' => 'Oe', 'Ø' => 'O', 'Ō' => 'O', 'Ő' => 'O', 'Ŏ' => 'O',
        'Œ' => 'OE', 'Ŕ' => 'R', 'Ř' => 'R', 'Ŗ' => 'R', 'Ś' => 'S', 'Š' => 'S',
        'Ş' => 'S', 'Ŝ' => 'S', 'Ș' => 'S', 'Ť' => 'T', 'Ţ' => 'T', 'Ŧ' => 'T',
        'Ț' => 'T', 'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'Ue', 'Ū' => 'U',
        'Ü' => 'Ue', 'Ů' => 'U', 'Ű' => 'U', 'Ŭ' => 'U', 'Ũ' => 'U', 'Ų' => 'U',
        'Ŵ' => 'W', 'Ý' => 'Y', 'Ŷ' => 'Y', 'Ÿ' => 'Y', 'Ź' => 'Z', 'Ž' => 'Z',
        'Ż' => 'Z', 'Þ' => 'T', 'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a',
        'ä' => 'ae', 'ä' => 'ae', 'å' => 'a', 'ā' => 'a', 'ą' => 'a', 'ă' => 'a',
        'æ' => 'ae', 'ç' => 'c', 'ć' => 'c', 'č' => 'c', 'ĉ' => 'c', 'ċ' => 'c',
        'ď' => 'd', 'đ' => 'd', 'ð' => 'd', 'è' => 'e', 'é' => 'e', 'ê' => 'e',
        'ë' => 'e', 'ē' => 'e', 'ę' => 'e', 'ě' => 'e', 'ĕ' => 'e', 'ė' => 'e',
        'ƒ' => 'f', 'ĝ' => 'g', 'ğ' => 'g', 'ġ' => 'g', 'ģ' => 'g', 'ĥ' => 'h',
        'ħ' => 'h', 'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ī' => 'i',
        'ĩ' => 'i', 'ĭ' => 'i', 'į' => 'i', 'ı' => 'i', 'ĳ' => 'ij', 'ĵ' => 'j',
        'ķ' => 'k', 'ĸ' => 'k', 'ł' => 'l', 'ľ' => 'l', 'ĺ' => 'l', 'ļ' => 'l',
        'ŀ' => 'l', 'ñ' => 'n', 'ń' => 'n', 'ň' => 'n', 'ņ' => 'n', 'ŉ' => 'n',
        'ŋ' => 'n', 'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'oe',
        'ö' => 'oe', 'ø' => 'o', 'ō' => 'o', 'ő' => 'o', 'ŏ' => 'o', 'œ' => 'oe',
        'ŕ' => 'r', 'ř' => 'r', 'ŗ' => 'r', 'š' => 's', 'ù' => 'u', 'ú' => 'u',
        'û' => 'u', 'ü' => 'ue', 'ū' => 'u', 'ü' => 'ue', 'ů' => 'u', 'ű' => 'u',
        'ŭ' => 'u', 'ũ' => 'u', 'ų' => 'u', 'ŵ' => 'w', 'ý' => 'y', 'ÿ' => 'y',
        'ŷ' => 'y', 'ž' => 'z', 'ż' => 'z', 'ź' => 'z', 'þ' => 't', 'ß' => 'ss',
        'ſ' => 'ss', 'ый' => 'iy', 'А' => 'A', 'Б' => 'B', 'В' => 'V', 'Г' => 'G',
        'Д' => 'D', 'Е' => 'E', 'Ё' => 'YO', 'Ж' => 'ZH', 'З' => 'Z', 'И' => 'I',
        'Й' => 'Y', 'К' => 'K', 'Л' => 'L', 'М' => 'M', 'Н' => 'N', 'О' => 'O',
        'П' => 'P', 'Р' => 'R', 'С' => 'S', 'Т' => 'T', 'У' => 'U', 'Ф' => 'F',
        'Х' => 'H', 'Ц' => 'C', 'Ч' => 'CH', 'Ш' => 'SH', 'Щ' => 'SCH', 'Ъ' => '',
        'Ы' => 'Y', 'Ь' => '', 'Э' => 'E', 'Ю' => 'YU', 'Я' => 'YA', 'а' => 'a',
        'б' => 'b', 'в' => 'v', 'г' => 'g', 'д' => 'd', 'е' => 'e', 'ё' => 'yo',
        'ж' => 'zh', 'з' => 'z', 'и' => 'i', 'й' => 'y', 'к' => 'k', 'л' => 'l',
        'м' => 'm', 'н' => 'n', 'о' => 'o', 'п' => 'p', 'р' => 'r', 'с' => 's',
        'т' => 't', 'у' => 'u', 'ф' => 'f', 'х' => 'h', 'ц' => 'c', 'ч' => 'ch',
        'ш' => 'sh', 'щ' => 'sch', 'ъ' => '', 'ы' => 'y', 'ь' => '', 'э' => 'e',
        'ю' => 'yu', 'я' => 'ya'
    ];

    // make a human readable string
    $text = strtr($text, $replace);

    // replace non letter or digits by -
    $text = preg_replace('~[^\pL\d.]+~u', '-', $text);

    // trim
    $text = trim($text, '-');

    // remove unwanted characters
    $text = preg_replace('~[^-\w.]+~', '', $text);

    return strtolower($text);
}

// Fourth test
// https://stackoverflow.com/a/2955521/10232729
function slugagain($string){
    
    $table = [
        'Š'=>'S', 'š'=>'s', 'Đ'=>'Dj', 'đ'=>'dj', 'Ž'=>'Z', 'ž'=>'z', 'Č'=>'C', 'č'=>'c', 'Ć'=>'C', 'ć'=>'c',
        'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
        'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O',
        'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss',
        'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e',
        'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o',
        'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'ý'=>'y', 'þ'=>'b',
        'ÿ'=>'y', 'Ŕ'=>'R', 'ŕ'=>'r', ' '=>'-'
    ];

    return strtr($string, $table);
}

// Fifth test
// https://stackoverflow.com/a/27396804/10232729
function slugifybis($url){
    $url = trim($url);

    $url = str_replace(' ', '-', $url);
    $url = str_replace('/', '-slash-', $url);
    
    return rawurlencode($url);
}

// Sixth and last test
// https://stackoverflow.com/a/39442034/10232729
setlocale( LC_ALL, "en_US.UTF8" );  
function slugifyagain($string){
    
    $string = iconv('utf-8', 'us-ascii//translit//ignore', $string); // transliterate
    $string = str_replace("'", '', $string);
    $string = preg_replace('~[^\pL\d]+~u', '-', $string); // replace non letter or non digits by "-"
    $string = preg_replace('~[^-\w]+~', '', $string); // remove unwanted characters
    $string = preg_replace('~-+~', '-', $string); // remove duplicate "-"
    $string = trim($string, '-'); // trim "-"
    $string = trim($string); // trim
    $string = mb_strtolower($string, 'utf-8'); // lowercase
    
    return urlencode($string); // safe;
};

$string = $newString = "¿ Àñdréß l'affreux ğarçon & nøël en forêt !";

$max = 10000;

echo '<pre>';
echo 'Beginning :';
echo '<br />';
echo '<br />';    
echo '> Slugging '.$max.' iterations of following :';
echo '<br />';
echo '>> ' . $string;
echo '<br />';  
echo '<br />';
echo 'Output results :';
echo '<br />';
echo '<br />';  

$start = microtime(true);

for($i = 0 ; $i < $max ; $i++){
    
    $newString = slugify($string);
}

$time = (microtime(true) - $start) * 1000;

echo '> First test passed in **' . round($time, 2) . 'ms**';
echo '<br />';  
echo '>> Result : ' . $newString;
echo '<br />';
echo '<br />';

$start = microtime(true);

for($i = 0 ; $i < $max ; $i++){
    
    $newString = slug($string);
}

$time = (microtime(true) - $start) * 1000;

echo '> Second test passed in **' . round($time, 2) . 'ms**';
echo '<br />';
echo '>> Result : ' . $newString;
echo '<br />';
echo '<br />';

$start = microtime(true);

for($i = 0 ; $i < $max ; $i++){
    
    $newString = slugbis($string);
}

$time = (microtime(true) - $start) * 1000;

echo '> Third test passed in **' . round($time, 2) . 'ms**';
echo '<br />';
echo '>> Result : ' . $newString;
echo '<br />';
echo '<br />';

$start = microtime(true);

for($i = 0 ; $i < $max ; $i++){
    
    $newString = slugagain($string);
}

$time = (microtime(true) - $start) * 1000;

echo '> Fourth test passed in **' . round($time, 2) . 'ms**';
echo '<br />';
echo '>> Result : ' . $newString;
echo '<br />';
echo '<br />';

$start = microtime(true);

for($i = 0 ; $i < $max ; $i++){
    
    $newString = slugifybis($string);
}

$time = (microtime(true) - $start) * 1000;

echo '> Fifth test passed in **' . round($time, 2) . 'ms**';
echo '<br />';
echo '>> Result : ' . $newString;
echo '<br />';
echo '<br />';

$start = microtime(true);

for($i = 0 ; $i < $max ; $i++){
    
    $newString = slugifyagain($string);
}

$time = (microtime(true) - $start) * 1000;

echo '> Sixth test passed in **' . round($time, 2) . 'ms**';
echo '<br />';
echo '>> Result : ' . $newString;
echo '</pre>';

Beginning :

Slugging 10000 iterations of following :

¿ Àñdréß l'affreux ğarçon & nøël en forêt !

Output results :

First test passed in 120.78ms

Result : -iquest-andresz-laffreux-arcon-and-noel-en-foret-

Second test passed in 3883.82ms

Result : -andreß-laffreux-garcon--nøel-en-foret-

Third test passed in 56.83ms

Result : andress-l-affreux-garcon-noel-en-foret

Fourth test passed in 18.93ms

Result : ¿-AndreSs-l'affreux-ğarcon-&-noel-en-foret-!

Fifth test passed in 6.45ms

Result : %C2%BF-%C3%80%C3%B1dr%C3%A9%C3%9F-l%27affreux-%C4%9Far%C3%A7on-%26-n%C3%B8%C3%ABl-en-for%C3%AAt-%21

Sixth test passed in 112.42ms

Result : andress-laffreux-garcon-n-el-en-foret

Further tests needed.

Edit : less iterations test

Beginning :

Slugging 100 iterations of following :

¿ Àñdréß l'affreux ğarçon & nøël en forêt !

Output results :

First test passed in 1.72ms

Result : -iquest-andresz-laffreux-arcon-and-noel-en-foret-

Second test passed in 48.59ms

Result : -andreß-laffreux-garcon--nøel-en-foret-

Third test passed in 0.91ms

Result : andress-l-affreux-garcon-noel-en-foret

Fourth test passed in 0.3ms

Result : ¿-AndreSs-l'affreux-ğarcon-&-noel-en-foret-!

Fifth test passed in 0.14ms

Result : %C2%BF-%C3%80%C3%B1dr%C3%A9%C3%9F-l%27affreux-%C4%9Far%C3%A7on-%26-n%C3%B8%C3%ABl-en-for%C3%AAt-%21

Sixth test passed in 1.4ms

Result : andress-laffreux-garcon-n-el-en-foret

Mladen Janjetovic · Accepted Answer · 2015-01-13 15:07:24Z

5

I am using:

function slugify($text)
{ 
    $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
    return strtolower(preg_replace('/[^A-Za-z0-9-]+/', '-', $text));
}

Only fallback is that Cyrillic characters will not be converted, and I am searching now for solution that is not long str_replace for every single Cyrillic character.

answered Jan 13, 2015 at 15:07

Mladen Janjetovic

14.8k8 gold badges75 silver badges86 bronze badges

1 Comment

Daniel W. Over a year ago

iconv is not installed on many servers. solution to convert any characters is to use set_locale('cyrillic.UTF-8') first. Exact value depends on your installed locales.

Paulo Victor · Accepted Answer · 2016-12-14 16:57:58Z

4

The most elegant way I think is using a Behat\Transliterator\Transliterator.

I need to extends this class by your class because it is an Abstract, some like this:

<?php
use Behat\Transliterator\Transliterator;

class Urlizer extends Transliterator
{
}

And then, just use it:

$text = "Master Ápiu";
$urlizer = new Urlizer();
$slug = $urlizer->transliterate($slug, "-");
echo $slug; // master-apiu

Of course you should put this things in your composer as well.

composer require behat/transliterator

More info here https://github.com/Behat/Transliterator

edited Dec 14, 2016 at 16:57

answered Dec 14, 2016 at 16:42

Paulo Victor

4,1422 gold badges30 silver badges29 bronze badges

Comments

m1sylla · Accepted Answer · 2019-09-16 22:59:34Z

This may be a way to do it too. Inspired from these links Experts-exchange and alinalexander

function slugifier($txt){

   /* Get rid of accented characters */
   $search = explode(",","ç,æ,œ,á,é,í,ó,ú,à,è,ì,ò,ù,ä,ë,ï,ö,ü,ÿ,â,ê,î,ô,û,å,e,i,ø,u");
   $replace = explode(",","c,ae,oe,a,e,i,o,u,a,e,i,o,u,a,e,i,o,u,y,a,e,i,o,u,a,e,i,o,u");
   $txt = str_replace($search, $replace, $txt);

   /* Lowercase all the characters */
   $txt = strtolower($txt);

   /* Avoid whitespace at the beginning and the ending */
   $txt = trim($txt);

   /* Replace all the characters that are not in a-z or 0-9 by a hyphen */
   $txt = preg_replace("/[^a-z0-9]/", "-", $txt);
   /* Remove hyphen anywhere it's more than one */
   $txt = preg_replace("/[\-]+/", '-', $txt);
   return $txt;   
}

James Risner · Accepted Answer · 2022-10-27 12:21:23Z

3

function slugify($text)
{
    // replace non letter or digits by -
    $text = preg_replace('~[^\pL\d]+~u', '-', $text);
    // transliterate
    $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
    // remove unwanted characters
    $text = preg_replace('~[^-\w]+~', '', $text);
    // trim
    $text = trim($text, '-');
    // remove duplicate -
    $text = preg_replace('~-+~', '-', $text);
    // lowercase
    $text = strtolower($text);
    if (empty($text)) {
        return 'n-a';
    }
    return $text;
}

Use case:

echo slugify('bu metinde ç ö ş ğ ü ı * # karakter $ @ ! ?  kullanılamaz');

Output: bu-metinde-c-o-s-g-u-i-karakter-kullanilamaz

edited Oct 27, 2022 at 12:21

James Risner

6,09711 gold badges33 silver badges60 bronze badges

answered Oct 21, 2022 at 10:32

Erhan URGUN

1035 bronze badges

Comments

Community · Accepted Answer · 2023-11-17 19:42:58Z

2

You could have a look at Normalizer::normalize(), see here. It just needs to load the intl module for PHP

edited Nov 17, 2023 at 19:42

CommunityBot

11 silver badge

answered Jun 2, 2010 at 12:46

Serty Oan

1,72612 silver badges19 bronze badges

Comments

Bery · Accepted Answer · 2015-06-25 14:30:53Z

2

What about using something that is already implemented in Core?

//Clean non UTF-8 characters    
Mage::getHelper('core/string')->cleanString($str)

Or one of the core url/ url rewrite methods..

answered Jun 25, 2015 at 14:30

Bery

1,1041 gold badge10 silver badges23 bronze badges

Comments

daygloink · Accepted Answer · 2015-10-07 19:49:05Z

2

I wrote this based on Maerlyn's response. This function will work regardless of the character encoding on the page. It also won't turn single quotes in to dashes :)

function slugify ($string) {
    $string = utf8_encode($string);
    $string = iconv('UTF-8', 'ASCII//TRANSLIT', $string);   
    $string = preg_replace('/[^a-z0-9- ]/i', '', $string);
    $string = str_replace(' ', '-', $string);
    $string = trim($string, '-');
    $string = strtolower($string);

    if (empty($string)) {
        return 'n-a';
    }

    return $string;
}

answered Oct 7, 2015 at 19:49

daygloink

4376 silver badges10 bronze badges

1 Comment

GibsonFX Over a year ago

Yay this one actually works, that was really bugging me turning apostrophes/single quotes into hyphens. Thank you.

Massimito · Accepted Answer · 2024-07-28 14:29:18Z

In my opinion, the option to use Transliterator::transliterate is the one that gives the best results (converting from any alphabet), it's very customizable, results in a short code and doesn't need external libs.

But using the rules from hdogan's valuable answer, it returns incorrect results in some cases:

Removes hyphens between numbers: Years 2020-2024 -> years-20202024, expected result: years-2020-2024.
Removes characters (like underscore) instead of converting them to hyphens: The_word -> theword, expected result: the-word.
In some cases it leaves hyphens at the beginning or end: my string -> -my-string-, expected result: my-string.
Doesn't convert or remove all non-alphanumeric characters: Test © -> test-©, expected result: test-c or test (in my case, I prefer the first one).
It doesn't normalize or remove other letter variants: Nº 1 -> nº-1, expected result: no-1 or n-1.

Based on that response, I have made some changes to the rules:

$string = ' Namnet på bildtävlingen nº1@2020-`24...like&share';

$slug = \Transliterator::createFromRules("
    :: Any-Latin;
    :: NFD;
    :: [:Mn:] Remove;
    :: NFKC;
    :: Latin-ASCII;
    :: Lower();
    ^[^[:Ll:][:Nd:]]+ > ;
    [^[:Ll:][:Nd:]]+$ > ;
    [^[:Ll:][:Nd:]]+ > '-';
")->transliterate($string);

$slug = preg_replace(['/[^a-z0-9-]+/', '/^-+|(?<=-)-+|-+$/'], '', $slug);

echo $slug; // namnet-pa-bildtavlingen-no1-2020-24-like-share
// previous result: -namnet-pa-bildtavlingen-nº12020`24likeshare

Explanation of the changes:

Changed NFC to NFKC, since it normalizes more characters (ª -> a, etc.).
Added Latin-ASCII to convert more characters to alphanumeric (® -> r, etc.).
All non-alphanumeric characters from the beginning and end are removed.
All non-alphanumeric characters (not only [-[:Separator:]]) are converted to hyphens.

The use of preg_replace is because some Cyrillic characters (e.g. ҳ) are not normalized for some reason, and we want only alphanumeric result with hyphens, so we remove these few exceptions. I have not managed to normalize them in the rules, if anyone knows if it's possible, the collaboration is appreciated.

Comparison demo

ICU Documentation / Unicode categories

Lucas Bustamante · Accepted Answer · 2018-08-17 14:26:48Z

1

There's a good solution here that deals with special characters as well.

Texto Fantástico => texto-fantastico

function slugify( $string, $separator = '-' ) {
    $accents_regex = '~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i';
    $special_cases = array( '&' => 'and', "'" => '');
    $string = mb_strtolower( trim( $string ), 'UTF-8' );
    $string = str_replace( array_keys($special_cases), array_values( $special_cases), $string );
    $string = preg_replace( $accents_regex, '$1', htmlentities( $string, ENT_QUOTES, 'UTF-8' ) );
    $string = preg_replace("/[^a-z0-9]/u", "$separator", $string);
    $string = preg_replace("/[$separator]+/u", "$separator", $string);
    return $string;
}

Author: Natxet

edited Aug 17, 2018 at 14:26

answered Mar 11, 2017 at 21:29

Lucas Bustamante

17.6k9 gold badges101 silver badges94 bronze badges

Comments

Starboy · Accepted Answer · 2020-10-31 20:43:13Z

1

On my localhost everything was ok, but on server it helped me “set_locale” and “utf-8” at “mb_strtolower”.

<?
setlocale( LC_ALL, "en_US.UTF8" );
function slug( $str, $char = "-", $tf = "lowercase" )
{
    $str = iconv( "utf-8", "us-ascii//translit//ignore", $str ); // transliterate
    $str = str_replace( "'", "", $str ); // remove “'” generated by iconv
    $str = preg_replace( "~[^a-z0-9]+~ui", $char, $str ); // replace unwanted by single “-”
    $str = trim( $str, $char ); // trim “-”
    if( $tf == "lowercase" ) $str = mb_strtolower( $str, "utf-8" ); // lowercase
    elseif( $tf == "uppercase" ) $str = mb_strtoupper( $str, "utf-8" );
    return $str;
}
?>

Test

$string = "--+ě𐐷𤭢ščřžýá091354--––—_-6íé↨☻☻ßẞąć …   ęłńśźżĄĆ  ĘŁŃŚ   Ź///+++||||..Ż";
echo slug( $string );
echo slug( $string, "☻", "uppercase" );

// → escrzya091354-6iessac-elnszzac-elns-z-z
// → ESCRZYA091354☻6IESSAC☻ELNSZZAC☻ELNS☻Z☻Z

edited Oct 31, 2020 at 20:43

answered Sep 12, 2016 at 0:44

Starboy

1175 bronze badges

1 Comment

Starboy Over a year ago

its for latin extended ěščřžýáíéúůó

ahinkle · Accepted Answer · 2022-02-10 18:52:13Z

1

For standard alphanumeric english - nothing complicated.

/**
 * Update the provided string to a slug-safe format.
 *
 * @param string $string
 * @return string
 */
function slugify($string)
{
    return strtolower(trim(preg_replace('/[^A-Za-z0-9-]+/', '-', $string), '-'));
}

answered Feb 10, 2022 at 18:52

ahinkle

2,2814 gold badges34 silver badges59 bronze badges

Comments

tchartron · Accepted Answer · 2022-08-16 07:59:53Z

Not sure it works for every cases but i took the slug method from Laravel Str class and added the iconv('utf-8', 'us-ascii//TRANSLIT', $title) thing to treat accents without the need to use voku/portable-ascii it seems to work pretty well for my use cases :

    public static function slug($title, $separator = '-')
    {
        $title = iconv('utf-8', 'us-ascii//TRANSLIT', $title);
        $flip = $separator === '-' ? '_' : '-';
        $title = preg_replace('!['.preg_quote($flip).']+!u', $separator, $title);
        // Replace @ with the word 'at'
        $title = str_replace('@', $separator.'at'.$separator, $title);
        // Remove all characters that are not the separator, letters, numbers, or whitespace.
        $title = preg_replace('![^'.preg_quote($separator).'\pL\pN\s]+!u', '', mb_strtolower($title, 'UTF-8'));
        // Replace all separator characters and whitespace by a single separator
        $title = preg_replace('!['.preg_quote($separator).'\s]+!u', $separator, $title);

        return trim($title, $separator);
    }

Andres SK · Accepted Answer · 2023-01-04 21:52:48Z

1

In Laravel 8 it's:

Str::slug('Hola como te va', '-');

// hola-como-te-va

edited Jan 4, 2023 at 21:52

Andres SK

11.1k27 gold badges98 silver badges159 bronze badges

answered Jan 2, 2023 at 22:21

Caro Pérez

5384 silver badges6 bronze badges

Comments

MTJ · Accepted Answer · 2014-12-10 08:57:35Z

Since gTLDs and IDNs are becoming more and more used I cannot see why URL shouldn't contain Andrés.

Just rawurlencode $URL you want instead. Most browsers show UTF-8 characters in URLs (not some ancient IE6 maybe) and bit.ly / goo.gl can be used to make it short in cases like Russian and Arabic if need may be for ad purposes or just write them in ads like user would write them on browser URL.

Only difference is spaces " " it might be good idea to replace them with "-" and "/" if you don't want to allow those.

<?php
function slugify($url)
{
    $url = trim($url);

    $url = str_replace(" ","-",$url);
    $url = str_replace("/","-slash-",$url);
    $url = rawurlencode($url);
}
?>

Url as encoded http://www.hurtta.com/RU/%D0%9F%D1%80%D0%BE%D0%B4%D1%83%D0%BA%D1%82%D1%8B/

Url as written http://www.hurtta.com/RU/Продукты/

Muhammad Bilal · Accepted Answer · 2019-07-04 08:06:16Z

0

Since I've Seen a lot of methods here but I've found a simplest method for myself.Maybe it will help someone.

$slug = strtolower(preg_replace('/[^a-zA-Z0-9\-]/', '',preg_replace('/\s+/', '-', $string) ));

answered Jul 4, 2019 at 8:06

Muhammad Bilal

5271 gold badge5 silver badges13 bronze badges

Comments

Vivek · Accepted Answer · 2021-09-18 09:51:47Z

0

Here is the short and easy solution for making slug

function convertURLs($value){
$delimiter = '-';
$slug = strtolower(trim(preg_replace('/[\s-]+/', $delimiter, preg_replace('/[^A-Za-z0-9-]+/', $delimiter, preg_replace('/[&]/', 'and', preg_replace('/[\']/', '', iconv('UTF-8', 'ASCII//TRANSLIT', $value))))), $delimiter));
return $slug;}

answered Sep 18, 2021 at 9:51

Vivek

1,50318 silver badges27 bronze badges

Comments

Matteo Feduzi · Accepted Answer · 2023-01-24 15:57:55Z

0

I'm using this function and it works fine:

function slugify($string) {
  return strtolower(trim(preg_replace('~[^0-9a-z]+~i', '-', html_entity_decode(preg_replace('~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i', '$1', htmlentities($string, ENT_QUOTES, 'UTF-8')), ENT_QUOTES, 'UTF-8')), '-'));
}

answered Jan 24, 2023 at 15:57

Matteo Feduzi

551 silver badge10 bronze badges

Comments

Wongjn · Accepted Answer · 2023-05-07 16:29:40Z

0

Try to using urldecode to get the decoded string:

echo urldecode('%D8%A7%D9%84%D8%B3%D9%84%D8%A7%D9%85-%D8%B9%D9%84%DB%8C%DA%A9'); 

// return السلام-علیک

edited May 7, 2023 at 16:29

Wongjn

27.1k5 gold badges24 silver badges50 bronze badges

answered May 2, 2023 at 20:02

farid teymouri

3093 silver badges5 bronze badges

Comments

SomoKRoceS · Accepted Answer · 2020-07-10 17:12:12Z

-1

For me this variant is perfect, also it change & to and. Here is code:

function dSlug($string) {
    return strtolower(trim(preg_replace('~[^0-9a-z]+~i', '-', html_entity_decode(preg_replace('~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i', '$1',htmlentities(preg_replace('/[&]/', ' and ', $title), ENT_QUOTES, 'UTF-8')), ENT_QUOTES, 'UTF-8')), '-'));
}`

edited Jul 10, 2020 at 17:12

SomoKRoceS

3,0733 gold badges24 silver badges33 bronze badges

answered Jul 10, 2020 at 15:21

Артем Рудницкий

11 bronze badge

Collectives™ on Stack Overflow

32 Answers 32

22 Comments

Update

Answer

9 Comments

6 Comments

3 Comments

2 Comments

1 Comment

2 Comments

Code

Output

Note

Comments

2 Comments

3 Comments

Comments

1 Comment

Comments

Comments

Comments

Comments

Comments

1 Comment

Comments

Comments

1 Comment

Comments

Comments

Comments

Comments

Comments

Comments

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related