797

How do I remove all non alphanumeric characters from a string except dash and space characters?

13 Answers

1110

Replace [^a-zA-Z0-9 -] with an empty string.

Regex rgx = new Regex("[^a-zA-Z0-9 -]");
str = rgx.Replace(str, "");
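As a minimal, self-contained sketch of the replacement above (the helper name KeepAlphanumericDashSpace and the sample strings are illustrative assumptions, not part of the original answer), the following shows the pattern in use. It also shows an escaped-dash variant, which should behave the same when the dash is not the last character in the class:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper name, used only for this illustration.
// The dash is the last character in the class, so it is a literal '-'
// rather than a range operator.
string KeepAlphanumericDashSpace(string input) =>
    Regex.Replace(input, "[^a-zA-Z0-9 -]", "");

// Equivalent pattern with the dash escaped, so it can sit anywhere in the class.
string KeepAlphanumericDashSpaceEscaped(string input) =>
    Regex.Replace(input, @"[^a-zA-Z0-9\- ]", "");

Console.WriteLine(KeepAlphanumericDashSpace("Hello, World! foo-bar_42"));
// prints: Hello World foo-bar42
Console.WriteLine(KeepAlphanumericDashSpaceEscaped("Hello, World! foo-bar_42"));
// prints: Hello World foo-bar42
```

Note that this character class is ASCII-only, so accented and non-Latin letters are removed as well.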

18 Comments

Worth mentioning that - must be at the end of the character class, or escaped with a backslash, to prevent being used for a range.
@Dan set the global flag in your regex - without that, it just replaces the first match. A quick google should tell you how to set global flag in classic ASP regex. Otherwise, look for a replaceAll function instead of replace.
Here's a regex compiled version: return Regex.Replace(str, "[^a-zA-Z0-9_.]+", "", RegexOptions.Compiled); Same basic question
@MGOwen because every time you use "" you are creating a new object due to strings being immutable. When you use string.empty you are reusing the single instance required for representing an empty string which is quicker as well as being more efficient.
@BrianScott I know this is old, but was found in a search so I feel this is relevant. This actually depends on the version of .NET you are running under. > 2.0 uses "" & string.Empty exactly the same. stackoverflow.com/questions/151472/…
406

I could have used Regex. Regular expressions can provide an elegant solution, but they can cause performance issues. Here is one alternative:

char[] arr = str.ToCharArray();

arr = Array.FindAll<char>(arr, (c => (char.IsLetterOrDigit(c) 
                                  || char.IsWhiteSpace(c) 
                                  || c == '-')));
str = new string(arr);

On the Compact Framework, which doesn't have FindAll, replace FindAll with Where [1]:

char[] arr = str.Where(c => (char.IsLetterOrDigit(c) || 
                             char.IsWhiteSpace(c) || 
                             c == '-')).ToArray(); 

str = new string(arr);
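One behavioral difference from the regex answers is worth illustrating: char.IsLetterOrDigit is Unicode-aware, while a class such as [a-zA-Z0-9] is ASCII-only. The sketch below uses a made-up sample string (an assumption for illustration) and compares the two approaches on it:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input chosen for illustration only.
string input = "Mötley Crüe: rock-n-roll (1981)!";

// Unicode-aware filter: char.IsLetterOrDigit accepts 'ö' and 'ü'.
string viaLinq = new string(input
    .Where(c => char.IsLetterOrDigit(c) || char.IsWhiteSpace(c) || c == '-')
    .ToArray());
// viaLinq is "Mötley Crüe rock-n-roll 1981"

// ASCII-only character class: the accented letters are removed too.
string viaRegex = Regex.Replace(input, "[^a-zA-Z0-9 -]", "");
// viaRegex is "Mtley Cre rock-n-roll 1981"

Console.WriteLine(viaLinq);
Console.WriteLine(viaRegex);
```

Also note that char.IsWhiteSpace keeps tabs and newlines, whereas the regex class above keeps only the space character.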

[1] From a comment by ShawnFeatherly.

5 Comments

in my testing, this technique was much faster. to be precise, it was just under 3 times faster than the Regex Replace technique.
The compact framework doesn't have FindAll, you can replace FindAll with char[] arr = str.Where(c => (char.IsLetterOrDigit(c) || char.IsWhiteSpace(c) || c == '-')).ToArray();
has anyone tested this? That didn't work at all. --but this did for me: string str2 = new string(str.Where(c => (char.IsLetterOrDigit(c))).ToArray());
As a single line str = string.Concat(str.Where(c => Char.IsLetterOrDigit(c) || Char.IsWhiteSpace(c)))
You present .Where as being a bit of a last resort if Array.FindAll isn't available, but it seems quite a bit simpler to me. Is there any reason you prefer FindAll?
83

You can try:

string s1 = Regex.Replace(s, "[^A-Za-z0-9 -]", "");

Where s is your string.

3 Comments

This does not work as it gives a "symbol not found" error, even after importing java.util.regex.*
@DavidBandel it's C#
@m47730 It works for me in JS, PHP and Python, here is a demo.
52

Using System.Linq

string withOutSpecialCharacters = new string(stringWithSpecialCharacters.Where(c => char.IsLetterOrDigit(c) || char.IsWhiteSpace(c) || c == '-').ToArray());

3 Comments

@Michael It is similar but at least this is a one liner, rather than 3 lines. I'd say that's enough to make it a different answer.
@Dymas I now agree that it is acceptable, but not because the whitespace is different. Apparently the part that is functionally equivalent (only var names differ) was edited in after this answer was written.
@ZainAli, if you make a trivial edit and ping me, I'll reverse my downvote. I apologize for any insinuation of plagiary.
32

The regex is [^\w\s\-]*:

\s is better than a literal space ( ), because the text might contain a tab.

10 Comments

unless you want to remove tabs.
...and newlines, and all other characters considered "whitespace".
This solution is far superior to the above solutions since it also supports international (non-English) characters.
string s = "Mötley Crue 日本人: の氏名 and Kanji 愛 and Hiragana あい";
string r = Regex.Replace(s,"[^\\w\\s-]*","");
The above produces r with: Mötley Crue 日本人 の氏名 and Kanji 愛 and Hiragana あい
Use @ to escape \ conversion in string: @"[^\w\s-]*"
it, uhhh... doesn't remove underscores? that is considered a "word" character by regex implementation across creation, but it's not alphanumeric, dash, or space... (?)
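The underscore point raised in the comments can be checked directly. The sketch below (the sample string is an assumption for illustration) compares \w with the Unicode category classes \p{L} (letters) and \p{N} (numbers), which .NET regular expressions support and which do not match the underscore:

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input chosen for illustration only; \t is a tab character.
string input = "snake_case\tMötley-Crüe 愛!";

// \w treats '_' as a word character, so the underscore survives.
string withWordClass = Regex.Replace(input, @"[^\w\s-]", "");
// withWordClass is "snake_case\tMötley-Crüe 愛"

// \p{L} and \p{N} match letters and numbers only, so the underscore is removed,
// while non-English letters are still kept.
string lettersAndNumbers = Regex.Replace(input, @"[^\p{L}\p{N}\s-]", "");
// lettersAndNumbers is "snakecase\tMötley-Crüe 愛"

Console.WriteLine(withWordClass);
Console.WriteLine(lettersAndNumbers);
```

Both patterns use \s, so tabs and newlines are kept; swap \s for a literal space if only spaces should survive.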
25

Based on the answer for this question, I created a static class and added these. Thought it might be useful for some people.

public static class RegexConvert
{
    public static string ToAlphaNumericOnly(this string input)
    {
        Regex rgx = new Regex("[^a-zA-Z0-9]");
        return rgx.Replace(input, "");
    }

    public static string ToAlphaOnly(this string input)
    {
        Regex rgx = new Regex("[^a-zA-Z]");
        return rgx.Replace(input, "");
    }

    public static string ToNumericOnly(this string input)
    {
        Regex rgx = new Regex("[^0-9]");
        return rgx.Replace(input, "");
    }
}

Then the methods can be used as:

string example = "asdf1234!@#$";
string alphanumeric = example.ToAlphaNumericOnly(); // "asdf1234"
string alpha = example.ToAlphaOnly();               // "asdf"
string numeric = example.ToNumericOnly();           // "1234"

2 Comments

For the example that you provide it would also be useful if you provide the outcomes of each of the methods.
This solution is culture dependent.
21

Want something quick?

public static class StringExtensions 
{
    public static string ToAlphaNumeric(this string self,
                                        params char[] allowedCharacters)
    {
        return new string(Array.FindAll(self.ToCharArray(),
                                        c => char.IsLetterOrDigit(c) ||
                                        allowedCharacters.Contains(c)));
    }
}

This will allow you to specify which characters you want to allow as well.

2 Comments

IMHO - the best solution here.
Looks clean, but it's a bit hard to see how to add whitespace. I would have added another overload that allows whitespace too, as this method works fine on words, but not on sentences or other whitespace such as newlines or tabs. +1 anyway, good solution.
public static string ToAlphaNumericWithWhitespace(this string self, params char[] allowedCharacters)
{
    return new string(Array.FindAll(self.ToCharArray(),
                                    c => char.IsLetterOrDigit(c) ||
                                    char.IsWhiteSpace(c) ||
                                    allowedCharacters.Contains(c)));
}
6

Here is a fast, non-regex solution that keeps heap allocations low, which is what I was looking for.

Unsafe edition.

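// Warning: this mutates the string in place, which breaks string immutability
// and can corrupt pooled (interned) strings shared elsewhere in the program.
public static unsafe void ToAlphaNumeric(ref string input)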
{
    fixed (char* p = input)
    {
        int offset = 0;
        for (int i = 0; i < input.Length; i++)
        {
            if (char.IsLetterOrDigit(p[i]))
            {
                p[offset] = input[i];
                offset++;
            }
        }
        ((int*)p)[-1] = offset; // Changes the length of the string
        p[offset] = '\0';
    }
}

And for those who don't want to use unsafe or don't trust the string length hack.

public static string ToAlphaNumeric(string input)
{
    int j = 0;
    char[] newCharArr = new char[input.Length];

    for (int i = 0; i < input.Length; i++)
    {
        if (char.IsLetterOrDigit(input[i]))
        {
            newCharArr[j] = input[i];
            j++;
        }
    }

    // Build the string from the first j characters; no need to resize the array first.
    return new string(newCharArr, 0, j);
}

1 Comment

You shouldn't alter the contents of the string because of string pooling.
4

I've made a different solution that eliminates control characters, which was my original problem.

It is better than listing all the "special but good" chars:

char[] arr = str.Where(c => !char.IsControl(c)).ToArray();
str = new string(arr);

It's simpler, so I think it's better!


3

Here's an extension method using @ata's answer as inspiration.

"hello-world123, 456".MakeAlphaNumeric(new char[]{'-'});// yields "hello-world123456"

or if you require additional characters other than hyphen...

"hello-world123, 456!?".MakeAlphaNumeric(new char[]{'-','!'});// yields "hello-world123456!"


public static class StringExtensions
{   
    public static string MakeAlphaNumeric(this string input, params char[] exceptions)
    {
        var charArray = input.ToCharArray();
        var alphaNumeric = Array.FindAll<char>(charArray, (c => char.IsLetterOrDigit(c)|| exceptions?.Contains(c) == true));
        return new string(alphaNumeric);
    }
}


2

If you are working in JS, here is a very terse version

myString = myString.replace(/[^A-Za-z0-9 -]/g, "");

2 Comments

I believe OP might have asked about C#, not JS.
Not related to C# language...
0

I use a variation of one of the answers here. I want to replace spaces with "-" so the result is SEO friendly, and also convert it to lower case. I also want to avoid referencing System.Web from my services layer.

private string MakeUrlString(string input)
{
    var array = input.ToCharArray();

    array = Array.FindAll<char>(array, c => char.IsLetterOrDigit(c) || char.IsWhiteSpace(c) || c == '-');

    var newString = new string(array).Replace(" ", "-").ToLower();
    return newString;
}
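One thing to watch with the method above: char.IsWhiteSpace also keeps tabs and newlines, but Replace(" ", "-") only converts spaces, and consecutive spaces produce consecutive dashes. A possible variation, sketched here under the assumption that runs of whitespace and dashes should collapse into a single dash (the name MakeSlug and the sample input are hypothetical), could look like this:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical variation, not the original author's method.
string MakeSlug(string input)
{
    // Keep letters, digits, whitespace and dashes, as in the answer above.
    var kept = new string(input
        .Where(c => char.IsLetterOrDigit(c) || char.IsWhiteSpace(c) || c == '-')
        .ToArray());

    // Collapse each run of whitespace and/or dashes into one dash,
    // trim dashes from both ends, then lower-case the result.
    return Regex.Replace(kept, @"[\s-]+", "-").Trim('-').ToLowerInvariant();
}

Console.WriteLine(MakeSlug("  Hello,   World!\tC# is - fun  "));
// prints: hello-world-c-is-fun
```

ToLowerInvariant is used here so the output does not depend on the current culture; ToLower, as in the original, would follow the thread's culture settings instead.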


-1

There is a much easier way with Regex.

private string FixString(string str)
{
    return string.IsNullOrEmpty(str) ? str : Regex.Replace(str, "[\\D]", "");
}

1 Comment

only replaces non numeric characters
