
Quick add-on requirement in our project: a field in our DB that holds a phone number only allows 10 characters. So if I get passed "(913)-444-5555" or anything else, is there a quick way to run a string through some kind of special replace function that I can pass a set of characters to allow?

Regex?

9 Answers

263

Definitely regex:

using System.Text.RegularExpressions;

string CleanPhone(string phone)
{
    Regex digitsOnly = new Regex(@"[^\d]");
    return digitsOnly.Replace(phone, "");
}

Or, as a static field within a class, so the regex isn't re-created on every call:

private static readonly Regex digitsOnly = new Regex(@"[^\d]");

public static string CleanPhone(string phone)
{
    return digitsOnly.Replace(phone, "");
}

Depending on your real-world inputs, you may want some additional logic there to do things like strip out leading 1's (for long distance) or anything trailing an x or X (for extensions).
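A minimal sketch of that extra logic, assuming North American numbers where an extension begins at the first 'x' or 'X' and a leading '1' only appears on an 11-digit number. The class and method names (`PhoneNormalizer`, `ToTenDigits`) are hypothetical, and returning null when the result isn't exactly 10 digits is an assumption chosen to match the 10-character column in the question.

```csharp
using System.Text.RegularExpressions;

public static class PhoneNormalizer
{
    // Cached once, as in the static-field version above.
    private static readonly Regex NonDigits = new Regex(@"\D");

    // Assumption: an extension starts at the first 'x' or 'X'.
    private static readonly char[] ExtensionMarkers = { 'x', 'X' };

    // Hypothetical helper: returns the 10-digit number, or null when the
    // input does not reduce to exactly 10 digits.
    public static string ToTenDigits(string phone)
    {
        if (phone == null) return null;

        // Drop everything from the extension marker onward.
        int ext = phone.IndexOfAny(ExtensionMarkers);
        if (ext >= 0) phone = phone.Substring(0, ext);

        string digits = NonDigits.Replace(phone, "");

        // Strip a leading long-distance '1' from an 11-digit number.
        if (digits.Length == 11 && digits[0] == '1')
            digits = digits.Substring(1);

        return digits.Length == 10 ? digits : null;
    }
}
```

For example, `PhoneNormalizer.ToTenDigits("1-913-444-5555 x123")` yields "9134445555", while a 7-digit local number yields null so the caller can decide what to do with it.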


7 Comments

That's perfect. This is only used a couple of times, so we don't need to create a class. As for the leading 1, not a bad idea, but I think I'd rather handle that on a case-by-case basis, at least in this project. Thanks again; if I could upvote again, I would.
I'm waiting for someone to post an extension method version of this for the string class :)
@Joel I added the extension method version below. Guess the comments don't support markdown.
Note: [^\d] can be simplified to \D.
Combined this answer (caching the regex in the class) with the extension method one below :)
78

You can do it easily with regex:

string subject = "(913)-444-5555";
string result = Regex.Replace(subject, "[^0-9]", ""); // result = "9134445555"
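[^0-9] and the [^\d] / \D patterns in the other answers are not quite interchangeable in .NET: \d matches any Unicode decimal digit (category Nd), not just 0-9, unless RegexOptions.ECMAScript is set. The sketch below shows the difference; `DigitClassDemo` and its method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class DigitClassDemo
{
    // Keeps only the ASCII digits 0-9.
    public static string AsciiDigitsOnly(string s) =>
        Regex.Replace(s, "[^0-9]", "");

    // \D is "not \d"; by default .NET's \d also matches non-ASCII
    // decimal digits such as Arabic-Indic digits, so those are kept.
    public static string UnicodeDigitsOnly(string s) =>
        Regex.Replace(s, @"\D", "");

    // With ECMAScript semantics, \d is restricted to [0-9].
    public static string EcmaDigitsOnly(string s) =>
        Regex.Replace(s, @"\D", "", RegexOptions.ECMAScript);
}
```

For a plain "(913)-444-5555" all three return "9134445555"; they only diverge on input containing non-ASCII digits, which matters if the cleaned value has to fit an ASCII-only 10-character column.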


47

You don't need to use Regex.

using System.Linq;

phone = new String(phone.Where(c => char.IsDigit(c)).ToArray());

7 Comments

Nice answer. Why add a dependency on the RegularExpressions namespace?
@BTE because it's a shorthand that simply uses System.Linq.
How well does this perform compared with the Regex solution?
Adding a test for the LINQ solution to @Max-PC's benchmark code gives: StringBuilder: 273ms, Regex: 2096ms, LINQ: 658ms. Slower than StringBuilder, but still significantly faster than Regex. Given that the benchmark runs 1,000,000 replacements, the effective difference between the StringBuilder and LINQ solutions is probably negligible in most scenarios.
@ChrisPratt for the regex, did you create a new regex each time, or re-use an existing one? That could have a big impact on performance.
23

Here's the extension method way of doing it.

using System.Text.RegularExpressions;

public static class Extensions
{
    public static string ToDigitsOnly(this string input)
    {
        Regex digitsOnly = new Regex(@"[^\d]");
        return digitsOnly.Replace(input, "");
    }
}
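A sketch of the combination one commenter describes (the accepted answer's cached static regex plus this extension method), also applying the RegexOptions.Compiled suggestion from another answer. The class name `DigitExtensions` is hypothetical; whether Compiled pays off depends on how often the method is called.

```csharp
using System.Text.RegularExpressions;

public static class DigitExtensions
{
    // Initialized once, when the class is first used, instead of on every call.
    // RegexOptions.Compiled trades a slower first use for faster matching afterwards.
    private static readonly Regex NonDigits = new Regex(@"\D", RegexOptions.Compiled);

    // Note: Regex.Replace throws ArgumentNullException if input is null.
    public static string ToDigitsOnly(this string input)
    {
        return NonDigits.Replace(input, "");
    }
}
```

Usage is unchanged: `"(913)-444-5555".ToDigitsOnly()` returns "9134445555".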


10

Using the Regex methods in .NET, you can match any non-digit character with \D, like so:

phoneNumber = Regex.Replace(phoneNumber, "\\D", String.Empty);

1 Comment

This isn't quite right. You need an @ or "\\D" to escape the \ in the regex. Also, you should use String.Empty instead of ""

5

How about an extension method that doesn't use regex?

If you do stick with one of the Regex options, at least use RegexOptions.Compiled in the static variable.

using System.Linq;
public static class StringExtensions
{
    public static string ToDigitsOnly(this string input)
    {
        return new String(input.Where(char.IsDigit).ToArray());
    }
}

This builds on Usman Zafar's answer, converted to use a method group.
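The same non-regex idea generalizes to what the question literally asks for: a function you pass the set of characters to allow. A minimal sketch; `CharFilter` and `KeepOnly` are hypothetical names, not framework APIs.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CharFilter
{
    // Returns a copy of input containing only characters that appear in
    // allowed, in their original order.
    public static string KeepOnly(string input, string allowed)
    {
        if (input == null) return null;

        // HashSet gives a constant-time membership check per character.
        var allowedSet = new HashSet<char>(allowed);
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (allowedSet.Contains(c))
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

For example, `CharFilter.KeepOnly("(913)-444-5555", "0123456789")` returns "9134445555"; passing an explicit ASCII digit list also sidesteps the question of which Unicode digits char.IsDigit accepts.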


5

For better performance (and, likely, lower memory consumption) than the Regex version, try this:

using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

public class Program
{
    private static Regex digitsOnly = new Regex(@"[^\d]");

    public static void Main()
    {
        Console.WriteLine("Init...");

        string phone = "001-12-34-56-78-90";

        var sw = new Stopwatch();
        sw.Start();
        for (int i = 0; i < 1000000; i++)
        {
            DigitsOnly(phone);
        }
        sw.Stop();
        Console.WriteLine("Time: " + sw.ElapsedMilliseconds);

        var sw2 = new Stopwatch();
        sw2.Start();
        for (int i = 0; i < 1000000; i++)
        {
            DigitsOnlyRegex(phone);
        }
        sw2.Stop();
        Console.WriteLine("Time: " + sw2.ElapsedMilliseconds);

        Console.ReadLine();
    }

    public static string DigitsOnly(string phone, string replace = null)
    {
        if (phone == null) return null;
        if (replace == null) replace = "";

        var result = new StringBuilder(phone.Length);
        foreach (char c in phone)
        {
            // Keep ASCII digits; substitute `replace` (empty by default) for anything else.
            if (c >= '0' && c <= '9')
                result.Append(c);
            else
                result.Append(replace);
        }
        return result.ToString();
    }

    public static string DigitsOnlyRegex(string phone)
    {
        return digitsOnly.Replace(phone, "");
    }
}

The results on my computer (in milliseconds; StringBuilder loop first, then Regex) are:
Init...
Time: 307
Time: 2178

1 Comment

+1 for showing benchmarks. Interesting that the loop with StringBuilder outperforms RegEx, although I guess it makes sense when RegEx probably has to wade through a lot of rules to decide what to do.

3

I'm sure there's a more efficient way to do it, but I would probably do this:

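using System.Text;

string getTenDigitNumber(string input)
{
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < input.Length; i++)
    {
        int junk;
        // int.TryParse takes a string and an out parameter, so convert the char first.
        if (int.TryParse(input[i].ToString(), out junk))
            sb.Append(input[i]);
    }
    return sb.ToString();
}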

2 Comments

That was my first instinct, and was also why I asked here. RegEx seems like a much better solution to me. But thanks for the answer!
TryParse just to determine whether a character is a digit? Use char.IsDigit. Also, reading input[i] twice could be avoided.
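A sketch of the loop above with that comment's suggestions applied: char.IsDigit instead of parsing, and a single read of input[i] per iteration. `DigitFilter.GetDigits` is a hypothetical name. Be aware that char.IsDigit returns true for any Unicode decimal digit, not only '0' to '9'; compare against '0' and '9' directly if only ASCII digits should survive.

```csharp
using System.Text;

public static class DigitFilter
{
    public static string GetDigits(string input)
    {
        // Pre-size the builder: the result can't be longer than the input.
        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i]; // read each character once
            if (char.IsDigit(c))
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```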
-1

Try this:

public static string cleanPhone(string inVal)
{
    char[] newPhon = new char[inVal.Length];
    int i = 0;
    foreach (char c in inVal)
        if (c.CompareTo('0') > 0 && c.CompareTo('9') < 0)
            newPhon[i++] = c;
    return newPhon.ToString();
}

1 Comment

return newPhon.ToString(); will return "System.Char[]". I think you meant return new string(newPhon);. But this also filters out the digits 0 and 9, because of the > and < instead of >= and <=. And even then, the string will have trailing null characters ('\0') because the newPhon array is longer than it needs to be.
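A sketch of cleanPhone with the fixes from that comment applied: inclusive bounds so '0' and '9' are kept, and a string built from only the filled part of the array so there are no trailing '\0' characters. `PhoneCleanup` is a hypothetical wrapper class.

```csharp
public static class PhoneCleanup
{
    public static string CleanPhone(string inVal)
    {
        // Upper bound on the result size; only the first i slots get used.
        char[] newPhon = new char[inVal.Length];
        int i = 0;
        foreach (char c in inVal)
        {
            // Inclusive comparisons keep '0' and '9'.
            if (c >= '0' && c <= '9')
                newPhon[i++] = c;
        }
        // new string(char[], startIndex, length) copies just the digits found.
        return new string(newPhon, 0, i);
    }
}
```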
