I found a regex pattern for removing phone numbers from strings of text. It works well except for a few cases (these are US phone numbers).

Here is the regex:

/\(?\d{3}\)?[-\s.]?\d{3}[-\s.]\d{4}/x

Here are the cases I need to catch:

  • 5555555555 (area code + 7 numbers)
  • 15555555555 (1 + area code + 7 numbers)
  • (555)-5555555 (area code in parenthesis, dash, 7 numbers)
  • 1-555-555-5555
  • 1-(555)-555-5555

Here is the regex replace I am using:

$pattern = "/\(?\d{3}\)?[-\s.]?\d{3}[-\s.]\d{4}/x";
$replacement = "[phone redacted]";
$body = preg_replace($pattern, $replacement, $body);
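
A quick way to see where the pattern falls short is to run it over each problem case. This is only a sketch: the `redact_original` helper and the `$cases` array are illustrative names, not part of the original code. The second separator `[-\s.]` is mandatory and the pattern has no provision for a leading `1`, so the unseparated forms should go unmatched and the `1-` prefix should be left behind.

```php
<?php
// Illustrative helper (hypothetical name): applies the pattern from the
// question unchanged so its behaviour on each case can be inspected.
function redact_original($body)
{
    $pattern = '/\(?\d{3}\)?[-\s.]?\d{3}[-\s.]\d{4}/x';
    return preg_replace($pattern, '[phone redacted]', $body);
}

$cases = array(
    '5555555555',
    '15555555555',
    '(555)-5555555',
    '1-555-555-5555',
    '1-(555)-555-5555',
);

// Print each case next to what survives redaction.
foreach ($cases as $case) {
    echo $case, ' => ', redact_original($case), "\n";
}
```

The first three cases come back untouched, and the last two come back as `1-[phone redacted]`.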
  • Your examples certainly don't cover every way people could try to get around this redaction scheme, so I hope it's truly representative of your data :) Commented Feb 28, 2012 at 20:20
  • Yes, it's hard because people entering phone numbers in a text area are unpredictable. It's difficult to say whether 55555555 is a price or a phone number. Commented Feb 28, 2012 at 20:48
  • Edit: in this case, though, I know exactly what text is being edited, so the matches here should almost always be phone numbers. Commented Feb 28, 2012 at 21:08

2 Answers

How about:

/(?:1-?)?(?:\(\d{3}\)|\d{3})[-\s.]?\d{3}[-\s.]?\d{4}/

test:

$arr = array(
'5555555555 (area code + 7 numbers)',
'15555555555 (1 + area code + 7 numbers)',
'(555)-5555555 (area code in parenthesis, dash, 7 numbers)',
'1-555-555-5555',
'1-(555)-555-5555');

$pattern = "/(?:1-?)?(?:\(\d{3}\)|\d{3})[-\s.]?\d{3}[-\s.]?\d{4}/x";
$replacement = "[phone redacted]";
foreach($arr as $body) {
  echo preg_replace($pattern, $replacement, $body), "\n";
}

output:

[phone redacted] (area code + 7 numbers)
[phone redacted] (1 + area code + 7 numbers)
[phone redacted] (area code in parenthesis, dash, 7 numbers)
[phone redacted]
[phone redacted]
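
Because every separator in this pattern is optional, it can also match inside a longer run of digits such as an order number. If that matters for your text, one possible refinement is to wrap the pattern in digit guards. The sketch below is an assumption layered on top of the answer, not part of it: the `redact_phones` function name, the `(?<!\d)` and `(?!\d)` lookarounds, and the sample sentence are all illustrative.

```php
<?php
// Hypothetical helper: the answer's pattern wrapped in digit guards.
// (?<!\d) rejects a match that starts right after another digit;
// (?!\d) rejects a match that is immediately followed by another digit.
function redact_phones($body)
{
    $pattern = '/(?<!\d)(?:1-?)?(?:\(\d{3}\)|\d{3})[-\s.]?\d{3}[-\s.]?\d{4}(?!\d)/';
    return preg_replace($pattern, '[phone redacted]', $body);
}

// The 14-digit order number is left alone; both phone numbers are redacted.
echo redact_phones('Order 12345678901234 shipped; call (555) 555-5555 or 1-555-555-5555.'), "\n";
```

The trade-off is that a phone number typed directly against other digits would no longer be caught, so whether the guards help depends on the data.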

3 Comments

+1 looks pretty good - catches more cases than asked for, but HOPEFULLY that won't be a problem :)
Gave this a +1 but would like to add the use of periods, such as: 1.555.555.5555 which is growing in popularity.
@Abela: It already works with periods and spaces (see [-\s.]), except after the leading 1. To cover that case, just replace (?:1-?)? with (?:1[-.]?)?
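
The change suggested in the last comment can be checked with a short comparison. This is a minimal sketch: the variable names and the sample string are illustrative, and only the `1[-.]?` prefix differs between the two patterns.

```php
<?php
// $before is the pattern from the answer; $after applies the suggested
// change so that "-" or "." may follow the leading 1.
$before = '/(?:1-?)?(?:\(\d{3}\)|\d{3})[-\s.]?\d{3}[-\s.]?\d{4}/';
$after  = '/(?:1[-.]?)?(?:\(\d{3}\)|\d{3})[-\s.]?\d{3}[-\s.]?\d{4}/';

$body = '1.555.555.5555';

$withBefore = preg_replace($before, '[phone redacted]', $body);
$withAfter  = preg_replace($after, '[phone redacted]', $body);

echo $withBefore, "\n"; // the leading "1." is left behind
echo $withAfter, "\n";  // the whole number is replaced
```

If a space after the leading 1 should be accepted as well, the same character class used elsewhere in the pattern, `[-\s.]`, could be used there instead.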

This one should match:

(1-?)?(-?\([0-9]{3}\)|[0-9]{3})(-?[0-9]{3}){2}[0-9]

2 Comments

This actually replaced everything (no text at all).
When I have a paragraph that has 1-555-555-5555 etc in it, it finds it, but also removes all text. Updating original question.
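
One possible explanation for the empty result, assuming the pattern was passed to preg_replace exactly as written in this answer: PHP patterns need delimiters, and a string that starts with `(` is read as using bracket-style delimiters. The pattern then ends at the first balanced `)`, the rest is treated as modifiers, and preg_replace fails and returns NULL, which prints as nothing. The sketch below compares the raw string with the same pattern wrapped in `/` delimiters; the variable names and sample sentence are illustrative.

```php
<?php
// Assumption: the answer's pattern was used without delimiters.
$raw       = '(1-?)?(-?\([0-9]{3}\)|[0-9]{3})(-?[0-9]{3}){2}[0-9]';
$delimited = '/' . $raw . '/';

$body = 'Call 1-555-555-5555 today.';

// "@" hides the warning PHP raises for the malformed pattern.
$broken = @preg_replace($raw, '[phone redacted]', $body);
var_dump($broken); // NULL signals that the pattern failed to compile

// With delimiters, only the phone number is replaced.
$fixed = preg_replace($delimited, '[phone redacted]', $body);
echo $fixed, "\n";
```

Checking the return value for NULL before overwriting `$body` avoids losing the original text when a pattern is malformed.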
