4

I want to replace the phone numbers in an HTML string, such as

<a>click here now! (123) -456-789</a>

I think the best approach would be to find every place where something looks like a phone number, such as:

$pattern = *any 3 numbers* *any characters up to 3 characters long* 
$pattern .= *any 3 numbers* *any characters up to 3 characters long* 
$pattern .= *any numbers up to 4 numbers long*

// $pattern maybe something like [0-9]{3}\.?([0-9]{3})\.?([0-9]{4})

preg_match_all($pattern, $string, $matches);

foreach ($matches[0] as $match)
{
    // replace the string with the new phone number
}

Basically, what would the regex look like?

13
  • How general do you need it? Do you know for sure the phone numbers will be formatted as (123) 456-7890, or does it have to handle any sort of spacing (or not), or periods/parens/hyphens/etc? Commented Jun 26, 2013 at 22:14
  • No idea how it will go, that's why I'm assuming that there should be up to 3 characters long. I am going to make the assumption that it's not something like (123) - 456 - 7890. Commented Jun 26, 2013 at 22:15
  • The key word here is regular expression. The data you're looking to match appears to be inherently irregular, so your matching is going to be spotty at best. Commented Jun 26, 2013 at 22:15
  • Please use \d instead of the ugly [0-9] while you can :) Commented Jun 26, 2013 at 22:16
  • I do know that there will be a certain rule, such as (111) 222-333 or 111-222-333. I make the assumption that it's not going to be spelled out (such as one11-222-333). Commented Jun 26, 2013 at 22:16

3 Answers

10

Based on the Wikipedia entry "Local conventions for writing telephone numbers", there is a wide variety of formats worldwide to handle if you want to strip out ALL phone numbers. In the following examples the placeholder 0 represents any digit. The list is a sample from that entry (there may be duplicates).

0 (000) 000-0000
0000 0000
00 00 00 00
00 000 000
00000000
00 00 00 00 00
+00 0 00 00 00 00
00000 000000
+00 0000 000000
(00000) 000000
+00 0000 000000
+00 (0000) 000000
00000-000000
00000/000000
000 0000
000-000-000
0 0000 00-00-00
(0 0000) 00-00-00
0 000 000-00-00
0 (000) 000-00-00
000 000 000
000 00 00 00
000 000 000
000 000 00 00
+00 00 000 00 00
0000 000 000
(000) 0000 0000
(00000) 00000
(0000) 000 0000
0000 000 0000
0000-000 0000
0000 000 0000
00000 000000
0000 000000
0000 000 00 00
+00 000 000 00 00
(000) 0000000
+00 00 00000000
000 000 000
+00-00000-00000
(0000) 0000 0000
+00 000 0000 0000
(0000) 0000 0000
+00 (00) 000 0000
+00 (0) 000 0000
+00 (000) 000 0000
(00000) 00-0000
(000) 000-000-0000
(000) [00]0-000-0000
(00000) 0000-0000
+ 000 0000 000000
8.8.8.8
192.168.1.1
0 (000) 000-0000 ext 1
0 (000) 000-0000 x 1001
0 (000) 000-0000 extension 2
0 000 000-0000 code 3

You could try to write some crazy regex that qualifies each number by its country code, dialing prefix, and so on, but for matching purposes here that is not needed and would be a waste of time. From a Bayesian point of view, the longest numbers tend to be 18 characters (Argentina mobile numbers), with a possible leading + character followed by digits ([0-9] or \d), parentheses (), brackets [], and possibly spaces, periods ., or hyphens -, plus one obscure format with a /.

\b\+?[0-9()\[\]./ -]{7,17}\b
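As a rough check of what this pattern picks up, here is a minimal PHP sketch. The `!` delimiters and the sample string (borrowed from the question) are assumptions for illustration only.

```php
<?php
// Basic pattern from the answer, wrapped in "!" delimiters (assumed).
$pattern = '!\b\+?[0-9()\[\]./ -]{7,17}\b!';

// Sample input taken from the question.
$string = '<a>click here now! (123) -456-789</a>';

// The third argument receives the matches; $matches[0] holds each full match.
preg_match_all($pattern, $string, $matches);

print_r($matches[0]);
```

Two things are worth knowing when reading the output. The leading \b only matches between a word character and a non-word character, so an opening parenthesis preceded by a space is left out of the match. And because the character class contains the space, a match that follows a word can include the space in front of it.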

For all of these numbers we'll also allow the following extension formats:

ext 123456
x 123456
# 123456
EXT 123456
- 123456
code 2
-12
Extension 123456

\b\+?[0-9()\[\]./ -]{7,17}\s+(extension|x|#|-|code|ext)\s+[0-9]{1,6}

So in total you would look for phone numbers, or phone numbers with extensions:

$pattern = '!(\b\+?[0-9()\[\]./ -]{7,17}\b|\b\+?[0-9()\[\]./ -]{7,17}\s+(extension|x|#|-|code|ext)\s+[0-9]{1,6})!i';

Note that this will also strip IP addresses. If you want to keep IP addresses, you will need to replace the periods in them with something that will not match the phone number regex, then switch them back afterwards.
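The mask-and-restore idea for IP addresses could be sketched roughly like this. The loose IPv4 pattern, the `{DOT}` placeholder, and the sample string are all assumptions made for the example; the placeholder must be something that never occurs in the real input.

```php
<?php
// Phone pattern from the answer (with or without an extension).
$pattern = '!(\b\+?[0-9()\[\]./ -]{7,17}\b|\b\+?[0-9()\[\]./ -]{7,17}\s+(extension|x|#|-|code|ext)\s+[0-9]{1,6})!i';

// Loose IPv4 shape: four groups of 1-3 digits separated by periods.
// Assumption: good enough for masking; it does not check that groups are <= 255.
$ipPattern = '/\b\d{1,3}(?:\.\d{1,3}){3}\b/';

// Hypothetical placeholder that the phone pattern cannot match.
$placeholder = '{DOT}';

$string = 'Server 192.168.1.1, phone 555-123-4567.';

// 1. Hide the periods inside anything that looks like an IP address.
$masked = preg_replace_callback($ipPattern, function ($m) use ($placeholder) {
    return str_replace('.', $placeholder, $m[0]);
}, $string);

// 2. Replace the phone numbers.
$masked = preg_replace($pattern, '*Phone*', $masked);

// 3. Put the periods back.
$result = str_replace($placeholder, '.', $masked);

echo $result, "\n";
```

The IP address survives because the letters and braces of the placeholder break it into runs that are too short for the {7,17} quantifier.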

So for your code you would use:

$string = preg_replace($pattern,'*Phone*',$string);
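One thing worth checking when trying this out: PCRE tries alternatives from left to right, so the branch without an extension can succeed first and leave the extension text behind. The sketch below compares the pattern as given with a variant that lists the extension branch first. The reordering and the sample string are assumptions, not part of the answer.

```php
<?php
// Pattern as given: plain number first, number with extension second.
$original = '!(\b\+?[0-9()\[\]./ -]{7,17}\b|\b\+?[0-9()\[\]./ -]{7,17}\s+(extension|x|#|-|code|ext)\s+[0-9]{1,6})!i';

// Assumed variant: the same two branches, extension branch tried first.
$reordered = '!(\b\+?[0-9()\[\]./ -]{7,17}\s+(extension|x|#|-|code|ext)\s+[0-9]{1,6}|\b\+?[0-9()\[\]./ -]{7,17}\b)!i';

// Hypothetical sample input.
$string = 'Call 555-123-4567 ext 12 today';

$a = preg_replace($original, '*Phone*', $string);
$b = preg_replace($reordered, '*Phone*', $string);

echo $a, "\n"; // Call*Phone*ext 12 today
echo $b, "\n"; // Call*Phone* today
```

Both results also show the surrounding spaces being swallowed, since the space is part of the character class.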

Here's a PHP fiddle of the matching test.


1 Comment

This is a really interesting and comprehensive approach, thanks.
1

I think this will match two sets of three digits and a set of four digits, with "common" phone number punctuation in-between:

\d{3}[().\-\s[\]]*\d{3}[().\-\s[\]]*\d{4}

This allows for three digits, then any number of punctuation characters or spaces, then three more digits, then more punctuation, then four digits.

However, without a better idea of the formatting of the input, you will never really be sure that you're going to get only phone numbers and not something else, or that you won't skip over any phone numbers.
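To make that caveat concrete, here is a small sketch showing the pattern accepting a plain ten-digit number that is not a phone number. The sample text is made up, and the hyphen in the character class is escaped so it cannot be read as a range.

```php
<?php
// Three digits, optional punctuation, three digits, optional punctuation, four digits.
$pattern = '/\d{3}[().\-\s[\]]*\d{3}[().\-\s[\]]*\d{4}/';

// Hypothetical input: an order number followed by a real phone number.
$input = 'Order 1234567890 shipped; call 123.456.7890';

preg_match_all($pattern, $input, $matches);

// Both the order number and the phone number are reported.
print_r($matches[0]);
```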

If you want to replace the number you find with your own number, I might try something like this:

preg_replace('/\d{3}([().\-\s[\]]*)\d{3}([().\-\s[\]]*)\d{4}/',
    '123${1}456${2}7890', $input);

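In the replacement string, $1 and $2 are the two parenthesized blocks of punctuation between the digit groups. They are written as ${1} and ${2} when a digit follows directly, so that preg_replace does not read $14 or $27 as the group number. This way you can replace just the digits you find and leave the punctuation alone by inserting the same punctuation back into the resulting string.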
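A minimal runnable sketch of this replacement follows. It uses the braced ${1} and ${2} form and an escaped hyphen in the character class; the replacement number 555 010 9999 and the sample input are invented for the example.

```php
<?php
// Two captured runs of punctuation/spaces sit between the digit groups.
$pattern = '/\d{3}([().\-\s[\]]*)\d{3}([().\-\s[\]]*)\d{4}/';

// ${1} and ${2} re-insert the original punctuation.
// Single quotes keep PHP itself from trying to interpolate the "$".
$replacement = '555${1}010${2}9999';

$input = '<a>click here now! (123) 456-7890</a>';
$output = preg_replace($pattern, $replacement, $input);

echo $output, "\n"; // <a>click here now! (555) 010-9999</a>
```

The opening parenthesis is untouched because the match only starts at the first digit.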

2 Comments

So, now that we have this match, how would we go about replacing the numbers themselves with my own phone number? And I absolutely appreciate your comment about irregular expression. I understand that phone numbers are a pain to do, but I think this would be the best route to take (maybe you disagree?)
If you have no control over the output you're parsing, then this is probably about as good as you can do (you could make the regex more complicated to make sure it's a valid phone number, for some improvement). Do you need your phone number to be in the same format? If not, how about something like: preg_replace($pattern, "(123) 456-7890", $input);? If the formatting is important, then you'll probably want to look into using capture groups.
0

Here is the function I use that I downloaded from somewhere (don't remember where I got this from).

/*
// PHP function to validate US phone number:
// (c) 2003
// No restrictions have been placed on the use of this code
//
// Updated Friday Jan 9 2004 to optionally ignore the area code:
//
// Input: a single string parameter and an optional boolean variable (default=true)
// Output: 10 digit telephone number or boolean false(0)
//
// The function will return the numerical part of the alphanumeric string
// parameter with the following sequence of characters:
// any number of spaces [optional],
// a single open parentheses [optional],
// any number of spaces [optional],
// 3 digits (area code),
// any number of spaces [optional],
// a single close parentheses [optional],
// a single dash [optional],
// any number of spaces [optional],
// 3 digits, any number of spaces [optional],
// a single dash [optional],
// any number of spaces [optional],
// 4 digits, any number of spaces [optional]:
*/
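function validate_USphone($phonenumber, $useareacode = true)
{
    // Full number: optional parentheses around the area code, optional dashes and spaces.
    $full = "/^[ ]*[(]{0,1}[ ]*[0-9]{3,3}[ ]*[)]{0,1}[-]{0,1}[ ]*[0-9]{3,3}[ ]*[-]{0,1}[ ]*[0-9]{4,4}[ ]*$/";
    // Seven-digit local number; accepted only when $useareacode is false.
    $local = "/^[ ]*[0-9]{3,3}[ ]*[-]{0,1}[ ]*[0-9]{4,4}[ ]*$/";
    if (preg_match($full, $phonenumber) || (preg_match($local, $phonenumber) && !$useareacode)) {
        return preg_replace("/[^0-9]/", "", $phonenumber); // keep the digits only
    }
    return false;
}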
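A short usage sketch, with the function copied in so it runs on its own. The sample inputs are made up. Note that both patterns are anchored with ^ and $, so the function validates a string that contains only a phone number; it will not find a number embedded in HTML like the one in the question.

```php
<?php
// Copy of the function from this answer, so the sketch is self-contained.
function validate_USphone($phonenumber, $useareacode = true)
{
    $full  = "/^[ ]*[(]{0,1}[ ]*[0-9]{3,3}[ ]*[)]{0,1}[-]{0,1}[ ]*[0-9]{3,3}[ ]*[-]{0,1}[ ]*[0-9]{4,4}[ ]*$/";
    $local = "/^[ ]*[0-9]{3,3}[ ]*[-]{0,1}[ ]*[0-9]{4,4}[ ]*$/";
    if (preg_match($full, $phonenumber) || (preg_match($local, $phonenumber) && !$useareacode)) {
        return preg_replace("/[^0-9]/", "", $phonenumber);
    }
    return false;
}

$a = validate_USphone('(123) 456-7890');        // "1234567890"
$b = validate_USphone('456-7890');              // false: area code required by default
$c = validate_USphone('456-7890', false);       // "4567890"
$d = validate_USphone('<a>(123) 456-7890</a>'); // false: the pattern is anchored

var_dump($a, $b, $c, $d);
```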

3 Comments

@Dagon - True, but it works and the Regex is sound from what I can tell.
@Dagon if you're using Unix, Linux, or Windows, 10-year-old code is all over the place.
I'm wary of comments posted 6 hours later
